We regularly offer bachelor's and master's thesis projects for motivated students who are excited to explore research questions in our areas of interest. Topics vary depending on ongoing projects, but we are always open to new ideas and collaborative exploration. Below is a selection of current or recent thesis topics to give you a sense of what working with us might look like.
Winter Semester 2025
Programming by Example (PBE) aims to learn programs that match input–output examples. While many benchmarks report accuracy-based metrics, such measures often fail to capture generalization, compositionality, or partial correctness. This thesis develops and evaluates alternative metrics for assessing PBE models, such as semantic similarity and structural complexity. Using established PBE datasets (e.g., DeepCoder, RobustFill), the study will compare several state-of-the-art models under the proposed metrics to provide a more nuanced understanding of their reasoning ability and robustness.
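To give a flavour of what such metrics could look like, the sketch below scores a candidate program by the fraction of input-output examples it satisfies (partial correctness) and by an AST node count as a crude structural-complexity proxy. The function names and the complexity proxy are illustrative choices, not fixed deliverables of the thesis.

```python
# Minimal sketch of two alternative PBE metrics; the complexity proxy and the
# exact scoring scheme are assumptions for illustration.
import ast

def partial_correctness(program, examples):
    """Fraction of input-output examples the candidate program reproduces."""
    hits = sum(1 for x, y in examples if program(x) == y)
    return hits / len(examples)

def structural_complexity(source_code: str) -> int:
    """Crude complexity proxy: number of nodes in the program's AST."""
    return sum(1 for _ in ast.walk(ast.parse(source_code)))

# A candidate that doubles its input, scored on three examples (one is wrong).
candidate = lambda x: 2 * x
print(partial_correctness(candidate, [(1, 2), (2, 4), (3, 7)]))  # ~0.67
print(structural_complexity("def f(x):\n    return 2 * x"))
```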
Physical Guards is a research-driven smart home project that aggregates sensor data to detect anomalies and potential security threats within residential environments. The current system processes data locally and visualizes alerts through a basic GUI. However, to support broader usability and improve situational awareness, the system requires both a more advanced interface and additional functionalities.
This thesis focuses on improving the existing GUI by making it more intuitive, informative, and responsive to real-time data. Moreover, it aims to transition the current local setup into a secure web application, enabling remote access and monitoring. Students will work on integrating sensor data streams, implementing real-time alert mechanisms, and enhancing the overall user experience. Additional features such as event history, customizable alert settings, and multi-user support can be explored based on interest and scope.
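As a rough illustration of the real-time alert mechanism, the sketch below pushes anomalous sensor readings to web clients over a WebSocket, assuming a FastAPI backend. The endpoint name, the placeholder anomaly rule, and the data source are assumptions, not the current Physical Guards implementation.

```python
# Hedged sketch of real-time alert push for a web version of the system,
# assuming a FastAPI backend; all names below are placeholders.
import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI()

def is_anomalous(reading: dict) -> bool:
    # Placeholder rule; the real system would call its anomaly detector here.
    return reading.get("motion", 0) > 0.9

@app.websocket("/alerts")
async def alert_stream(ws: WebSocket):
    await ws.accept()
    while True:
        reading = await read_next_sensor_sample()  # hypothetical data source
        if is_anomalous(reading):
            await ws.send_json({"type": "alert", "reading": reading})

async def read_next_sensor_sample() -> dict:
    await asyncio.sleep(1.0)  # stand-in for a real sensor queue
    return {"sensor": "livingroom", "motion": 0.95}
```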
The rapid evolution of software development tools has enabled the automatic generation of user interfaces from low- to high-fidelity prototypes. However, these approaches typically result in front-end prototypes with limited or no backend functionality, restricting their practical applicability. Recent advancements in Large Language Models (LLMs) offer new possibilities for generating not only front-end interfaces but also complete backend systems directly from natural language requirements.
This bachelor thesis explores and evaluates the effectiveness of current LLMs in generating full-stack applications, including both the backend logic and its integration with the generated front end. Given a collection of requirements described in natural language, the approach aims to automatically produce a functional application that satisfies the specified requirements. The evaluation should be conducted using multiple requirements datasets, assessing the generated applications in terms of correctness, completeness, and integration quality. Furthermore, the thesis systematically investigates and categorizes the typical errors made by LLMs during the generation process. This work provides insights into both the capabilities and limitations of LLM-driven full-stack generation, contributing to the advancement of automated software engineering.
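A minimal evaluation harness for such a study could look like the sketch below, which records pass/fail per requirement and tallies error categories for the error taxonomy. The generator and acceptance-test hooks are placeholders for whatever model and test framework the thesis adopts.

```python
# Sketch of an evaluation loop for LLM-generated applications; the acceptance
# test callback is a placeholder for a concrete test framework.
from collections import Counter
from dataclasses import dataclass

@dataclass
class EvalResult:
    requirement: str
    passed: bool
    error_category: str | None = None  # e.g. "missing endpoint", "broken integration"

def evaluate_app(requirements, run_acceptance_test):
    """Run one acceptance test per natural-language requirement.

    `run_acceptance_test` is assumed to return (passed, error_category)."""
    return [EvalResult(req, *run_acceptance_test(req)) for req in requirements]

def error_profile(results):
    """Tally error categories over failed requirements."""
    return Counter(r.error_category for r in results if not r.passed)
```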
BugPlus is a minimalist, Turing-complete programming language loosely inspired by marble-logic games like Turing Tumble. A bug is a stateful element that can be connected to other bugs and performs simple additions: when an input signal fires, the bug applies its operation and forwards the result to the next connected bug. Entire programs are directed graphs of bugs; the first bug receives the only external input, and all subsequent activity is determined by the wiring and the bugs' internal states.
Despite its simplicity, BugPlus exhibits non-trivial behaviour, making it a compelling playground for teaching digital-logic concepts, exploring program synthesis, and potentially benchmarking reasoning in Large Language Models. Currently, however, no comprehensive tooling exists for humans to lay out, simulate, and debug BugPlus programs.
This thesis aims to design, implement, and evaluate an integrated desktop or web application, BugPlus Studio, that enables users to lay out, simulate, and debug BugPlus programs.
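To make the bug semantics concrete, here is a toy simulator sketch. The Bug class and its purely additive, single-successor semantics are a deliberate simplification for illustration, not the full BugPlus specification.

```python
# Toy sketch of a BugPlus-style simulation; real BugPlus programs are general
# directed graphs, simplified here to a chain for brevity.
from dataclasses import dataclass

@dataclass
class Bug:
    state: int = 0                 # internal value the bug adds to the signal
    successor: "Bug | None" = None

def fire(entry: Bug, signal: int) -> int:
    """Propagate a signal through the chain, each bug adding its state."""
    bug, value = entry, signal
    while bug is not None:
        value += bug.state         # the "simple addition" each bug performs
        bug = bug.successor
    return value

# A three-bug chain computing signal + 1 + 2 + 3.
c = Bug(3); b = Bug(2, c); a = Bug(1, b)
print(fire(a, 0))  # -> 6
```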
P.R.O.G.R.E.S.S. is intended as an Isaac Sim extension for procedurally generating a wide variety of wheeled mobile robots. In this thesis, you will focus on developing the GUI-based extension, with an emphasis on interactive construction and configuration of mobile robots. The core objective is to support the generation of diverse sets of agents to facilitate the training of deep learning models that can generalize across varying robot configurations and control dynamics.
The envisioned system allows users to design simple mobile robot models by combining basic geometric shapes (e.g., boxes, cylinders) to define the body structure, and to optionally add joints where articulation is desired, such as for arms, grippers, or sensor mounts.
The system is expected to support detailed configuration of wheel parameters, including the number of wheels, their placement on the chassis, whether each wheel is steerable or fixed, and relevant control properties (e.g., maximum steering angle).
The system should automatically generate all required files (e.g., USD and URDF), making the robots compatible with Isaac Sim and exportable to other simulators.
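To make the intended configuration surface concrete, here is a small sketch of a parametric wheel specification and a fragment of URDF emission. The field names and the (deliberately incomplete) URDF snippet are our assumptions, not the project's actual schema.

```python
# Sketch of a parametric wheel description; a real URDF joint would also need
# parent/child links, axes, and limits, omitted here for brevity.
from dataclasses import dataclass

@dataclass
class WheelSpec:
    position: tuple[float, float, float]  # mounting point on the chassis
    steerable: bool = False
    max_steering_angle: float = 0.0       # radians; only used if steerable

def wheel_to_urdf(name: str, w: WheelSpec) -> str:
    joint_type = "revolute" if w.steerable else "continuous"
    x, y, z = w.position
    return (f'<joint name="{name}_joint" type="{joint_type}">\n'
            f'  <origin xyz="{x} {y} {z}"/>\n'
            f'</joint>')

print(wheel_to_urdf("front_left",
                    WheelSpec((0.3, 0.2, 0.0), steerable=True,
                              max_steering_angle=0.6)))
```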
In this project, the student will first train or fine-tune small Transformer models on a carefully designed suite of synthetic tasks that reveal distinct model behaviours. Using Anthropic's newly open-sourced Circuit Tracer framework, they will trace activation pathways from input tokens through attention heads and MLP neurons all the way to the output logits, obtaining ranked causal paths that explain each behaviour. The student will fuse these path scores into a single "Specificity × Influence" measure and then implement a greedy pruning routine that collapses the network into a minimal, symbolically describable circuit. To validate their results, they will compare the recovered mechanisms against the ground-truth algorithmic features embedded in the tasks and against insights reported in prior manual studies, quantifying fidelity, compactness, and compute efficiency. The final deliverables will include an open-source notebook suite that wraps Circuit Tracer for small models and a systematic study of how circuit size and interpretability scale with model width, depth, and task difficulty, demonstrating a fully automated approach to Transformer interpretability audits.
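The sketch below illustrates the intended fusion and pruning step under strong assumptions: path scores arrive as precomputed dictionaries, and the faithfulness check is a placeholder callback rather than Circuit Tracer's actual API.

```python
# Sketch of "Specificity x Influence" fusion plus greedy pruning; scores and
# the faithfulness check are placeholders, not Circuit Tracer's interface.
def fused_score(path: dict) -> float:
    # Both factors are assumed to be normalized to [0, 1] per causal path.
    return path["specificity"] * path["influence"]

def greedy_prune(paths: list[dict], faithful) -> list[dict]:
    """Drop the lowest-scoring path one at a time while the circuit stays
    faithful, i.e. still reproduces the behaviour under study."""
    kept = sorted(paths, key=fused_score, reverse=True)
    while len(kept) > 1 and faithful(kept[:-1]):
        kept = kept[:-1]
    return kept
```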
Large Language Models excel at open-ended text generation but often struggle with syntactic validity and compositional reasoning in program synthesis. This thesis explores how soft constraints, i.e., gentle guidance through softly restricting the available output tokens, can align LLM outputs with a Domain-Specific Language (DSL) without fully restricting their generative flexibility. The work will evaluate the trade-offs between expressiveness, syntactic correctness, and functional accuracy across different constraint strengths. Results will inform how softly constraining LLMs to DSLs can improve performance while preserving their generative capabilities in neural program generation.
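A minimal sketch of such soft constraining, assuming access to the raw next-token logits and a hypothetical grammar oracle that marks which tokens are currently valid in the DSL:

```python
# Sketch of "soft" constrained decoding: instead of zeroing out tokens that are
# invalid under the DSL grammar, their logits are down-weighted by a penalty.
# The allowed-token mask is assumed to come from a hypothetical grammar oracle.
import torch

def soft_constrain(logits: torch.Tensor, allowed_mask: torch.Tensor,
                   strength: float = 5.0) -> torch.Tensor:
    """Subtract `strength` from logits of tokens the grammar disallows."""
    return logits - strength * (~allowed_mask).float()

# With strength -> infinity this becomes hard constrained decoding; with
# strength = 0 the model is unconstrained.
logits = torch.tensor([2.0, 1.5, 0.1])
mask = torch.tensor([True, False, True])  # token 1 is grammatically invalid
print(torch.softmax(soft_constrain(logits, mask), dim=-1))
```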
The Explain Reality project aims to bridge the gap between state-of-the-art AI and immersive augmented reality by deploying foundation models such as segmentation and object detection directly on the Meta Quest 3. A first prototype has demonstrated the feasibility of running these models on-device and recognizing objects and regions in the user’s environment.
This thesis takes the next step: making the system truly interactive. The goal is to enable users not just to view visual detections, but to actively engage with their surroundings—asking for explanations about specific objects, receiving contextual information, and navigating their environment through natural interaction.
The work includes enhancing the integration between vision models and user input (e.g., gaze, hand tracking, voice), refining the interface for real-time feedback, and implementing intuitive explanation flows. This thesis is ideal for students excited about building real-world applications at the intersection of machine learning, XR (extended reality), and human-AI interaction.
You will work hands-on with the Meta Quest 3, explore the deployment of machine learning models in immersive environments, and develop APIs and interaction modules that bring intelligent systems closer to daily use.
MotherNet is a recent hypernetwork-based approach designed to generate entire neural networks in a single forward pass, enabling efficient in-context learning on new tabular datasets without task-specific gradient descent. While this approach has shown promising results in terms of speed and predictive accuracy, its reliance on standard neural architectures limits interpretability and structural flexibility.
This thesis explores an extension of the MotherNet framework to alternative model representations — such as recursive programs, symbolic rules, or decision trees — with the goal of enabling fast, gradient-free generation of interpretable models. We investigate how to reformulate the decoder of the hypernetwork to emit structured programs or tree-based models, and evaluate the trade-offs in performance, interpretability, and generalization.
This work aims to bridge the gap between hypernetwork-driven meta-learning and the demand for transparent, human-readable model outputs in real-world tabular domains.
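One way to picture the reformulated decoder is the sketch below, where a flat parameter vector emitted by the hypernetwork is interpreted as a fixed-depth decision tree. This encoding (one feature index and threshold per internal node, one value per leaf) is our illustrative assumption, not MotherNet's architecture.

```python
# Sketch of decoding a hypernetwork's flat output into a complete binary
# decision tree; the parameter layout is an assumption for illustration.
import numpy as np

def decode_tree(params: np.ndarray, depth: int, n_features: int):
    """Interpret a flat vector as a depth-`depth` complete binary tree."""
    n_internal = 2 ** depth - 1
    feats = (params[:n_internal] * n_features).astype(int) % n_features
    thresholds = params[n_internal:2 * n_internal]
    leaves = params[2 * n_internal:2 * n_internal + 2 ** depth]

    def predict(x):
        node = 0
        while node < n_internal:  # descend left/right on the node's split
            node = 2 * node + (1 if x[feats[node]] > thresholds[node] else 2)
        return leaves[node - n_internal]
    return predict

# Depth 2 needs 2 * 3 internal parameters + 4 leaf values = 10 numbers.
tree = decode_tree(np.random.rand(10), depth=2, n_features=5)
print(tree(np.random.rand(5)))
```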
The increasing complexity of software systems necessitates robust and efficient methods for requirements verification to ensure system reliability and compliance. Recent advancements in Large Language Models (LLMs) have demonstrated significant potential in automating various software engineering tasks. This thesis investigates the design, implementation, and evaluation of LLM-based Graphical User Interface (GUI) agents for the automated verification of software requirements. Given requirements in natural language (e.g., a collection of user stories) and an implemented interactive GUI-based system, the approach should validate the correct implementation of the requirements. Through a series of experiments on multiple requirements datasets, the proposed approach should be assessed in terms of effectiveness and efficiency. This work should contribute to the field by providing a novel framework and practical insights for adopting LLM-driven agents in requirements engineering processes.
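At its core, such an agent can be framed as the loop sketched below, where an LLM proposes GUI actions until it can issue a verdict for a user story. The action schema and the GUI driver interface are placeholders for a concrete model and automation stack.

```python
# High-level sketch of an LLM-driven GUI verification loop; `gui` and
# `llm_propose_action` are placeholder interfaces, not a specific library.
def verify_user_story(story: str, gui, llm_propose_action,
                      max_steps: int = 20) -> bool:
    """Let the agent interact with the GUI until it reaches a verdict."""
    for _ in range(max_steps):
        observation = gui.snapshot()             # e.g. accessibility tree or DOM
        action = llm_propose_action(story, observation)
        if action["type"] == "verdict":
            return action["satisfied"]           # agent declares pass/fail
        gui.execute(action)                      # click, type, navigate, ...
    return False                                 # exhausted budget counts as failure
```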
Large Language Models (LLMs) are increasingly deployed as autonomous agents capable of reasoning and decision-making across multiple steps. However, the internal mechanisms behind such behavior, and the extent to which LLMs can genuinely act agent-like (maintaining goals, updating beliefs, or planning), remain poorly understood.
In this thesis, you will take a closer look at LLMs and mechanistically analyze their decision-making capabilities.
Example Paper: https://arxiv.org/abs/2210.13382
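In the spirit of the example paper, one concrete starting point is a linear probe that tests whether an agent-relevant variable (say, the current goal) is linearly decodable from a layer's hidden states. The data below is synthetic and only illustrates the probing recipe.

```python
# Sketch of a linear probe on collected hidden states; shapes and the random
# "data" are illustrative stand-ins for real model activations and labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

hidden = np.random.randn(1000, 512)          # hidden states at one layer
goal_labels = np.random.randint(0, 4, 1000)  # ground-truth goal per step

probe = LogisticRegression(max_iter=1000).fit(hidden[:800], goal_labels[:800])
print("probe accuracy:", probe.score(hidden[800:], goal_labels[800:]))
```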
Autoformalization aims to translate informal mathematical text into formal proofs, a key challenge in AI-assisted theorem proving. This thesis investigates how existing library learning techniques, e.g. [1,3], impact the effectiveness and efficiency of autoformalization when integrated with Large Language Models and program synthesis. The goal is to evaluate their ability to reuse previously generated lemmas and improve proof success rates. The study will use benchmark datasets like miniF2F and MATH and compare against baselines without library learning. Results will inform the practical value and limitations of library learning [2] in formal reasoning systems.
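The integration under study can be pictured as the retry loop sketched below, where a prover is first run without context and then with retrieved library lemmas. The prover and retrieval functions are placeholders for, e.g., a Lean backend and an embedding-based ranker.

```python
# Sketch of a library-learning loop around an autoformalization pipeline;
# `prover` and `retrieve` are placeholder callbacks, not a real prover API.
def prove_with_library(statement: str, library: list[str],
                       prover, retrieve, k: int = 5):
    """Try the bare statement first, then retry with retrieved lemmas."""
    proof = prover(statement, context=[])
    if proof is not None:
        return proof, library
    lemmas = retrieve(statement, library, k)  # rank stored lemmas by relevance
    proof = prover(statement, context=lemmas)
    if proof is not None:
        library.append(statement)             # grow the library on success
    return proof, library
```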
Protein diffusion models can generate high-quality protein structures, but controlling them to produce specific functional features remains challenging. This thesis proposes to bridge interpretability research with protein design, asking: can we understand how these models work well enough to control them? The central question is whether interpretability techniques can help us understand and control the generation of functional motifs in protein diffusion models.
Rather than treating these models as black boxes, this work will explore how to probe their internal representations and guide generation toward desired outcomes. The research will adapt interpretability methods like causal tracing to work with continuous diffusion processes, investigating whether model internals contain interpretable directions corresponding to functional features like binding sites or catalytic regions. If such directions exist, we can explore methods for steering generation along these axes while preserving structural integrity.
The scope is intentionally broad, as this intersection remains largely unexplored; specific techniques will be refined as we understand what's feasible and most insightful. This work aims to establish new methodologies for interpretable protein design, including tools for probing diffusion model internals and techniques for controlled generation of functional motifs. More broadly, it could demonstrate how interpretability research can move beyond understanding models to improving their practical utility.
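As one possible entry point, the sketch below adapts activation steering to a diffusion denoiser by registering a forward hook that nudges one block's activations along a candidate "functional motif" direction at every denoising step. The model, block name, and direction are all assumptions, to be replaced by whatever the probing experiments uncover.

```python
# Sketch of activation steering for a diffusion denoiser via a PyTorch forward
# hook; `denoiser`, the block name, and `motif_direction` are hypothetical.
import torch

def add_steering_hook(model, block_name: str,
                      direction: torch.Tensor, alpha: float):
    direction = direction / direction.norm()  # unit vector in activation space

    def hook(module, inputs, output):
        return output + alpha * direction     # nudge activations along the axis

    block = dict(model.named_modules())[block_name]
    return block.register_forward_hook(hook)

# handle = add_steering_hook(denoiser, "blocks.4.mlp", motif_direction, alpha=2.0)
# ... run the usual sampling loop, then: handle.remove()
```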