Publications

2024

SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
Niels Mündler, Mark Niklas Müller, Jingxuan He, Martin Vechev
NeurIPS 2024
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović, Robin Staab, Maximilian Baader, Martin Vechev
arXiv 2024
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
Philipp Guldimann, Alexander Spiridonov, Robin Staab, Nikola Jovanović, Mark Vero, Velko Vechev, Anna Gueorguieva, Mislav Balunović, Nikola Konstantinov, Pavol Bielik, Petar Tsankov, Martin Vechev
arXiv 2024
Discovering Clues of Spoofed LM Watermarks
Thibaud Gloaguen, Nikola Jovanović, Robin Staab, Martin Vechev
arXiv 2024
A Unified Approach to Routing and Cascading for LLMs
Jasper Dekoninck, Maximilian Baader, Martin Vechev
arXiv 2024
Practical Attacks against Black-box Code Completion Engines
Slobodan Jenko, Jingxuan He, Niels Mündler, Mark Vero, Martin Vechev
arXiv 2024
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
Jasper Dekoninck, Maximilian Baader, Martin Vechev
arXiv 2024
Watermark Stealing in Large Language Models
Nikola Jovanović, Robin Staab, Martin Vechev
ICML 2024 R2-FM@ICLR24 Oral
Instruction Tuning for Secure Code Generation
Jingxuan He*, Mark Vero*, Gabriela Krasnopolska, Martin Vechev
ICML 2024 * Equal contribution
Prompt Sketching for Large Language Models
Luca Beurer-Kellner, Mark Niklas Müller, Marc Fischer, Martin Vechev
ICML 2024
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
ICML 2024
A Synthetic Dataset for Personal Attribute Inference
Hanna Yukhymenko, Robin Staab, Mark Vero, Martin Vechev
NeurIPS Datasets and Benchmarks 2024
ConStat: Performance-Based Contamination Detection in Large Language Models
Jasper Dekoninck, Mark Niklas Müller, Martin Vechev
NeurIPS 2024
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
ICLR 2024 Spotlight, 2024 PPPM-Award
Black-Box Detection of Language Model Watermarks
Thibaud Gloaguen, Nikola Jovanović, Robin Staab, Martin Vechev
arXiv 2024
Exploiting LLM Quantization
Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev
NeurIPS 2024 NextGenAISafety@ICML24 Oral
Controlled Text Generation via Language Model Arithmetic
Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev
ICLR 2024 Spotlight
Large Language Models are Advanced Anonymizers
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
arXiv 2024
Evading Data Contamination Detection for Language Models is (too) Easy
Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, Martin Vechev
arXiv 2024

2023

Large Language Models for Code: Security Hardening and Adversarial Testing
Jingxuan He, Martin Vechev
ACM CCS 2023 Distinguished Paper Award
LMQL Chat: Scripted Chatbot Development
Luca Beurer-Kellner*, Marc Fischer*, Martin Vechev
Neural Conversational AI Workshop, TEACH -- ICML 2023 * Equal contribution
Large Language Models are Zero-Shot Multi-Tool Users
Luca Beurer-Kellner*, Marc Fischer*, Martin Vechev
Knowledge and Logical Reasoning Workshop -- ICML 2023 * Equal contribution
Prompting Is Programming: A Query Language for Large Language Models
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
PLDI 2023