Publications
2024
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
Niels Mündler, Mark Niklas Müller, Jingxuan He, Martin Vechev
NeurIPS
2024
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović, Robin Staab, Maximilian Baader, Martin Vechev
arXiv
2024
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
Philipp Guldimann, Alexander Spiridonov, Robin Staab, Nikola Jovanović, Mark Vero, Velko Vechev, Anna Gueorguieva, Mislav Balunović, Nikola Konstantinov, Pavol Bielik, Petar Tsankov, Martin Vechev
arXiv
2024
Discovering Clues of Spoofed LM Watermarks
Thibaud Gloaguen, Nikola Jovanović, Robin Staab, Martin Vechev
arXiv
2024
A Unified Approach to Routing and Cascading for LLMs
Jasper Dekoninck, Maximilian Baader, Martin Vechev
arXiv
2024
Practical Attacks against Black-box Code Completion Engines
Slobodan Jenko, Jingxuan He, Niels Mündler, Mark Vero, Martin Vechev
arXiv
2024
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
Jasper Dekoninck, Maximilian Baader, Martin Vechev
arXiv
2024
Watermark Stealing in Large Language Models
Nikola Jovanović, Robin Staab, Martin Vechev
ICML
2024
R2-FM@ICLR24 Oral
Instruction Tuning for Secure Code Generation
Jingxuan He*, Mark Vero*, Gabriela Krasnopolska, Martin Vechev
ICML
2024
* Equal contribution
Prompt Sketching for Large Language Models
Luca Beurer-Kellner, Mark Niklas Müller, Marc Fischer, Martin Vechev
ICML
2024
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
ICML
2024
A Synthetic Dataset for Personal Attribute Inference
Hanna Yukhymenko, Robin Staab, Mark Vero, Martin Vechev
NeurIPS Datasets and Benchmarks
2024
ConStat: Performance-Based Contamination Detection in Large Language Models
Jasper Dekoninck, Mark Niklas Müller, Martin Vechev
NeurIPS
2024
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
ICLR
2024
Spotlight, 2024 PPPM-Award
Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev
ICLR
2024
Black-Box Detection of Language Model Watermarks
Thibaud Gloaguen, Nikola Jovanović, Robin Staab, Martin Vechev
arXiv
2024
Exploiting LLM Quantization
Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev
NeurIPS
2024
NextGenAISafety@ICML24 Oral
Controlled Text Generation via Language Model Arithmetic
Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev
ICLR
2024
Spotlight
Large Language Models are Advanced Anonymizers
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
arXiv
2024
Evading Data Contamination Detection for Language Models is (too) Easy
Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, Martin Vechev
arXiv
2024
2023
Large Language Models for Code: Security Hardening and Adversarial Testing
Jingxuan He, Martin Vechev
ACM CCS
2023
Distinguished Paper Award
LMQL Chat: Scripted Chatbot Development
Luca Beurer-Kellner*, Marc Fischer*, Martin Vechev
Neural Conversational AI Workshop, TEACH -- ICML
2023
* Equal contribution
Large Language Models are Zero-Shot Multi-Tool Users
Luca Beurer-Kellner*, Marc Fischer*, Martin Vechev
Knowledge and Logical Reasoning Workshop -- ICML
2023
* Equal contribution
Prompting Is Programming: A Query Language for Large Language Models
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
PLDI
2023