Publications
2026
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
Thibaud Gloaguen, Mark Vero, Robin Staab, Martin Vechev
ICLR 2026
Oral
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Kazuki Egashira, Robin Staab, Thibaud Gloaguen, Mark Vero, Martin Vechev
ICLR 2026
2025
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Csaba Dékány*, Stefan Balauca*, Robin Staab, Dimitar I. Dimitrov, Martin Vechev
NeurIPS 2025
* Equal contribution
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
Giovanni De Muri, Mark Vero, Robin Staab, Martin Vechev
arXiv 2025
Black-Box Adversarial Attacks on LLM-Based Code Completion
Slobodan Jenko*, Niels Mündler*, Jingxuan He, Mark Vero, Martin Vechev
ICML 2025
* Equal contribution
Mind the Gap: A Practical Attack on GGUF Quantization
Kazuki Egashira, Robin Staab, Mark Vero, Jingxuan He, Martin Vechev
ICML 2025
BuildingTrust@ICLR25 Oral
Large Language Models are Advanced Anonymizers
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
ICLR 2025
2024
A Synthetic Dataset for Personal Attribute Inference
Hanna Yukhymenko, Robin Staab, Mark Vero, Martin Vechev
NeurIPS Datasets and Benchmarks 2024
Private Attribute Inference from Images with Vision-Language Models
Batuhan Tömekçe, Mark Vero, Robin Staab, Martin Vechev
NeurIPS 2024
Exploiting LLM Quantization
Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev
NeurIPS 2024
NextGenAISafety@ICML24 Oral
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
ICLR 2024
Spotlight, 2024 PPPM-Award
