Startups

Invariant Labs
Invariant Labs makes AI agents secure, reliable, and robust.

Publications

2026

Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
Thibaud Gloaguen, Mark Vero, Robin Staab, Martin Vechev
ICLR 2026 Oral
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Kazuki Egashira, Robin Staab, Thibaud Gloaguen, Mark Vero, Martin Vechev
ICLR 2026

2025

MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Csaba Dékány*, Stefan Balauca*, Robin Staab, Dimitar I. Dimitrov, Martin Vechev
NeurIPS 2025 (* equal contribution)
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
Giovanni De Muri, Mark Vero, Robin Staab, Martin Vechev
arXiv 2025
Black-Box Adversarial Attacks on LLM-Based Code Completion
Slobodan Jenko*, Niels Mündler*, Jingxuan He, Mark Vero, Martin Vechev
ICML 2025 (* equal contribution)
Mind the Gap: A Practical Attack on GGUF Quantization
Kazuki Egashira, Robin Staab, Mark Vero, Jingxuan He, Martin Vechev
ICML 2025; Oral at BuildingTrust@ICLR25
Large Language Models are Advanced Anonymizers
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
ICLR 2025

2024

A Synthetic Dataset for Personal Attribute Inference
Hanna Yukhymenko, Robin Staab, Mark Vero, Martin Vechev
NeurIPS Datasets and Benchmarks 2024
Private Attribute Inference from Images with Vision-Language Models
Batuhan Tömekçe, Mark Vero, Robin Staab, Martin Vechev
NeurIPS 2024
Exploiting LLM Quantization
Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev
NeurIPS 2024; Oral at NextGenAISafety@ICML24
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
ICLR 2024 Spotlight; 2024 PPPM Award