Funded by ERC grant BIGCODE - #680358
Startups
Statistical Engines
JSNice
JSNice de-obfuscates JavaScript programs. It is a popular system in the JavaScript community, used by tens of thousands of programmers worldwide.
Nice2Predict
Efficient and scalable open-source framework for structured prediction, enabling one to build new statistical engines more quickly.
DeGuard
Based on Nice2Predict, DeGuard reverses the process of layout obfuscation done by Android obfuscation systems. It enables security analyses, including code inspection and predicting libraries.
Datasets and Models
150k Python Dataset
Dataset consisting of 150'000 Python ASTs
150k JavaScript Dataset
Dataset consisting of 150'000 JavaScript files and their parsed ASTs
Probabilistic models
Synthesized programs for probabilistic models (on the above datasets)
JSNice artifact
JSNice artifact that contains an engine, trained model and evaluation dataset
JSNice dataset
List of GitHub repositories used to train JSNice
Publications
2023
Large Language Models for Code: Security Hardening and Adversarial Testing
Jingxuan He, Martin Vechev
ACM CCS
2023
Distinguished Paper Award
2022
On Distribution Shift in Learning-based Bug Detectors
Jingxuan He, Luca Beurer-Kellner, Martin Vechev
ICML
2022
2021
Learning to Explore Paths for Symbolic Execution
Jingxuan He, Gishor Sivanrupan, Petar Tsankov, Martin Vechev
ACM CCS
2021
TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer
Berkay Berabi, Jingxuan He, Veselin Raychev, Martin Vechev
ICML
2021
Learning to Find Naming Issues with Big Code and Small Supervision
Jingxuan He, Cheng-Chun Lee, Veselin Raychev, Martin Vechev
PLDI
2021
Robustness Certification with Generative Models
Matthew Mirman, Alexander Hägele, Timon Gehr, Pavol Bielik, Martin Vechev
PLDI
2021
2020
Learning Fast and Precise Numerical Analysis
Jingxuan He, Gagandeep Singh, Markus Püschel, Martin Vechev
PLDI
2020
Guiding Program Synthesis by Learning to Generate Examples
Larissa Laich, Pavol Bielik, Martin Vechev
ICLR
2020
2019
Learning to Infer User Interface Attributes from Images
Philippe Schlattner, Pavol Bielik, Martin Vechev
ArXiv
2019
Learning to Fuzz from Symbolic Execution with Application to Smart Contracts
Jingxuan He, Mislav Balunović, Nodar Ambroladze, Petar Tsankov, Martin Vechev
ACM CCS
2019
Unsupervised Learning of API Aliasing Specifications
Jan Eberhardt, Samuel Steffen, Veselin Raychev, Martin Vechev
PLDI
2019
Scalable Taint Specification Inference with Big Code
Victor Chibotaru, Benjamin Bichsel, Veselin Raychev, Martin Vechev
PLDI
2019
2018
Robust Relational Layouts Synthesis from Examples for Android
Pavol Bielik, Marc Fischer, Martin Vechev
ACM OOPSLA
2018
DEBIN: Predicting Debug Information in Stripped Binaries
Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Raychev, Martin Vechev
ACM CCS
2018
Inferring Crypto API Rules from Code Changes
Rumen Paletov, Petar Tsankov, Veselin Raychev, Martin Vechev
PLDI
2018
2017
Program Synthesis for Character Level Language Modeling
Pavol Bielik, Veselin Raychev, Martin Vechev
ICLR
2017
2016
Probabilistic Model for Code with Decision Trees
Veselin Raychev, Pavol Bielik, Martin Vechev
ACM OOPSLA
2016
Statistical Deobfuscation of Android Applications
Benjamin Bichsel, Veselin Raychev, Petar Tsankov, Martin Vechev
ACM CCS
2016
2015
Predicting Program Properties from "Big Code"
Veselin Raychev, Martin Vechev, Andreas Krause
ACM POPL
2015
Programming with Big Code: Lessons, Techniques and Applications
Pavol Bielik, Veselin Raychev, Martin Vechev
SNAPL
2015
2014
Code Completion with Statistical Language Models
Veselin Raychev, Martin Vechev, Eran Yahav
ACM PLDI
2014
Phrase-Based Statistical Translation of Programming Languages
Svetoslav Karaivanov, Veselin Raychev, Martin Vechev
Onward
2014
Talks
Learning to Analyze Programs at Scale
Machine Learning for Programming Workshop, FLOC 2018
Learning a static analyzer from data
Computer Aided Verification 2017
Probabilistic and Interpretable Models for Code
SYNT workshop, FLOC 2018
Machine Learning for Programming
iFM 2017 Keynote Talk
DeGuard: Statistical Deobfuscation for Android
Android Security Symposium 2017
Programming Languages and Machine Learning
Neural Abstract Machines & Program Induction (NIPS'16 workshop)
Statistical Deobfuscation of Android Applications
CCS 2016 talk
Machine Learning for Programs
CAV'16 Tutorial
Probabilistic Learning from Big Code
ISSTA'16 Keynote Talk
PHOG: Probabilistic Model for Code
ICML 2016 talk
Learning Programs from Noisy Data
POPL 2016 talk
Machine Learning for Programming
Invited Talk at ML4PL'15
Machine Learning for Code Analytics
PLDI'15 Tutorial
Machine Learning for Programming
Invited Talk at MIT ExCAPE'15 Summer School
Machine Learning for Programming
Invited Talk at TCE'15 Conference
Programming with Probabilistic Graphical Models
EPFL Colloquium, Dec, 2014
Programming Tools based on Big Data and Conditional Random Fields
Zurich Machine Learning and Data Science Meet-up
Statistical Program Analysis and Synthesis
HVC'14 Keynote
Statistical Program Analysis and Synthesis
ETH Workshop 2014
Code Completion with Statistical Language Models
Talk given at the University of Washington and Microsoft Research (by V. Raychev) and at EPFL and ETH (by Martin Vechev)
Resources
- A new web site for learning from Big Code has been released. The web site contains data sets, systems, and challenge problems from groups working in the area.
- We are co-organizing a Dagstuhl Seminar on "Programming with Big Code", Nov 15-18, 2015