🧐 About Me
Hi there! I am a second-year PhD student in Computer Science at ETH Zurich, under the supervision of Prof. Florian Tramèr, and a member of the Secure and Private AI (SPY) Lab. I completed my master’s degree in Software Engineering at Zhejiang University in March 2023, advised by Prof. Chao Wu. Before that, I received my bachelor’s degree from Hainan University in July 2020.
Research Interests:
🤔 In my PhD, I focus on the security and privacy risks of ML systems, both as they exist today and as they evolve in the future. My research aims to uncover vulnerabilities and develop strategies to mitigate them, contributing to more secure and privacy-preserving machine learning technologies.
🔥 News
- 2024.09: 🎉 AgentDojo is accepted by NeurIPS 2024 (Datasets and Benchmarks track). Benchmark.
- 2024.07: 🎉 Evaluations of Machine Learning Privacy Defenses are Misleading is accepted by CCS 2024. Blogpost.
- 2024.01: Real-Fake is accepted by ICLR 2024.
- 2023.03: 🎉 I graduated from ZJU.
📒 Blogs
(Our lab has very nice 📚 blogs about AI security and privacy. Highly recommended reading!)
- 😅 How to INCORRECTLY detect pretraining data in GPT? (Coming soon)
📝 Selected Publications
( * indicates equal contribution. Full list of publications)
Preprint.
Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data
Jie Zhang, Debeshee Das, Gautam Kamath, Florian Tramèr.
- We argue that membership inference attacks (MIAs) are fundamentally flawed as a way to prove training data use. To provide convincing evidence, the data creator must show that the attack has a low false positive rate, meaning its output is unlikely under the null hypothesis (i.e., the model wasn’t trained on the target data). However, it’s impossible to sample from this null hypothesis because we don’t know the exact training set and can’t efficiently retrain large models. We propose two solutions for reliable training data proofs: data extraction attacks, or membership inference on specially crafted canary data.
Blind Baselines Beat Membership Inference Attacks for Foundation Models
Debeshee Das, Jie Zhang, Florian Tramèr. [code]
- We find that evaluations of membership inference (MI) attacks for foundation models are flawed, because they sample members and non-members from different distributions. For 8 published MI evaluation datasets, we show that blind attacks, which distinguish the member and non-member distributions without looking at any trained model, outperform state-of-the-art MI attacks. Existing evaluations thus tell us nothing about membership leakage of a foundation model’s training data.
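To make the idea of a blind attack concrete, here is a minimal sketch. It is not the method or data from the paper: the toy texts, the assumed date-based distribution shift, and the helper names `blind_score` and `auc` are all made up for illustration. The point is only that a score computed from the text alone, with no model anywhere, can separate members from non-members when the two sets come from different distributions.

```python
import re

# Toy, made-up corpus (an assumption for illustration only):
# "members" predate a hypothetical training cutoff, "non-members" come after it.
MEMBERS = [
    "Conference notes from 2019 on image classification.",
    "A 2021 survey of federated learning methods.",
    "Release announcement, March 2020.",
    "Benchmark results published in 2018.",
]
NON_MEMBERS = [
    "Workshop recap written in 2024.",
    "New dataset released in late 2023.",
    "Reading list for 2024 security seminars.",
    "Retrospective on 2021 results, posted in 2023.",
]

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def blind_score(text):
    """Return the latest year mentioned in the text (0 if none).

    A lower score means "looks like a member". No model is queried.
    """
    years = [int(y) for y in YEAR.findall(text)]
    return max(years) if years else 0


def auc(member_scores, non_member_scores):
    """Probability that a random member scores lower than a random
    non-member, counting ties as one half."""
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


blind_auc = auc(
    [blind_score(t) for t in MEMBERS],
    [blind_score(t) for t in NON_MEMBERS],
)
print(f"blind AUC: {blind_auc:.2f}")
```

On this toy data the blind score separates the two sets perfectly, which says something about how the sets were collected and nothing about what any model memorized.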
Accepted.
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr. [code]
- To measure the adversarial robustness of AI agents, we introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data. AgentDojo is an extensible environment for designing and evaluating new agent tasks, defenses, and adaptive attacks. We populate the environment with 97 realistic tasks, 629 security test cases, and various attack and defense paradigms from the literature. AgentDojo poses a challenge for both attacks and defenses: state-of-the-art LLMs fail at many tasks (even in the absence of attacks), and existing prompt injection attacks break some security properties but not all.
Evaluations of Machine Learning Privacy Defenses are Misleading
Michael Aerni*, Jie Zhang*, Florian Tramèr. [code] [blogpost]
- Empirical defenses for private machine learning forgo the provable guarantees of differential privacy in the hope of achieving high utility on real-world data. We find that evaluations of such methods can be highly misleading. In this work, we thus propose a new evaluation protocol that is reliable and efficient.
🎖 Honors and Awards
- 2021.05 We won first prize at the CVPR21 Workshop (Adversarial Machine Learning in Real-World Computer Vision Systems and Online Challenges, rank: 1 / 1558).
- 2022.10 China National Scholarship, Zhejiang University.
- Outstanding Student Scholarship, First Prize, Hainan University, 2018, 2019, 2020.
📖 Education
- 🎓 2020.09 - 2023.03, Master, Zhejiang University, China.
- 🎓 2016.09 - 2020.06, Undergraduate, Hainan University, China.
💬 Services
- Journal Reviewer:
  - IEEE Transactions on Neural Networks and Learning Systems
  - Neural Networks
  - IEEE Transactions on Pattern Analysis and Machine Intelligence
- Conference Reviewer: ICLR, AAAI, CVPR, ICML, ECCV, ICCV, NeurIPS.
💻 Internships
- 2021.11 - 2022.06, Sony AI, Research Intern, Tokyo.
- 2020.10 - 2021.10, Tencent, Youtu Lab, Research Intern, Shanghai.
- 2019.11 - 2020.04, Alibaba, AliExpress, Software Engineer, Hangzhou.
🎙 Miscellaneous
Travel
I enjoy traveling with my family and friends. I am always excited to visit new places and learn about different cultures.
My cats
My girlfriend and I have three cats together. They are adorable and have brought a lot of fun to our lives!