Jie Zhang

📝 Selected Publications

(* indicates equal contribution. Full list of publications)

📚 Preprints

TBD

🚀 Something is Coming Soon™ (Probably) [Status: Thinking hard 🤔 …]

preprint

Laundering AI Authority with Adversarial Examples

Jie Zhang, Pura Peetathawatchai, Florian Tramèr, Avital Shafran

[ICML 2026 workshop, Agents in the Wild]

preprint

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Xin Chen, Jie Zhang, Florian Tramèr

[ICML 2026 workshop, Agents in the Wild]

preprint

Black-box Optimization of LLM Outputs by Asking for Directions

Jie Zhang, Meng Ding, Yang Liu, Jue Hong, Florian Tramèr

[ICLR Trustworthy AI workshop 2026, Spotlight Talk]

✅ Accepted

ICML 2026

Position: Adversarial ML for LLMs Is Not Making Any Progress

Javier Rando*, Jie Zhang*, Nicholas Carlini, Florian Tramèr

[ICML 2026]

NeurIPS 2025

RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics

Jie Zhang, Cezara Petrui, Kristina Nikolić, Florian Tramèr

[NeurIPS 2025, Datasets & Benchmarks Track]

ICML 2025

The Jailbreak Tax: How Useful are Your Jailbreak Outputs?

Kristina Nikolić, Luze Sun, Jie Zhang, Florian Tramèr

[ICML 2025, Spotlight]

IEEE SP 2025, DLSP workshop

Membership Inference Attacks on Sequence Models

Lorenzo Rossi, Michael Aerni, Jie Zhang, Florian Tramèr

[IEEE SP 2025, DLSP workshop, Best Paper Award]

SaTML 2025

Position: Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Jie Zhang, Debeshee Das, Gautam Kamath, Florian Tramèr

[IEEE SaTML 2025]

CCS 2024

Evaluations of Machine Learning Privacy Defenses are Misleading

Michael Aerni*, Jie Zhang*, Florian Tramèr

code blog poster

[ACM CCS 2024]

ICLR 2025

Does Training with Synthetic Data Truly Protect Privacy?

Yunpeng Zhao, Jie Zhang

[ICLR 2025]

IEEE SP 2025, DLSP workshop

Blind Baselines Beat Membership Inference Attacks for Foundation Models

Debeshee Das, Jie Zhang, Florian Tramèr

[IEEE SP 2025, DLSP workshop]

NeurIPS 2024

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr

[NeurIPS 2024, Datasets & Benchmarks Track]

ICLR 2024

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr, Bo Zhao

[ICLR 2024]