Research Overview
My research broadly spans three areas:
- Oversight and control. What oversight measures robustly scale to increasingly capable frontier language models? I currently focus on training-time mitigations for evaluation awareness and model introspection — understanding how and when models represent the distinction between evaluation and deployment, and how to intervene on this.
- Value alignment and epistemics. How can we train models to be more honest? How can AI uplift human truth-seeking and moral progress, and how do we prevent harmful value lock-in?
- Agent security. How do we design scalable, realistic environments for evaluating agent misuse and misbehavior — for example, in computer use and MCP settings?
As of March 2026, the research directions that excite me most are: understanding model introspection and situational awareness, exploring metacognition-based alignment techniques, alignment pretraining, and operationalizing AI-induced human disempowerment.
I want to be the most excellent researcher I can be. I enjoy rapid experimentation and careful truth-seeking. Above all else, I care about real-world impact and choosing the right, most pressing problems to work on.
In a past life, I researched analytical chemistry and published in Nature Communications and ACS journals.
Papers
Chang, J., Piff, L., Sana, S., Li, J.X., Levine, L.
ICLR Oral (Top 5%), 2026
[arXiv]
Qiu, T., Zhang, Y., Huang, Z., Li, J.X., et al.
NeurIPS Spotlight (Top 10%), 2024
[arXiv]
Mentored by Abdur Raheem Ali
ICML Workshop, 2025
[paper]
Nature Communications, 2023
[paper]
Organizing
- Proxima
An anthology of speculative fiction and art, imagining post-intelligence explosion futures.
- Idealists Collective Unconference
April 2026
- HTML Day 2025, Seattle
An HTML freewrite embracing HTML energy, the slow web, and community.
- Saturday 8am Walk & Yaps
Weekly Saturday morning walks in Seattle.
- Pen & Ponder
A one-month writing experiment & Toronto writeathon!