Research Overview

My research broadly spans three areas:

  1. Oversight and control. What oversight measures robustly scale to increasingly capable frontier language models? I currently focus on training-time mitigations for evaluation awareness and model introspection — understanding how and when models represent the distinction between evaluation and deployment, and how to intervene on this.
  2. Value alignment and epistemics. How can we train models to be more honest? How can AI uplift human truth-seeking and moral progress, and how do we prevent harmful value lock-in?
  3. Agent security. How do we design scalable, realistic environments for evaluating agent misuse and misbehavior — for example, in computer use and MCP settings?

As of March 2026, the research directions that excite me most are: understanding model introspection and situational awareness, exploring metacognition-based alignment techniques, alignment pretraining, and operationalizing AI-induced human disempowerment.

I want to be the most excellent researcher I can be. I enjoy rapid experimentation and careful truth-seeking. Above all else, I care about real-world impact and choosing the right, most pressing problems to work on.

In a past life, I researched analytical chemistry and published in Nature Communications and ACS journals.

Papers

Organizing

Code

Ritu

Smart pest and weather prediction for farmers. Grand Prize, Cornell Switch the Pitch Hackathon

ALIGN

Designed an optimized database and search for Wex, the Cornell Legal Information Institute's legal dictionary. First Prize, LII Hackathon

Circles

Frictionless friend meetups. Big Red Hacks 2024