Research Overview

My research broadly spans three areas:

  1. Oversight and control. What oversight measures robustly scale to increasingly capable frontier language models? I currently focus on training-time mitigations for evaluation awareness and model introspection — understanding how and when models represent the distinction between evaluation and deployment, and how to intervene on this.
  2. Value alignment and epistemics. How can we train models to be more honest? How can AI uplift human truth-seeking and moral progress, and how do we prevent harmful value lock-in?
  3. Agent security. How do we design scalable, realistic environments for evaluating agent misuse and misbehavior — for example, in computer use and MCP settings?

As of March 2026, the research directions that excite me most are: understanding model introspection and situational awareness, exploring metacognition-based alignment techniques, alignment pretraining, and operationalizing AI-induced human disempowerment.

I want to be the most excellent researcher I can be. I enjoy rapid experimentation and careful truth-seeking. Above all else, I care about real-world impact and choosing the right, most pressing problems to work on.

In a past life, I researched analytical chemistry and published in Nature Communications and ACS journals.

Papers

Organizing

Code

Ritu

Smart pest and weather prediction for farmers. Grand Prize, Cornell Switch the Pitch Hackathon

ALIGN

Designed an optimized database and search for Wex, the Cornell Legal Information Institute's legal dictionary. First Prize, LII Hackathon

Circles

Frictionless friend meetups. Big Red Hacks 2024