Milena Nowaczewska

Hi there. I'm based in San Francisco, where I work in tech policy at the EU Delegation to the United States.

I'm interested in how advanced AI systems behave in real-world contexts, especially as they become more agentic, less predictable, and harder to meaningfully evaluate.

I'm trying to understand how to evaluate and reason about systems whose behavior is not fully specified in advance, and what this implies for policymaking as institutional capacity lags behind technical progress. I'm particularly interested in agent behavior in deployment, where failures often arise not from misuse or adversarial use, but from how systems interpret and act on underspecified goals.

I'd especially like to hear from people working on evaluation and agent behavior in practice. You can reach me at milena.nowaczewska3@gmail.com.
