Posts

4 Jan 2026

QuestionBench: Measuring How Well AI Agents Ask Questions

We built a benchmark to test whether AI models can strategically ask questions to gather information they lack. Claude Opus 4.5 with extended thinking achieved 78.30% task success.

6 Feb 2025

AI, AGI, and the Future: Insights from the Last Two Weeks

I’m going to share some incredible insights I’ve come across in the past two weeks.