Eugene Yan

Evals for Long-Context Question & Answer Systems

Hey friends, I've been thinking a lot about evals for long-context Q&A since building aireadingclub.com and wrote an introduction here. It covers (i) key evaluation metrics, (ii) how to generate questions for Q&A evaluation datasets, (iii) how to build LLM-evaluators to assess Q&A performance, and (iv) a review of several existing benchmarks. I hope you find it useful. P.S. If you want to learn more about evals, my friends Shreya and Hamel are hosting their final cohort of “AI Evals for...

19 days ago • 20 min read

Exceptional Leadership: Some Qualities, Behaviors, and Styles

Hey friends, I've been thinking a lot about leadership lately: what makes some leaders so good that teams want to follow them? After some reflection, I landed on three key points about leadership qualities, behaviors, and styles. Enjoy! What makes an exceptional leader? Vision: They can foresee not only what will change, but...

about 2 months ago • 3 min read

Building a Multi-Agent Framework to Summarize News with MCP & Q

Hey friends, To better understand MCPs and agentic workflows, I built a news agent to help me generate a daily news summary. It’s built on Amazon Q CLI and MCP. The former provides the agentic framework and the latter provides news feeds via tools. It also uses tmux to spawn and display each sub-agent’s work. P.S. If you’re interested in topics like this, my friends Ben and Swyx are organizing the AI Engineer World’s Fair in San Francisco on 3rd - 5th June. Come talk to builders sharing their...

2 months ago • 2 min read

Stop Blaming the LLM-as-Judge—Fix Your Process Instead

Hey friends, I've seen many teams misunderstand what it means to build and apply product evals, and I wrote this piece to address it. I hope it clarifies that evals aren't a one-and-done artifact but a disciplined process. Do you agree or disagree? Please reply and let me know! P.S. In May, my friends Hamel Husain and Shreya Shankar are teaching an exclusive 4-week course on "AI Evals for Engineers & PMs". They've generously provided a special 40% discount link 🤫, but hurry, limited spots...

3 months ago • 3 min read

Frequently Asked Questions about My Writing Process

Hey friends, Every month or so, I receive questions about my writing: “How did you get started?” “Why do you write?” “Who do you write for?” “What’s your writing process?” I’ve procrastinated on writing this FAQ because, honestly, who cares about my writing process? But after answering the same questions again and again, I realized it’d be helpful to consolidate my responses somewhere. At the very least, it’ll save me from repeating myself. If you’re thinking about writing online but aren’t...

3 months ago • 9 min read

Improving Recommender Systems & Search in the Age of LLMs

Hey friends, I spent the last few months researching how two streams of my work, recommendation systems and language modeling, have been converging. It started with Word2vec for item embeddings and retrieval, followed by GRUs, Transformers, and BERT for sequential recommendations. As we see in this write-up, the current age of LLMs is no different...

4 months ago • 32 min read

Building AI Reading Club: Features & Behind the Scenes

Hi friends, How can AI make reading more enjoyable? What would an AI-powered reading experience look like? Inspired by a discussion between Andrej Karpathy and Patrick Collison, I built a simple prototype to explore some ideas. (Try it at AiReadingClub.com!) In this write-up, I’ll walk through key features, design considerations, and how it was built.

6 months ago • 7 min read

2024 Year in Review

Hi friends, 2024 was a peaceful year of steady progress. With regard to my craft, the prototypes of 2023 were scaled and put into production, and I rediscovered the joy of building in public. On the personal side, I continued the prior year’s focus on health, further improving my diet and exercise habits, leading to measurable results. Past years: 2020, 2021, 2022, 2023.

7 months ago • 4 min read

A Spark of the Anti-AI Butlerian Jihad (on Bluesky)

Hi friends, Recently, a dataset of 1M Bluesky posts unexpectedly sparked backlash from the Bluesky community. The incident revealed strong anti-AI sentiment among Bluesky users, leaving the AI community feeling unwelcome on the platform. This write-up reflects on what happened, offers hypotheses on why it happened, and describes how the data/AI community responded.

7 months ago • 5 min read

Some Paradoxical Rules of Writing

Hey friends, A very short email this week. I've been thinking a lot about writing lately, and I've come to realize that there are a lot of rules, and at the same time no rules, about writing. Use simple words to be clear and concise; use complex words to be sharp and precise; write short sentences for the punch; write long sentences to convey nuance; spend 80% effort on an intro that hooks; skip the intro, just get...

7 months ago • 1 min read