Simon Willison's Weblog: podcast-appearances

The cognitive impact of coding agents

2026-04-03T23:57:04+00:00

A fun thing about recording a podcast with a professional like Lenny Rachitsky is that his team know how to slice the resulting video up into TikTok-sized short form vertical videos. Here's one he shared on Twitter today which ended up attracting over 1.1m views!

That was 48 seconds. Our full conversation lasted 1 hour 40 minutes.

Tags: ai, generative-ai, llms, podcast-appearances, ai-ethics, coding-agents, cognitive-debt, agentic-engineering

Highlights from my conversation about agentic engineering on Lenny's Podcast

2026-04-02T20:40:47+00:00

I was a guest on Lenny Rachitsky's podcast, in a new episode titled An AI state of the union: We've passed the inflection point, dark factories are coming, and automation timelines. It's available on YouTube, Spotify, and Apple Podcasts. Here are my highlights from our conversation, with relevant links.

The November inflection point

4:19 - The end result of these two labs throwing everything they had at making their models better at code is that in November we had what I call the inflection point where GPT 5.1 and Claude Opus 4.5 came along.

They were both incrementally better than the previous models, but in a way that crossed a threshold where previously the code would mostly work, but you had to pay very close attention to it. And suddenly we went from that to... almost all of the time it does what you told it to do, which makes all of the difference in the world.

Now you can spin up a coding agent and say, build me a Mac application that does this thing, and you'll get something back which won't just be a buggy pile of rubbish that doesn't do anything.

Software engineers as bellwethers for other information workers

5:49 - I can churn out 10,000 lines of code in a day. And most of it works. Is that good? Like, how do we get from most of it works to all of it works? There are so many new questions that we're facing, which I think makes us a bellwether for other information workers.

Code is easier than almost every other problem that you pose these agents because code is obviously right or wrong - either it works or it doesn't work. There might be a few subtle hidden bugs, but generally you can tell if the thing actually works.

If it writes you an essay, if it prepares a lawsuit for you, it's so much harder to derive if it's actually done a good job, and to figure out if it got things right or wrong. But it's happening to us as software engineers. It came for us first.

And we're figuring out, OK, what do our careers look like? How do we work as teams when part of what we did that used to take most of the time doesn't take most of the time anymore? What does that look like? And it's going to be very interesting seeing how this rolls out to other information work in the future.

Lawyers are falling for this really badly. The AI hallucination cases database is up to 1,228 cases now!

Plus this bit from the cold open at the start:

It used to be you'd ask ChatGPT for some code, and it would spit out some code, and you'd have to run it and test it. The coding agents take that step for you now. And an open question for me is how many other knowledge work fields are actually prone to these agent loops?

Writing code on my phone

8:19 - I write so much of my code on my phone. It's wild. I can get good work done walking the dog along the beach, which is delightful.

I mainly use the Claude iPhone app for this, both with a regular Claude chat session (which can execute code now) or using it to control Claude Code for web.

Responsible vibe coding

9:55 If you're vibe coding something for yourself, where the only person who gets hurt if it has bugs is you, go wild. That's completely fine. The moment you ship your vibe coding code for other people to use, where your bugs might actually harm somebody else, that's when you need to take a step back.

Dark Factories and StrongDM

12:49 The reason it's called the dark factory is there's this idea in factory automation that if your factory is so automated that you don't need any people there, you can turn the lights off. Like the machines can operate in complete darkness if you don't need people on the factory floor. What does that look like for software? [...]

So there's this policy that nobody writes any code: you cannot type code into a computer. And honestly, six months ago, I thought that was crazy. And today, probably 95% of the code that I produce, I didn't type myself. That world is practical already because the latest models are good enough that you can tell them to rename that variable and refactor and add this line there... and they'll just do it - it's faster than you typing on the keyboard yourself.

The next rule though, is nobody reads the code. And this is the thing which StrongDM started doing last year.

I wrote a lot more about StrongDM's dark factory explorations back in February.

The bottleneck has moved to testing

21:27 - It used to be, you'd come up with a spec and you hand it to your engineering team. And three weeks later, if you're lucky, they'd come back with an implementation. And now that maybe takes three hours, depending on how well the coding agents are established for that kind of thing. So now what, right? Now, where else are the bottlenecks?

Anyone who's done any product work knows that your initial ideas are always wrong. What matters is proving them, and testing them.

We can test things so much faster now because we can build workable prototypes so much quicker. So there's an interesting thing I've been doing in my own work where any feature that I want to design, I'll often prototype three different ways it could work because that takes very little time.

I've always loved prototyping things, and prototyping is even more valuable now.

22:40 - A UI prototype is free now. ChatGPT and Claude will just build you a very convincing UI for anything that you describe. And that's how you should be working. I think anyone who's doing product design and isn't vibe coding little prototypes is missing out on the most powerful boost that we get in that step.

But then what do you do? Given your three options that you have instead of one option, how do you prove to yourself which one of those is the best? I don't have a confident answer to that. I expect this is where the good old fashioned usability testing comes in.

This stuff is exhausting

26:25 - I'm finding that using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems. And by like 11 AM, I am wiped out for the day. [...]

There's a personal skill we have to learn in finding our new limits - what's a responsible way for us not to burn out.

I've talked to a lot of people who are losing sleep because they're like, my coding agents could be doing work for me. I'm just going to stay up an extra half hour and set off a bunch of extra things... and then waking up at four in the morning. That's obviously unsustainable. [...]

There's an element of sort of gambling and addiction to how we're using some of these tools.

Interruptions cost a lot less now

45:16 - People talk about how important it is not to interrupt your coders. Your coders need to have solid two to four hour blocks of uninterrupted work so they can spin up their mental model and churn out the code. That's changed completely. My programming work, I need two minutes every now and then to prompt my agent about what to do next. And then I can do the other stuff and I can go back. I'm much more interruptible than I used to be.

My ability to estimate software is broken

28:19 - I've got 25 years of experience in how long it takes to build something. And that's all completely gone - it doesn't work anymore because I can look at a problem and say that this is going to take two weeks, so it's not worth it. And now it's like... maybe it's going to take 20 minutes because the reason it would have taken two weeks was all of the sort of crufty coding things that the AI is now covering for us.

I constantly throw tasks at AI that I don't think it'll be able to do because every now and then it does it. And when it doesn't do it, you learn, right? But when it does do something, especially something that the previous models couldn't do, that's actually cutting edge AI research.

And a related anecdote:

36:56 - A lot of my friends have been talking about how they have this backlog of side projects, right? For the last 10, 15 years, they've got projects they never quite finished. And some of them are like, well, I've done them all now. Last couple of months, I just went through and every evening I'm like, let's take that project and finish it. And they almost feel a sort of sense of loss at the end where they're like, well, okay, my backlog's gone. Now what am I going to build?

It's tough for people in the middle

29:29 - So ThoughtWorks, the big IT consultancy, did an offsite about a month ago, and they got a whole bunch of engineering VPs in from different companies to talk about this stuff. And one of the interesting theories they came up with is they think this stuff is really good for experienced engineers, like it amplifies their skills. It's really good for new engineers because it solves so many of those onboarding problems. The problem is the people in the middle. If you're mid-career, if you haven't made it to sort of super senior engineer yet, but you're not sort of new either, that's the group which is probably in the most trouble right now.

I mentioned Cloudflare hiring 1,000 interns, and Shopify too.

Lenny asked for my advice for people stuck in that middle:

31:21 - That's a big responsibility you're putting on me there! I think the way forward is to lean into this stuff and figure out how do I help this make me better?

A lot of people worry about skill atrophy: if the AI is doing it for you, you're not learning anything. I think if you're worried about that, you push back at it. You have to be mindful about how you're applying the technology and think, okay, I've been given this thing that can answer any question and often gets it right. How can I use this to amplify my own skills, to learn new things, to take on much more ambitious projects? [...]

33:05 - Everything is changing so fast right now. The only universal skill is being able to roll with the changes. That's the thing that we all need.

The term that comes up most in these conversations about how you can be great with AI is agency. I think agents have no agency at all. I would argue that the one thing AI can never have is agency because it doesn't have human motivations.

So I'd say that's the thing is to invest in your own agency and invest in how to use this technology to get better at what you do and to do new things.

It's harder to evaluate software

The fact that it's so easy to create software with detailed documentation and robust tests means it's harder to figure out what's a credible project.

37:47 Sometimes I'll have an idea for a piece of software, Python library or whatever, and I can knock it out in like an hour and get to a point where it's got documentation and tests and all of those things, and it looks like the kind of software that previously I'd have spent several weeks on - and I can stick it up on GitHub

And yet... I don't believe in it. And the reason I don't believe in it is that I got to rush through all of those things... I think the quality is probably good, but I haven't spent enough time with it to feel confident in that quality. Most importantly, I haven't used it yet.

It turns out when I'm using somebody else's software, the thing I care most about is I want them to have used it for months.

I've got some very cool software that I built that I've never used. It was quicker to build it than to actually try and use it!

The misconception that AI tools are easy

41:31 - Everyone's like, oh, it must be easy. It's just a chat bot. It's not easy. That's one of the great misconceptions in AI is that using these tools effectively is easy. It takes a lot of practice and it takes a lot of trying things that didn't work and trying things that did work.

Coding agents are useful for security research now

19:04 - In the past sort of three to six months, they've started being credible as security researchers, which is sending shockwaves through the security research industry.

See Thomas Ptacek: Vulnerability Research Is Cooked.

At the same time, open source projects are being bombarded with junk security reports:

20:05 - There are these people who don't know what they're doing, who are asking ChatGPT to find a security hole and then reporting it to the maintainer. And the report looks good. ChatGPT can produce a very well formatted report of a vulnerability. It's a total waste of time. It's not actually verified as being a real problem.

A good example of the right way to do this is Anthropic's collaboration with Firefox, where Anthropic's security team verified every security problem before passing them to Mozilla.

OpenClaw

Of course we had to talk about OpenClaw! Lenny had his running on a Mac Mini.

1:29:23 - OpenClaw demonstrates that people want a personal digital assistant so much that they are willing to not just overlook the security side of things, but also getting the thing running is not easy. You've got to create API keys and tokens and install stuff. It's not trivial to get set up and hundreds of thousands of people got it set up. [...]

The first line of code for OpenClaw was written on November the 25th. And then in the Super Bowl, there was an ad for AI.com, which was effectively a vaporware white labeled OpenClaw hosting provider. So we went from first line of code in November to Super Bowl ad in what? Three and a half months.

I continue to love Drew Breunig's description of OpenClaw as a digital pet:

A friend of mine said that OpenClaw is basically a Tamagotchi. It's a digital pet and you buy the Mac Mini as an aquarium.

Journalists are good at dealing with unreliable sources

In talking about my explorations of AI for data journalism through Datasette:

1:34:58 - You would have thought that AI is a very bad fit for journalism where the whole idea is to find the truth. But the flip side is journalists deal with untrustworthy sources all the time. The art of journalism is you talk to a bunch of people and some of them lie to you and you figure out what's true. So as long as the journalist treats the AI as yet another unreliable source, they're actually better equipped to work with AI than most other professions are.

The pelican benchmark

Obviously we talked about pelicans riding bicycles:

56:10 - There appears to be a very strong correlation between how good their drawing of a pelican riding a bicycle is and how good they are at everything else. And nobody can explain to me why that is. [...]

People kept on asking me, what if labs cheat on the benchmark? And my answer has always been, really, all I want from life is a really good picture of a pelican riding a bicycle. And if I can trick every AI lab in the world into cheating on benchmarks to get it, then that just achieves my goal.

59:56 - I think something people often miss is that this space is inherently funny. The fact that we have these incredibly expensive, power hungry, supposedly the most advanced computers of all time. And if you ask them to draw a pelican on a bicycle, it looks like a five-year-old drew it. That's really funny to me.

And finally, some good news about parrots

Lenny asked if I had anything else I wanted to leave listeners with to wrap up the show, so I went with the best piece of news in the world right now.

1:38:10 - There is a rare parrot in New Zealand called the Kākāpō. There are only 250 of these parrots left in the world. They are flightless nocturnal parrots - beautiful green dumpy looking things. And the good news is they're having a fantastic breeding season in 2026,

They only breed when the Rimu trees in New Zealand have a mass fruiting season, and the Rimu trees haven't done that since 2022 - so there has not been a single baby kākāpō born in four years.

This year, the Rimu trees are in fruit. The kākāpō are breeding. There have been dozens of new chicks born. It's a really, really good time. It's great news for rare New Zealand parrots and you should look them up because they're delightful.

Everyone should watch the live stream of Rakiura on her nest with two chicks!

YouTube chapters

Here's the full list of chapters Lenny's team defined for the YouTube video:

00:00: Introduction to Simon Willison
02:40: The November 2025 inflection point
08:01: What's possible now with AI coding
10:42: Vibe coding vs. agentic engineering
13:57: The dark-factory pattern
20:41: Where bottlenecks have shifted
23:36: Where human brains will continue to be valuable
25:32: Defending of software engineers
29:12: Why experienced engineers get better results
30:48: Advice for avoiding the permanent underclass
33:52: Leaning into AI to amplify your skills
35:12: Why Simon says he's working harder than ever
37:23: The market for pre-2022 human-written code
40:01: Prediction: 50% of engineers writing 95% AI code by the end of 2026
44:34: The impact of cheap code
48:27: Simon's AI stack
54:08: Using AI for research
55:12: The pelican-riding-a-bicycle benchmark
59:01: The inherent ridiculousness of AI
1:00:52: Hoarding things you know how to do
1:08:21: Red/green TDD pattern for better AI code
1:14:43: Starting projects with good templates
1:16:31: The lethal trifecta and prompt injection
1:21:53: Why 97% effectiveness is a failing grade
1:25:19: The normalization of deviance
1:28:32: OpenClaw: the security nightmare everyone is looking past
1:34:22: What's next for Simon
1:36:47: Zero-deliverable consulting
1:38:05: Good news about Kakapo parrots

Tags: ai, kakapo, generative-ai, llms, podcast-appearances, coding-agents, agentic-engineering

Highlights from my appearance on the Data Renegades podcast with CL Kao and Dori Wilson

2025-11-26T00:29:11+00:00

I talked with CL Kao and Dori Wilson for an episode of their new Data Renegades podcast titled Data Journalism Unleashed with Simon Willison.

I fed the transcript into Claude Opus 4.5 to extract this list of topics with timestamps and illustrative quotes. It did such a good job I'm using what it produced almost verbatim here - I tidied it up a tiny bit and added a bunch of supporting links.

What is data journalism and why it's the most interesting application of data analytics [02:03]

"There's this whole field of data journalism, which is using data and databases to try and figure out stories about the world. It's effectively data analytics, but applied to the world of news gathering. And I think it's fascinating. I think it is the single most interesting way to apply this stuff because everything is in scope for a journalist."
The origin story of Django at a small Kansas newspaper [02:31]

"We had a year's paid internship from university where we went to work for this local newspaper in Kansas with this chap Adrian Holovaty. And at the time we thought we were building a content management system."
Building the "Downloads Page" - a dynamic radio player of local bands [03:24]

"Adrian built a feature of the site called the Downloads Page. And what it did is it said, okay, who are the bands playing at venues this week? And then we'll construct a little radio player of MP3s of music of bands who are playing in Lawrence in this week."
Working at The Guardian on data-driven reporting projects [04:44]

"I just love that challenge of building tools that journalists can use to investigate stories and then that you can use to help tell those stories. Like if you give your audience a searchable database to back up the story that you're presenting, I just feel that's a great way of building more credibility in the reporting process."
Washington Post's opioid crisis data project and sharing with local newspapers [05:22]

"Something the Washington Post did that I thought was extremely forward thinking is that they shared [the opioid files] with other newspapers. They said, 'Okay, we're a big national newspaper, but these stories are at a local level. So what can we do so that the local newspaper and different towns can dive into that data for us?'"
NICAR conference and the collaborative, non-competitive nature of data journalism [07:00]

"It's all about trying to figure out what is the most value we can get out of this technology as an industry as a whole."

NICAR 2026
ProPublica and the Baltimore Banner as examples of nonprofit newsrooms [09:02]

"The Baltimore Banner are a nonprofit newsroom. They have a hundred employees now for the city of Baltimore. This is an enormously, it's a very healthy newsroom. They do amazing data reporting... And I believe they're almost breaking even on subscription revenue [correction, not yet], which is astonishing."
The "shower revelation" that led to Datasette - SQLite on serverless hosting [10:31]

"It was literally a shower revelation. I was in the shower thinking about serverless and I thought, 'hang on a second. So you can't use Postgres on serverless hosting, but if it's a read-only database, could you use SQLite? Could you just take that data, bake it into a blob of a SQLite file, ship that as part of the application just as another asset, and then serve things on top of that?'"
Datasette's plugin ecosystem and the vision of solving data publishing [12:36]

"In the past I've thought about it like how Pinterest solved scrapbooking and WordPress solved blogging, who's going to solve data like publishing tables full of data on the internet? So that was my original goal."
Unexpected Datasette use cases: Copenhagen electricity grid, Brooklyn Cemetery [13:59]

"Somebody was doing research on the Brooklyn Cemetery and they got hold of the original paper files of who was buried in the Brooklyn Cemetery. They digitized those, loaded the results into Datasette and now it tells the story of immigration to New York."
Bellingcat using Datasette to investigate leaked Russian food delivery data [14:40]

"It turns out the Russian FSB, their secret police, have an office that's not near any restaurants and they order food all the time. And so this database could tell you what nights were the FSB working late and what were the names and phone numbers of the FSB agents who ordered food... And I'm like, 'Wow, that's going to get me thrown out of a window.'"

Bellingcat: Food Delivery Leak Unmasks Russian Security Agents
The frustration of open source: no feedback on how people use your software [16:14]

"An endless frustration in open source is that you really don't get the feedback on what people are actually doing with it."
Open office hours on Fridays to learn how people use Datasette [16:49]

"I have an open office hours Calendly, where the invitation is, if you use my software or want to use my software, grab 25 minutes to talk to me about it. And that's been a revelation. I've had hundreds of conversations in the past few years with people."
Data cleaning as the universal complaint - 95% of time spent cleaning [17:34]

"I know every single person I talk to in data complains about the cleaning that everyone says, 'I spend 95% of my time cleaning the data and I hate it.'"
Version control problems in data teams - Python scripts on laptops without Git [17:43]

"I used to work for a large company that had a whole separate data division and I learned at one point that they weren't using Git for their scripts. They had Python scripts, littering laptops left, right and center and lots of notebooks and very little version control, which upset me greatly."
The Carpentries organization teaching scientists Git and software fundamentals [18:12]

"There's an organization called The Carpentries. Basically they teach scientists to use Git. Their entire thing is scientists are all writing code these days. Nobody ever sat them down and showed them how to use the UNIX terminal or Git or version control or write tests. We should do that."
Data documentation as an API contract problem [21:11]

"A coworker of mine said, you do realize that this should be a documented API interface, right? Your data warehouse view of your project is something that you should be responsible for communicating to the rest of the organization and we weren't doing it."
The importance of "view source" on business reports [23:21]

"If you show somebody a report, you need to have view source on those reports... somebody would say 25% of our users did this thing. And I'm thinking I need to see the query because I knew where all of the skeletons were buried and often that 25% was actually a 50%."
Fact-checking process for data reporting [24:16]

"Their stories are fact checked, no story goes out the door without someone else fact checking it and without an editor approving it. And it's the same for data. If they do a piece of data reporting, a separate data reporter has to audit those numbers and maybe even produce those numbers themselves in a separate way before they're confident enough to publish them."
Queries as first-class citizens with version history and comments [27:16]

"I think the queries themselves need to be first class citizens where like I want to see a library of queries that my team are using and each one I want to know who built it and when it was built. And I want to see how that's changed over time and be able to post comments on it."
Two types of documentation: official docs vs. temporal/timestamped notes [29:46]

"There's another type of documentation which I call temporal documentation where effectively it's stuff where you say, 'Okay, it's Friday, the 31st of October and this worked.' But the timestamp is very prominent and if somebody looks that in six months time, there's no promise that it's still going to be valid to them."
Starting an internal blog without permission - instant credibility [30:24]

"The key thing is you need to start one of these without having to ask permission first. You just one day start, you can do it in a Google Doc, right?... It gives you so much credibility really quickly because nobody else is doing it."
Building a search engine across seven documentation systems [31:35]

"It turns out, once you get a search engine over the top, it's good documentation. You just have to know where to look for it. And if you are the person who builds the search engine, you secretly control the company."
The TIL (Today I Learned) blog approach - celebrating learning basics [33:05]

"I've done TILs about 'for loops' in Bash, right? Because okay, everyone else knows how to do that. I didn't... It's a value statement where I'm saying that if you've been a professional software engineer for 25 years, you still don't know everything. You should still celebrate figuring out how to learn 'for loops' in Bash."
Coding agents like Claude Code and their unexpected general-purpose power [34:53]

"They pretend to be programming tools but actually they're basically a sort of general agent because they can do anything that you can do by typing commands into a Unix shell, which is everything."
Skills for Claude - markdown files for census data, visualization, newsroom standards [36:16]

"Imagine a markdown file for census data. Here's where to get census data from. Here's what all of the columns mean. Here's how to derive useful things from that. And then you have another skill for here's how to visualize things on a map using D3... At the Washington Post, our data standards are this and this and this."

Claude Skills are awesome, maybe a bigger deal than MCP
The absurd 2025 reality: cutting-edge AI tools use 1980s terminal interfaces [38:22]

"The terminal is now accessible to people who never learned the terminal before 'cause you don't have to remember all the commands because the LLM knows the commands for you. But isn't that fascinating that the cutting edge software right now is it's like 1980s style— I love that. It's not going to last. That's a current absurdity for 2025."
Cursor for data? Generic agent loops vs. data-specific IDEs [38:18]

"More of a notebook interface makes a lot more sense than a Claude Code style terminal 'cause a Jupyter Notebook is effectively a terminal, it's just in your browser and it can show you charts."
Future of BI tools: prompt-driven, instant dashboard creation [39:54]

"You can copy and paste a big chunk of JSON data from somewhere into [an LLM] and say build me a dashboard. And they do such a good job. Like they will just decide, oh this is a time element so we'll do a bar chart over time and these numbers feel big so we'll put those in a big green box."
Three exciting LLM applications: text-to-SQL, data extraction, data enrichment [43:06]

"LLMs are stunningly good at outputting SQL queries. Especially if you give them extra metadata about the columns. Maybe a couple of example queries and stuff."
LLMs extracting structured data from scanned PDFs at 95-98% accuracy [43:36]

"You file a freedom of information request and you get back horrifying scanned PDFs with slightly wonky angles and you have to get the data out of those. LLMs for a couple of years now have been so good at, 'here's a page of a police report, give me back JSON with the name of the arresting officer and the date of the incident and the description,' and they just do it."
Data enrichment: running cheap models in loops against thousands of records [44:36]

"There's something really exciting about the cheaper models, Gemini Flash 2.5 Lite, things like that. Being able to run those in a loop against thousands of records feels very valuable to me as well."

datasette-enrichments
Multimodal LLMs for images, audio transcription, and video processing [45:42]

"At one point I calculated that using Google's least expensive model, if I wanted to generate captions for like 70,000 photographs in my personal photo library, it would cost me like $13 or something. Wildly inexpensive."

Correction: with Gemini 1.5 Flash 8B it would cost 173.25 cents
First programming language: hated C++, loved PHP and Commodore 64 BASIC [46:54]

"I hated C++ 'cause I got my parents to buy me a book on it when I was like 15 and I did not make any progress with Borland C++ compiler... Actually, my first program language was Commodore 64 BASIC. And I did love that. Like I tried to build a database in Commodore 64 BASIC back when I was like six years old or something."
Biggest production bug: crashing The Guardian's MPs expenses site with a progress bar [47:46]

"I tweeted a screenshot of that progress bar and said, 'Hey, look, we have a progress bar.' And 30 seconds later the site crashed because I was using SQL queries to count all 17,000 documents just for this one progress bar."

Crowdsourced document analysis and MP expenses
Favorite test dataset: San Francisco's tree list, updated several times a week [48:44]

"There's 195,000 trees in this CSV file and it's got latitude and longitude and species and age when it was planted... and get this, it's updated several times a week... most working days, somebody at San Francisco City Hall updates their database of trees, and I can't figure out who."
Showrunning TV shows as a management model - transferring vision to lieutenants [50:07]

"Your job is to transfer your vision into their heads so they can go and have the meetings with the props department and the set design and all of those kinds of things... I used to sniff at the idea of a vision when I was young and stupid. And now I'm like, no, the vision really is everything because if everyone understands the vision, they can make decisions you delegate to them."

The Eleven Laws of Showrunning by Javier Grillo-Marxuach
Hot take: all executable code with business value must be in version control [52:21]

"I think it's inexcusable to have executable code that has business value that is not in version control somewhere."
Hacker News automation: GitHub Actions scraping for notifications [52:45]

"I've got a GitHub actions thing that runs a piece of software I wrote called shot-scraper that runs Playwright, that loads up a browser in GitHub actions to scrape that webpage and turn the results into JSON, which then get turned into an atom feed, which I subscribe to in NetNewsWire."
Dream project: whale detection camera with Gemini AI [53:47]

"I want to point a camera at the ocean and take a snapshot every minute and feed it into Google Gemini or something and just say, is there a whale yes or no? That would be incredible. I want push notifications when there's a whale."
Favorite podcast: Mark Steel's in Town (hyperlocal British comedy) [54:23]

"Every episode he goes to a small town in England and he does a comedy set in a local venue about the history of the town. And so he does very deep research... I love that sort of like hyperlocal, like comedy, that sort of British culture thing."

Mark Steel's in Town available episodes
Favorite fiction genre: British wizards caught up in bureaucracy [55:06]

"My favorite genre of fiction is British wizards who get caught up in bureaucracy... I just really like that contrast of like magical realism and very clearly researched government paperwork and filings."

The Laundry Files, Rivers of London, The Rook

Colophon

I used a Claude Project for the initial analysis, pasting in the HTML of the transcript since that included <span data-timestamp="425"> elements. The project uses the following custom instructions

You will be given a transcript of a podcast episode. Find the most interesting quotes in that transcript - quotes that best illustrate the overall themes, and quotes that introduce surprising ideas or express things in a particularly clear or engaging or spicy way. Answer just with those quotes - long quotes are fine.

I then added a follow-up prompt saying:

Now construct a bullet point list of key topics where each item includes the mm:ss in square braces at the end

Then suggest a very comprehensive list of supporting links I could find

Then one more follow-up:

Add an illustrative quote to every one of those key topics you identified

Here's the full Claude transcript of the analysis.

Tags: data, data-journalism, django, ai, datasette, podcast-appearances

Talk Python: Celebrating Django's 20th Birthday With Its Creators

2025-08-29T20:02:50+00:00

Talk Python: Celebrating Django's 20th Birthday With Its Creators

I recorded this podcast episode recently to celebrate Django's 20th birthday with Adrian Holovaty, Will Vincent, Jeff Triplet, and Thibaud Colas.

We didn’t know that it was a web framework. We thought it was a tool for building local newspaper websites. [...]

Django’s original tagline was ‘Web development on journalism deadlines’. That’s always been my favorite description of the project.

Tags: adrian-holovaty, django, python, podcast-appearances

Screaming in the Cloud: AI’s Security Crisis: Why Your Assistant Might Betray You

2025-08-13T17:45:58+00:00

Screaming in the Cloud: AI’s Security Crisis: Why Your Assistant Might Betray You

I recorded this podcast conversation with Corey Quinn a few weeks ago:

On this episode of Screaming in the Cloud, Corey Quinn talks with Simon Willison, founder of Datasette and creator of LLM CLI about AI’s realities versus the hype. They dive into Simon’s “lethal trifecta” of AI security risks, his prediction of a major breach within six months, and real-world use cases of his open source tools, from investigative journalism to OSINT sleuthing. Simon shares grounded insights on coding with AI, the real environmental impact, AGI skepticism, and why human expertise still matters. A candid, hype-free take from someone who truly knows the space.

This was a really fun conversation - very high energy and we covered a lot of different topics. It's about a lot more than just LLM security.

Tags: ai, prompt-injection, podcast-appearances, lethal-trifecta, corey-quinn

AI for data engineers with Simon Willison

2025-08-11T05:16:43+00:00

AI for data engineers with Simon Willison

I recorded an episode last week with Claire Giordano for the Talking Postgres podcast. The topic was "AI for data engineers" but we ended up covering an enjoyable range of different topics.

How I got started programming with a Commodore 64 - the tape drive for which inspired the name Datasette
Selfish motivations for TILs (force me to write up my notes) and open source (help me never have to solve the same problem twice)
LLMs have been good at SQL for a couple of years now. Here's how I used them for a complex PostgreSQL query that extracted alt text from my blog's images using regular expressions
Structured data extraction as the most economically valuable application of LLMs for data work
2025 has been the year of tool calling a loop ("agentic" if you like)
Thoughts on running MCPs securely - read-only database access, think about sandboxes, use PostgreSQL permissions, watch out for the lethal trifecta
Jargon guide: Agents, MCP, RAG, Tokens
How to get started learning to prompt: play with the models and "bring AI to the table" even for tasks that you don't think it can handle
"It's always a good day if you see a pelican"

Tags: postgresql, ai, generative-ai, llms, podcast-appearances

Generationship: Ep. #39, Simon Willison

2025-07-11T05:33:06+00:00

Generationship: Ep. #39, Simon Willison

I recorded this podcast episode with Rachel Chalmers a few weeks ago. We talked about the resurgence of blogging, the legacy of Google Reader, learning in public, LLMs as weirdly confident interns, AI-assisted search, prompt injection, human augmentation over replacement and we finished with this delightful aside about pelicans which I'll quote here in full:

Rachel: My last question, my favorite question. If you had a generation ship, a star ship that takes more than a human generation to get to Alpha Centauri, what would you call it?

Simon: I'd call it Squadron, because that is the collective noun for pelicans. And I love pelicans.

Rachel: Pelicans are the best.

Simon: They're the best. I live in Half Moon Bay. We have the second largest mega roost of the California brown pelican in the world, in our local harbor [...] last year we had over a thousand pelicans diving into the water at the same time at peak anchovy season or whatever it was.

The largest mega roost, because I know you want to know, is in Alameda, over by the aircraft carrier.

Rachel: The hornet.

Simon: Yeah. It's got the largest mega roost of the California brown pelican at certain times of the year. They're so photogenic. They've got charisma. They don't look properly shaped for flying.

Rachel: They look like the Spruce Goose. They've got the big front. And they look like they're made of wood.

Simon: That's such a great comparison, because I saw the Spruce Goose a couple of years ago. Up in Portland, there's this museum that has the Spruce Goose, and I went to see it. And it's incredible. Everyone makes fun of the Spruce Goose until you see the thing. And it's this colossal, beautiful wooden aircraft. Until recently it was the largest aircraft in the world. And it's such a stunning vehicle.

So yeah, pelicans and the Spruce Goose. I'm going to go with that one.

Tags: blogging, ai, generative-ai, llms, half-moon-bay, podcast-appearances

Talking AI and jobs with Natasha Zouves for News Nation

2025-05-30T13:45:57+00:00

I was interviewed by News Nation's Natasha Zouves about the very complicated topic of how we should think about AI in terms of threatening our jobs and careers. I previously talked with Natasha two years ago about Microsoft Bing.

I'll be honest: I was nervous about this one. I'm not an economist and I didn't feel confident talking about this topic!

I do find the challenge of making recent advances in AI and LLMs accessible to a general audience absolutely fascinating though, so I took the risk and agreed to the interview.

I think it came out very well. The full hour long video is now available on the News Nation YouTube channel, or as an audio podcast on iTunes or on Spotify.

I made my own transcript of the video (using MacWhisper) and fed it into the new Claude Opus 4 model to see if it could do a good job of turning that into an outline of the episode, with links to segments, short summaries and illustrative quotes. It did such a good job that I'm including it here on my blog - I very rarely publish AI-produced text of this length, but in this case I think it's justified - especially since most of it is direct quotes from things I said (and have confirmed I said) during the episode.

I ran this command (using my LLM tool):

llm -m claude-4-opus -f transcript.md -s 'Create a markdown outline list of topics covered by this talk. For each topic have a title that links to that point in the video and a single sentence paragraph summary of that section and two or three of the best illustrative quotes. The YouTube video URL is https://www.youtube.com/watch?v=RIvIpILrNXE - use that to link to the exact moments in the video.'

It cost me 23,942 input tokens and 2,973 outputs, which for Claude Opus 4 adds up to 58 cents.

Claude included the relevant timestamps from the transcript. I ended tweaking those a little to ensure they included the introductory context to the session.

The economic disruption nightmare scenario (0:46)

Simon discusses his primary concern about AI's impact on employment and the economy. He explains that while skeptical of AGI claims, he sees real job impacts already happening, particularly for information workers and programmers.

"The biggest nightmare scenario for me, or the more realistic one is the economic disruption this causes"
"If you have a job that primarily deals with handling information, this stuff is a very powerful tool to help with that. And maybe that results in job losses"
"This stuff is incredibly good at writing software, which was a huge surprise to everyone"

Jobs most vulnerable to AI: translation and information processing (2:12)

The conversation explores how jobs involving information transformation are already being affected, with translation services as a prime example. Simon explains how translators have shifted from doing translations to reviewing AI-generated work.

"Something we've seen already is jobs that are purely about transforming information from one shape to another are already being affected quite heavily"
"It's not so much that they're put out of work. It's that their job has changed from doing the translation to reviewing translations created by machines"
"Paralegals, who are assisting lawyers in going through contracts and so forth, a lot of what they do is beginning to be impacted by these tools as well"

The jagged frontier: what AI can and cannot do (3:33)

Simon introduces the concept of AI's "jagged frontier" - the unpredictable boundary between tasks AI excels at and those it fails at. He emphasizes that discovering these boundaries requires constant experimentation.

"There are things that AI is really good at and there's things that AI is terrible at, but those things are very non-obvious"
"The only way to find out if AI can do a task is to sort of push it through the AI, try it lots of different times"
"People are still finding things that it can't do, finding things that it can do, and trying to explore those edges"

AI's strength: processing and synthesizing large documents (4:16)

Simon details how AI excels at answering questions about information you provide it, making it valuable for document analysis and synthesis. He particularly highlights its surprising capability in code generation.

"You can paste in a hundred-page document and ask it questions about the information in that document"
"AI is shockingly good at writing code for computers"
"If you can describe what you need, the AI can churn out hundreds of lines of codes that do exactly that"

The hallucination problem: AI's critical weakness (5:28)

A detailed discussion of AI hallucination - when models confidently state false information. Simon provides examples including lawyers citing non-existent cases and explains why this is such a fundamental limitation.

"AI makes mistakes a lot... it feels like it's a science fiction AI that knows everything and answers instantly and always gets everything right. And it turns out that's not what they are at all"
"Really what these things are doing is they're trying to give you something that sounds convincing. They've been trained to output convincing texts, but convincing isn't the same thing as truth"
"A bunch of lawyers have got caught out where they'll in their lawsuits, they'll say, and in the case, so-and-so versus so-and-so this thing happened. And then somebody looks it up and the case didn't exist"

Customer service AI: the failed revolution (8:32)

Simon discusses Klarna's reversal on AI customer service, explaining why human customers resist AI support and the ethical concerns around disclosure.

"They announced a reversal of that. They said they're hiring humans back again... because it turns out human beings hate talking to an AI as customer support"
"I think it's deeply unethical to present a customer with an AI support bot without letting them know that it's AI"
"If you're talking to customer support, sometimes it's because you've hit an edge case... which is that the thing that you're trying to do just isn't one of those normal things that the AI have been trained on"

The trucking industry and self-driving vehicles (10:58)

A sobering discussion about the future of trucking jobs in light of advances in self-driving technology, particularly Waymo's success in San Francisco.

"I'm more nervous about that now than I was a year ago, because like self driving cars have been coming soon in the future for like over a decade"
"We now have these self driving taxis, which actually do work... They've been operating on the roads of San Francisco for a couple of years now. And they're good"
"Given how well Waymo is now working, it does feel to me like we might see functional self driving trucks at some point within the next five to 10 years"

Journalism and financial analysis: why human judgment matters (15:44)

Simon strongly defends journalism against AI replacement, explaining why human judgment and verification skills remain crucial in fields dealing with truth and trust.

"The single biggest flaw of AI is that it is gullible... they have absolutely no instincts for telling if something is true or not"
"Journalism is the art of absorbing information from a huge array of untrustworthy sources and figuring out what is the truth in amongst all of this"
"If you want to analyze 10,000 police reports and figure out what the overall trends are... If the AI can read those 10,000 things and give you leads on which ones look most interesting, it almost doesn't matter if it makes mistakes"

AI's telltale signs: the "delve" phenomenon (17:49)

An fascinating (note: Claude used "an fascinating" rather than "a fascinating", what a weird mistake!) explanation of how to spot AI-generated text, including the surprising linguistic influence of Nigerian English on AI models.

"There's this magical thing where the world delve is surprisingly common in AI generated text. If something says that it's going to delve into something, that's an instant red flag"
"A lot of that work was outsourced to people in Nigeria a couple of years ago... Nigerian English is slightly different from American English. They use the word delve a whole lot more"
"One of the thrilling things about this field is the people building this stuff don't really understand how it works"

Voice cloning and scams: the dark side of AI (21:47)

Simon discusses the serious threat of voice cloning technology and romance scams, explaining how AI makes these scams cheaper and more scalable.

"There are a lot of systems now that can clone your voice to a very high degree based on 10 to 15 seconds of samples"
"When you hear somebody on the phone with a voice, you can no longer be at all sure that that person is the person that they sound like"
"Romance scams... were being run by human beings... Now you don't even need that. The AI models are extremely good at convincing messages"

AI-proofing your career: learning and adaptation (26:52)

Simon provides practical advice for workers concerned about AI, emphasizing how AI can actually help people learn new skills more easily.

"One of the most exciting things to me personally about AI is that it reduces the barrier to entry on so many different things"
"There's never been a better time to learn to program. Because that frustration, that learning curve has been shaved down so much"
"If you're AI literate, if you can understand what these tools can do and how to apply them and you have literacy in some other field, that makes you incredibly valuable"

Safe sectors: the trades and human touch (30:01)

Discussion of jobs that are more resistant to AI disruption, particularly skilled trades and roles requiring physical presence.

"The classic example is things like plumbing. Like plumbing and HVAC... it's going to be a very long time until we have an AI plumber"
"I don't think AI eliminates many jobs. I think it greatly changes how they work"
"You could be the AI-enabled botanist who helps all of the companies that run nurseries and so forth upgrade their processes"

Creative industries: the human advantage (34:37)

Simon explains why human creativity remains valuable despite AI's capabilities, using examples from film and art.

"Novelty is the one thing that AI can't do because it's imitating the examples that it's seen already"
"If a human being with taste filtered that, if it got the AI to write 20 stories and it said, okay, this is the most interesting and then added that human flavor on top, that's the point where the thing starts to get interesting"
"I love the idea that creative people can take on more ambitious projects, can tell even better stories"

AI security and the gullibility problem (46:51)

A deep dive into the unsolved security challenges of AI systems, particularly their susceptibility to manipulation.

"We're building these systems that you can talk to and they can do things for you... And we have no idea how to make this secure"
"The AI security problem comes down to gullibility"
"They don't yet have a way of telling the difference between stuff that you tell them to do and stuff that other people tell them to do"

The global AI race and competition (52:14)

Simon discusses concerns about international AI competition and how it affects safety considerations.

"The thing that frightens me most is the competition... In the past 12 months, there are, I counted, 18 organizations that are putting out these ChatGPT style models"
"They're all competing against each other, which means they're taking shortcuts. The safety research they're paying less attention to"
"Chinese AI lab called DeepSeek came up with more optimized methods... they managed to produce a model that was as good as the OpenAI ones for like a 20th of the price"

Getting started with AI: practical tips (57:34)

Simon provides concrete advice for beginners wanting to explore AI tools safely and productively.

"The best way to learn about this stuff is to play with it, is to try and do ridiculous things with it"
"A friend of mine says you should always bring AI to the table. Like any challenge that you have, try it against the AI, even if you think it's not going to work"
"One exercise I really recommend is try and get an AI to make a mistake as early as possible... the first time you see it very confidently tell you something that's blatantly not true, it sort of inoculates you"

Tags: interviews, speaking, my-talks, ai, generative-ai, llms, podcast-appearances, ai-ethics, claude-4, macwhisper

Half Stack Data Science: Programming with AI, with Simon Willison

2025-04-01T14:27:14+00:00

Half Stack Data Science: Programming with AI, with Simon Willison

I participated in this wide-ranging 50 minute conversation with David Asboth and Shaun McGirr. Topics we covered included applications of LLMs to data journalism, the challenges of building an intuition for how best to use these tool given their "jagged frontier" of capabilities, how LLMs impact learning to program and how local models are starting to get genuinely useful now.

At 27:47:

If you're a new programmer, my optimistic version is that there has never been a better time to learn to program, because it shaves down the learning curve so much. When you're learning to program and you miss a semicolon and you bang your head against the computer for four hours [...] if you're unlucky you quit programming for good because it was so frustrating. [...]

I've always been a project-oriented learner; I can learn things by building something, and now the friction involved in building something has gone down so much [...] So I think especially if you're an autodidact, if you're somebody who likes teaching yourself things, these are a gift from heaven. You get a weird teaching assistant that knows loads of stuff and occasionally makes weird mistakes and believes in bizarre conspiracy theories, but you have 24 hour access to that assistant.

If you're somebody who prefers structured learning in classrooms, I think the benefits are going to take a lot longer to get to you because we don't know how to use these things in classrooms yet. [...]

If you want to strike out on your own, this is an amazing tool if you learn how to learn with it. So you've got to learn the limits of what it can do, and you've got to be disciplined enough to make sure you're not outsourcing the bits you need to learn to the machines.

Via @halfstackdatascience.com

Tags: data-journalism, podcasts, ai, generative-ai, llms, podcast-appearances

Notes from my Accessibility and Gen AI podcast appearance

2025-03-02T14:51:43+00:00

I was a guest on the most recent episode of the Accessibility + Gen AI Podcast, hosted by Eamon McErlean and Joe Devon. We had a really fun, wide-ranging conversation about a host of different topics. I've extracted a few choice quotes from the transcript.

LLMs for drafting alt text

I use LLMs for the first draft of my alt text (22:10):

I actually use Large Language Models for most of my alt text these days. Whenever I tweet an image or whatever, I've got a Claude project called Alt text writer. It's got a prompt and an example. I dump an image in and it gives me the alt text.

I very rarely just use it because that's rude, right? You should never dump text onto people that you haven't reviewed yourself. But it's always a good starting point.

Normally I'll edit a tiny little bit. I'll delete an unimportant detail or I'll bulk something up. And then I've got alt text that works.

Often it's actually got really good taste. A great example is if you've got a screenshot of an interface, there's a lot of words in that screenshot and most of them don't matter.

The message you're trying to give in the alt text is that it's two panels on the left, there's a conversation on the right, there's a preview of the SVG file or something. My alt text writer normally gets that right.

It's even good at summarizing tables of data where it will notice that actually what really matters is that Gemini got a score of 57 and Nova got a score of 53 - so it will pull those details out and ignore [irrelevant columns] like the release dates and so forth.

Here's the current custom instructions prompt I'm using for that Claude Project:

You write alt text for any image pasted in by the user. Alt text is always presented in a fenced code block to make it easy to copy and paste out. It is always presented on a single line so it can be used easily in Markdown images. All text on the image (for screenshots etc) must be exactly included. A short note describing the nature of the image itself should go first.

Is it ethical to build unreliable accessibility tools?

On the ethics of building accessibility tools on top of inherently unreliable technology (5:33):

Some people I've talked to have been skeptical about the accessibility benefits because their argument is that if you give somebody unreliable technology that might hallucinate and make things up, surely that's harming them.

I don't think that's true. I feel like people who use screen readers are used to unreliable technology.

You know, if you use a guide dog - it's a wonderful thing and a very unreliable piece of technology.

When you consider that people with accessibility needs have agency, they can understand the limitations of the technology they're using. I feel like giving them a tool where they can point their phone at something and it can describe it to them is a world away from accessibility technology just three or four years ago.

Why I don't feel threatened as a software engineer

This is probably my most coherent explanation yet of why I don't see generative AI as a threat to my career as a software engineer (33:49):

My perspective on this as a developer who's been using these systems on a daily basis for a couple of years now is that I find that they enhance my value. I am so much more competent and capable as a developer because I've got these tools assisting me. I can write code in dozens of new programming languages that I never learned before.

But I still get to benefit from my 20 years of experience.

Take somebody off the street who's never written any code before and ask them to build an iPhone app with ChatGPT. They are going to run into so many pitfalls, because programming isn't just about can you write code - it's about thinking through the problems, understanding what's possible and what's not, understanding how to QA, what good code is, having good taste.

There's so much depth to what we do as software engineers.

I've said before that generative AI probably gives me like two to five times productivity boost on the part of my job that involves typing code into a laptop. But that's only 10 percent of what I do. As a software engineer, most of my time isn't actually spent with the typing of the code. It's all of those other activities.

The AI systems help with those other activities, too. They can help me think through architectural decisions and research library options and so on. But I still have to have that agency to understand what I'm doing.

So as a software engineer, I don't feel threatened. My most optimistic view of this is that the cost of developing software goes down because an engineer like myself can be more ambitious, can take on more things. As a result, demand for software goes up - because if you're a company that previously would never have dreamed of building a custom CRM for your industry because it would have taken 20 engineers a year before you got any results... If it now takes four engineers three months to get results, maybe you're in the market for software engineers now that you weren't before.

Tags: accessibility, alt-text, podcasts, ai, generative-ai, llms, podcast-appearances

A selfish personal argument for releasing code as Open Source

2025-01-24T21:46:03+00:00

I'm the guest for the most recent episode of the Real Python podcast with Christopher Bailey, talking about Using LLMs for Python Development. We covered a lot of other topics as well - most notably my relationship with Open Source development over the years.

At 5m32s I presented what I think is the best version yet of my selfish personal argument for why it makes sense to default to releasing code as Open Source:

I didn't really get heavily back into open source until about maybe six years ago when I'd been working for a big company in the US, and I got frustrated that all of the code I was writing, I'd never get to use again.

I realized that one of the best things about open source software is that you can solve a problem once and then you can slap an open source license on that solution and you will never have to solve that problem ever again, no matter who's employing you in the future.

It's a sneaky way of solving a problem permanently.

Once I realized that I started open sourcing everything, like pretty much every piece of code I've written in the past six years, I've open sourced purely as a defense against me losing access to that code in the future.

Because I've written loads of code for employers that I don't get to use anymore - and how many times do you want to reinvent things?

I like to say that my interest in open source is actually really selfish. I figured something out. I never want to have to do this work ever again.

If I slap a license on it, write documentation for it so that I can remember what it does and write unit tests for it so it's easy for me to keep it working in the future, that's entirely beneficial to me.

The rest of the episode was a really great conversation - other topics we covered included:

4m40s: My first ever significant open source project - a PHP XML-RPC library that ended up in WordPress twenty years ago
10m08s: Benefits I've gained from starting a blog 22+ years ago
22m14s: How to get started using LLMs to write Python
36m55s: My workflow for using LLMs for code - for both the experimental research work (I called this the "Mise en place phase") and the follow-up where I actually write the finished code
55m14s: Why an SVG of a pelican riding a bicycle?
57m48s: Why saying "do it better" actually works!
1h0m24s: Cooking with LLMs! How to get a weirdly tasty guacamole recipe
1h08m52s: My latest thoughts on local models

Tags: open-source, podcasts, python, ai, generative-ai, llms, podcast-appearances

Simon Willison And SWYX Tell Us Where AI Is In 2025

2025-01-14T16:10:07+00:00

Simon Willison And SWYX Tell Us Where AI Is In 2025

I recorded this podcast episode with Brian McCullough and swyx riffing off my Things we learned about LLMs in 2024 review. We also touched on some predictions for the future - this is where I learned from swyx that Everything Everywhere All at Once used generative AI (Runway ML) already.

The episode is also available on YouTube:

Tags: podcasts, ai, swyx, generative-ai, llms, podcast-appearances

Simon Willison: The Future of Open Source and AI

2024-12-02T01:03:28+00:00

Simon Willison: The Future of Open Source and AI

I sat down a few weeks ago to record this conversation with Logan Kilpatrick and Nolan Fortman for their podcast Around the Prompt. The episode is available on YouTube and Apple Podcasts and other platforms.

We talked about a whole bunch of different topics, including the ongoing debate around the term "open source" when applied to LLMs and my thoughts on why I don't feel threatened by LLMs as a software engineer (at 40m05s).

Via @OfficialLoganK

Tags: open-source, podcasts, youtube, ai, generative-ai, llms, logan-kilpatrick, podcast-appearances

Notes from Bing Chat—Our First Encounter With Manipulative AI

2024-11-19T22:41:57+00:00

I participated in an Ars Live conversation with Benj Edwards of Ars Technica today, talking about that wild period of LLM history last year when Microsoft launched Bing Chat and it instantly started misbehaving, gaslighting and defaming people.

Here's the video of our conversation.

I ran the video through MacWhisper, extracted a transcript and used Claude to identify relevant articles I should link to. Here's that background information to accompany the talk.

A rough timeline of posts from that Bing launch period back in February 2023:

Microsoft announces AI-powered Bing search and Edge browser - Benj Edwards, Feb 7, 2023
AI-powered Bing Chat spills its secrets via prompt injection attack - Benj Edwards, Feb 10, 2023
AI-powered Bing Chat loses its mind when fed Ars Technica article - Benj Edwards, Feb 14, 2023
Bing: “I will not harm you unless you harm me first” - Simon Willison, Feb 15, 2023
Gareth Corfield: I'm beginning to have concerns for @benjedwards' virtual safety - Twitter, Feb 15, 2023
A Conversation With Bing’s Chatbot Left Me Deeply Unsettled - Kevin Roose, NYT, Feb 16, 2023
It is deeply unethical to give a superhuman liar the authority of a $1 trillion company or to imply that it is an accurate source of knowledge / And it is deeply manipulative to give people the impression that Bing Chat has emotions or feelings like a human - Benj on Twitter (now deleted), Feb 16 2023
Bing AI Flies Into Unhinged Rage at Journalist - Maggie Harrison Dupré, Futurism, Feb 17 2023

Other points that we mentioned:

this AI chatbot "Sidney" is misbehaving - amazing forum post from November 23, 2022 (a week before even ChatGPT had been released) from a user in India talking about their interactions with a secret preview of Bing/Sydney
Prompt injection attacks against GPT-3 - where I coined the term "prompt injection" in September 12 2022
Eight Things to Know about Large Language Models (PDF) is the paper where I first learned about sycophancy and sandbagging (in April 2023)
Claude’s Character by Anthropic talks about how they designed the personality for Claude - June 8 2023, my notes on that.
Why ChatGPT and Bing Chat are so good at making things up in which Benj argues for the term "confabulation" in April 2023.

Tags: arstechnica, bing, ethics, microsoft, podcasts, my-talks, ai, openai, generative-ai, gpt-4, llms, benj-edwards, podcast-appearances, ai-ethics, ai-assisted-search, ai-personality, ai-misuse

Gergely Orosz's edited clip of me talking about Open Source

2024-09-30T20:24:41+00:00

Gergely Orosz's edited clip of me talking about Open Source

Gergely Orosz released this clip to help promote our podcast conversation AI tools for software engineers, but without the hype - it's a neat bite-sized version of my argument for why Open Source has provided the single biggest enhancement to developer productivity I've seen in my entire career.

One of the big challenges everyone talked about was software reusability. Like, why are we writing the same software over and over again?

And at the time, people thought OOP was the answer. They were like, oh, if we do everything as classes in Java, then we can subclass those classes, and that's how we'll solve reusable software.

That wasn't the fix. The fix was open source. The fix was having a diverse and vibrant open source community releasing software that's documented and you can package and install and all of those kinds of things.

That's been incredible. The cost of building software today is a fraction of what it was 20 years ago, purely thanks to open source.

Tags: open-source, gergely-orosz, podcast-appearances

The Pragmatic Engineer Podcast: AI tools for software engineers, but without the hype – with Simon Willison

2024-09-25T17:58:46+00:00

The Pragmatic Engineer Podcast: AI tools for software engineers, but without the hype – with Simon Willison

Gergely Orosz has a brand new podcast, and I was the guest for the first episode. We covered a bunch of ground, but my favorite topic was an exploration of the (very legitimate) reasons that many engineers are resistant to taking advantage of AI-assisted programming tools.

Tags: podcasts, ai, generative-ai, llms, ai-assisted-programming, gergely-orosz, podcast-appearances

Notes on using LLMs for code

2024-09-20T03:10:57+00:00

I was recently the guest on TWIML - the This Week in Machine Learning & AI podcast. Our episode is titled Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison, and the focus of the conversation was the ways in which I use LLM tools in my day-to-day work as a software developer and product engineer.

Here's the YouTube video version of the episode:

I ran the transcript through MacWhisper and extracted some edited highligts below.

Two different modes of LLM use

At 19:53:

There are two different modes that I use LLMs for with programming.

The first is exploratory mode, which is mainly quick prototyping - sometimes in programming languages I don't even know.

I love asking these things to give me options. I will often start a prompting session by saying, "I want to draw a visualization of an audio wave. What are my options for this?"

And have it just spit out five different things. Then I'll say "Do me a quick prototype of option three that illustrates how that would work."

The other side is when I'm writing production code, code that I intend to ship, then it's much more like I'm treating it basically as an intern who's faster at typing than I am.

That's when I'll say things like, "Write me a function that takes this and this and returns exactly that."

I'll often iterate on these a lot. I'll say, "I don't like the variable names you used there. Change those." Or "Refactor that to remove the duplication."

I call it my weird intern, because it really does feel like you've got this intern who is screamingly fast, and they've read all of the documentation for everything, and they're massively overconfident, and they make mistakes and they don't realize them.

But crucially, they never get tired, and they never get upset. So you can basically just keep on pushing them and say, "No, do it again. Do it differently. Change that. Change that."

At three in the morning, I can be like, "Hey, write me 100 lines of code that does X, Y, and Z," and it'll do it. It won't complain about it.

It's weird having this small army of super talented interns that never complain about anything, but that's kind of how this stuff ends up working.

Here are all of my other notes about AI-assisted programming.

Prototyping

At 25:22:

My entire career has always been about prototyping.

Django itself, the web framework, we built that in a local newspaper so that we could ship features that supported news stories faster. How can we make it so we can turn around a production-grade web application in a few days?

Ever since then, I've always been interested in finding new technologies that let me build things quicker, and my development process has always been to start with a prototype.

You have an idea, you build a prototype that illustrates the idea, you can then have a better conversation about it. If you go to a meeting with five people, and you've got a working prototype, the conversation will be so much more informed than if you go in with an idea and a whiteboard sketch.

I've always been a prototyper, but I feel like the speed at which I can prototype things in the past 12 months has gone up by an order of magnitude.

I was already a very productive prototype producer. Now, I can tap a thing into my phone, and 30 seconds later, I've got a user interface in Claude Artifacts that illustrates the idea that I'm trying to explore.

Honestly, if I didn't use these models for anything else, if I just used them for prototyping, they would still have an enormous impact on the work that I do.

Here are examples of prototypes I've built using Claude Artifacts. A lot of them end up in my tools collection.

The full conversation covers a bunch of other topics. I ran the transcript through Claude, told it "Give me a bullet point list of the most interesting topics covered in this transcript" and then deleted the ones that I didn't think were particularly interesting - here's what was left:

Using AI-powered voice interfaces like ChatGPT's Voice Mode to code while walking a dog
Leveraging AI tools like Claude and ChatGPT for rapid prototyping and development
Using AI to analyze and extract data from images, including complex documents like campaign finance reports
The challenges of using AI for tasks that may trigger safety filters, particularly for journalism
The evolution of local AI models like Llama and their improving capabilities
The potential of AI for data extraction from complex sources like scanned tables in PDFs
Strategies for staying up-to-date with rapidly evolving AI technologies
The development of vision-language models and their applications
The balance between hosted AI services and running models locally
The importance of examples in prompting for better AI performance

Tags: podcasts, ai, openai, generative-ai, chatgpt, llms, ai-assisted-programming, anthropic, claude, claude-artifacts, podcast-appearances, prompt-to-app

Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison

2024-09-17T16:21:22+00:00

Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison

I'm the guest for the latest episode of the TWIML AI podcast - This Week in Machine Learning & AI, hosted by Sam Charrington.

We mainly talked about how I use LLM tooling for my own work - Claude, ChatGPT, Code Interpreter, Claude Artifacts, LLM and GitHub Copilot - plus a bit about my experiments with local models.

Via @twimlai

Tags: podcasts, productivity, ai, generative-ai, chatgpt, claude, podcast-appearances

Notes from my appearance on the Software Misadventures Podcast

2024-09-10T22:48:42+00:00

I was a guest on Ronak Nathani and Guang Yang's Software Misadventures Podcast, which interviews seasoned software engineers about their careers so far and their misadventures along the way. Here's the episode: LLMs are like your weird, over-confident intern | Simon Willison (Datasette).

You can get the audio version on Overcast, on Apple Podcasts or on Spotify - or you can watch the video version on YouTube.

I ran the video through MacWhisper to get a transcript, then spent some time editing out my own favourite quotes, trying to focus on things I haven't written about previously on this blog.