Simon Willison's Weblog: predictions

One Human + One Agent = One Browser From Scratch

2026-01-27T16:58:08+00:00

One Human + One Agent = One Browser From Scratch

embedding-shapes was so infuriated by the hype around Cursor's FastRender browser project - thousands of parallel agents producing ~1.6 million lines of Rust - that they were inspired to take a go at building a web browser using coding agents themselves.

The result is one-agent-one-browser and it's really impressive. Over three days they drove a single Codex CLI agent to build 20,000 lines of Rust that successfully renders HTML+CSS with no Rust crate dependencies at all - though it does (reasonably) use Windows, macOS and Linux system frameworks for image and text rendering.

I installed the 1MB macOS binary release and ran it against my blog:

chmod 755 ~/Downloads/one-agent-one-browser-macOS-ARM64 
~/Downloads/one-agent-one-browser-macOS-ARM64 https://simonwillison.net/

Here's the result:

It even rendered my SVG feed subscription icon! A PNG image is missing from the page, which looks like an intermittent bug (there's code to render PNGs).

The code is pretty readable too - here's the flexbox implementation.

I had thought that "build a web browser" was the ideal prompt to really stretch the capabilities of coding agents - and that it would take sophisticated multi-agent harnesses (as seen in the Cursor project) and millions of lines of code to achieve.

Turns out one agent driven by a talented engineer, three days and 20,000 lines of Rust is enough to get a very solid basic renderer working!

I'm going to upgrade my prediction for 2029: I think we're going to get a production-grade web browser built by a small team using AI assistance by then.

Via Show Hacker News

Tags: browsers, predictions, ai, rust, generative-ai, llms, ai-assisted-programming, coding-agents, codex-cli, browser-challenge

LLM predictions for 2026, shared with Oxide and Friends

2026-01-08T19:42:13+00:00

I joined a recording of the Oxide and Friends podcast on Tuesday to talk about 1, 3 and 6 year predictions for the tech industry. This is my second appearance on their annual predictions episode, you can see my predictions from January 2025 here. Here's the page for this year's episode, with options to listen in all of your favorite podcast apps or directly on YouTube.

Bryan Cantrill started the episode by declaring that he's never been so unsure about what's coming in the next year. I share that uncertainty - the significant advances in coding agents just in the last two months have left me certain that things will change significantly, but unclear as to what those changes will be.

Here are the predictions I shared in the episode.

1 year: It will become undeniable that LLMs write good code ▶ 19:27

I think that there are still people out there who are convinced that LLMs cannot write good code. Those people are in for a very nasty shock in 2026. I do not think it will be possible to get to the end of even the next three months while still holding on to that idea that the code they write is all junk and it's it's likely any decent human programmer will write better code than they will.

In 2023, saying that LLMs write garbage code was entirely correct. For most of 2024 that stayed true. In 2025 that changed, but you could be forgiven for continuing to hold out. In 2026 the quality of LLM-generated code will become impossible to deny.

I base this on my own experience - I've spent more time exploring AI-assisted programming than most.

The key change in 2025 (see my overview for the year) was the introduction of "reasoning models" trained specifically against code using Reinforcement Learning. The major labs spent a full year competing with each other on who could get the best code capabilities from their models, and that problem turns out to be perfectly attuned to RL since code challenges come with built-in verifiable success conditions.

Since Claude Opus 4.5 and GPT-5.2 came out in November and December respectively the amount of code I've written by hand has dropped to a single digit percentage of my overall output. The same is true for many other expert programmers I know.

At this point if you continue to argue that LLMs write useless code you're damaging your own credibility.

1 year: We're finally going to solve sandboxing ▶ 20:05

I think this year is the year we're going to solve sandboxing. I want to run code other people have written on my computing devices without it destroying my computing devices if it's malicious or has bugs. [...] It's crazy that it's 2026 and I still pip install random code and then execute it in a way that it can steal all of my data and delete all my files. [...] I don't want to run a piece of code on any of my devices that somebody else wrote outside of sandbox ever again.

This isn't just about LLMs, but it becomes even more important now there are so many more people writing code often without knowing what they're doing. Sandboxing is also a key part of the battle against prompt injection.

We have a lot of promising technologies in play already for this - containers and WebAssembly being the two I'm most optimistic about. There's real commercial value involved in solving this problem. The pieces are there, what's needed is UX work to reduce the friction in using them productively and securely.

1 year: A "Challenger disaster" for coding agent security ▶ 21:21

I think we're due a Challenger disaster with respect to coding agent security[...] I think so many people, myself included, are running these coding agents practically as root, right? We're letting them do all of this stuff. And every time I do it, my computer doesn't get wiped. I'm like, "oh, it's fine". [...] The worst version of this is the worm - a prompt injection worm which infects people's computers and adds itself to the Python or NPM packages that person has access to.

I used this as an opportunity to promote my favourite recent essay about AI security, the Normalization of Deviance in AI by Johann Rehberger.

The Normalization of Deviance describes the phenomenon where people and organizations get used to operating in an unsafe manner because nothing bad has happened to them yet, which can result in enormous problems (like the 1986 Challenger disaster) when their luck runs out.

Every six months I predict that a headline-grabbing prompt injection attack is coming soon, and every six months it doesn't happen. This is my most recent version of that prediction!

1 year: Kākāpō parrots will have an outstanding breeding season ▶ 50:06

(I dropped this one to lighten the mood after a discussion of the deep sense of existential dread that many programmers are feeling right now!)

I think that Kākāpō parrots in New Zealand are going to have an outstanding breeding season. The reason I think this is that the Rimu trees are in fruit right now. There's only 250 of them, and they only breed if the Rimu trees have a good fruiting. The Rimu trees have been terrible since 2019, but this year the Rimu trees were all blooming. There are researchers saying that all 87 females of breeding age might lay an egg. And for a species with only 250 remaining parrots that's great news.

(I just checked Wikipedia and I was right with the parrot numbers but wrong about the last good breeding season, apparently 2022 was a good year too.)

In a year with precious little in the form of good news I am utterly delighted to share this story. Here's more:

Kākāpō breeding season 2026 introduction from the Department of Conservation from June 2025 .
Bumper breeding season for kākāpō on the cards - 3rd December 2025, University of Auckland.

I don't often use AI-generated images on this blog, but the Kākāpō image the Oxide team created for this episode is just perfect:

3 years: the coding agents Jevons paradox for software engineering will resolve, one way or the other ▶ 54:37

We will find out if the Jevons paradox saves our careers or not. This is a big question that anyone who's a software engineer has right now: we are driving the cost of actually producing working code down to a fraction of what it used to cost. Does that mean that our careers are completely devalued and we all have to learn to live on a tenth of our incomes, or does it mean that the demand for software, for custom software goes up by a factor of 10 and now our skills are even more valuable because you can hire me and I can build you 10 times the software I used to be able to? I think by three years we will know for sure which way that one went.

The quote says it all. There are two ways this coding agents thing could go: it could turn out software engineering skills are devalued, or it could turn out we're more valuable and effective than ever before.

I'm crossing my fingers for the latter! So far it feels to me like it's working out that way.

3 years: Someone will build a new browser using mainly AI-assisted coding and it won't even be a surprise ▶ 65:13

I think somebody will have built a full web browser mostly using AI assistance, and it won't even be surprising. Rolling a new web browser is one of the most complicated software projects I can imagine[...] the cheat code is the conformance suites. If there are existing tests that it'll get so much easier.

A common complaint today from AI coding skeptics is that LLMs are fine for toy projects but can't be used for anything large and serious.

I think within 3 years that will be comprehensively proven incorrect, to the point that it won't even be controversial anymore.

I picked a web browser here because so much of the work building a browser involves writing code that has to conform to an enormous and daunting selection of both formal tests and informal websites-in-the-wild.

Coding agents are really good at tasks where you can define a concrete goal and then set them to work iterating in that direction.

A web browser is the most ambitious project I can think of that leans into those capabilities.

6 years: Typing code by hand will go the way of punch cards ▶ 80:39

I think the job of being paid money to type code into a computer will go the same way as punching punch cards [...] in six years time, I do not think anyone will be paid to just to do the thing where you type the code. I think software engineering will still be an enormous career. I just think the software engineers won't be spending multiple hours of their day in a text editor typing out syntax.

The more time I spend on AI-assisted programming the less afraid I am for my job, because it turns out building software - especially at the rate it's now possible to build - still requires enormous skill, experience and depth of understanding.

The skills are changing though! Being able to read a detailed specification and transform it into lines of code is the thing that's being automated away. What's left is everything else, and the more time I spend working with coding agents the larger that "everything else" becomes.

Tags: predictions, sandboxing, ai, kakapo, generative-ai, llms, ai-assisted-programming, oxide, bryan-cantrill, coding-agents, conformance-suites, browser-challenge, deep-blue, november-2025-inflection

Prediction: AI will make formal verification go mainstream

2025-12-09T03:11:19+00:00

Prediction: AI will make formal verification go mainstream

Martin Kleppmann makes the case for formal verification languages (things like Dafny, Nagini, and Verus) to finally start achieving more mainstream usage. Code generated by LLMs can benefit enormously from more robust verification, and LLMs themselves make these notoriously difficult systems easier to work with.

The paper Can LLMs Enable Verification in Mainstream Programming? by JetBrains Research in March 2025 found that Claude 3.5 Sonnet saw promising results for the three languages I listed above.

Via lobste.rs

Tags: predictions, programming-languages, ai, generative-ai, llms, ai-assisted-programming, martin-kleppmann

My AI/LLM predictions for the next 1, 3 and 6 years, for Oxide and Friends

2025-01-10T01:43:16+00:00

The Oxide and Friends podcast has an annual tradition of asking guests to share their predictions for the next 1, 3 and 6 years. Here's 2022, 2023 and 2024. This year they invited me to participate. I've never been brave enough to share any public predictions before, so this was a great opportunity to get outside my comfort zone!

We recorded the episode live using Discord on Monday. It's now available on YouTube and in podcast form.

Here are my predictions, written up here in a little more detail than the stream of consciousness I shared on the podcast.

I should emphasize that I find the very idea of trying to predict AI/LLMs over a multi-year period to be completely absurd! I can't predict what's going to happen a week from now, six years is a different universe.

With that disclaimer out of the way, here's an expanded version of what I said.

One year: Agents fail to happen, again

I wrote about how “Agents” still haven’t really happened yet in my review of Large Language Model developments in 2024.

I think we are going to see a lot more froth about agents in 2025, but I expect the results will be a great disappointment to most of the people who are excited about this term. I expect a lot of money will be lost chasing after several different poorly defined dreams that share that name.

What are agents anyway? Ask a dozen people and you'll get a dozen slightly different answers - I collected and then AI-summarized a bunch of those here.

For the sake of argument, let's pick a definition that I can predict won't come to fruition: the idea of an AI assistant that can go out into the world and semi-autonomously act on your behalf. I think of this as the travel agent definition of agents, because for some reason everyone always jumps straight to flight and hotel booking and itinerary planning when they describe this particular dream.

Having the current generation of LLMs make material decisions on your behalf - like what to spend money on - is a really bad idea. They're too unreliable, but more importantly they are too gullible.

If you're going to arm your AI assistant with a credit card and set it loose on the world, you need to be confident that it's not going to hit "buy" on the first website that claims to offer the best bargains!

I'm confident that reliability is the reason we haven't seen LLM-powered agents that have taken off yet, despite the idea attracting a huge amount of buzz since right after ChatGPT first came out.

I would be very surprised if any of the models released over the next twelve months had enough of a reliability improvement to make this work. Solving gullibility is an astonishingly difficult problem.

(I had a particularly spicy rant about how stupid the idea of sending a "digital twin" to a meeting on your behalf is.)

One year: ... except for code and research assistants

There are two categories of "agent" that I do believe in, because they're proven to work already.

The first is coding assistants - where an LLM writes, executes and then refines computer code in a loop.

I first saw this pattern demonstrated by OpenAI with their Code Interpreter feature for ChatGPT, released back in March/April of 2023.

You can ask ChatGPT to solve a problem that can use Python code and it will write that Python, execute it in a secure sandbox (I think it's Kubernetes) and then use the output - or any error messages - to determine if the goal has been achieved.

It's a beautiful pattern that worked great with early 2023 models (I believe it first shipped using original GPT-4), and continues to work today.

Claude added their own version in October (Claude analysis, using JavaScript that runs in the browser), Mistral have it, Gemini has a version and there are dozens of other implementations of the same pattern.

The second category of agents that I believe in is research assistants - where an LLM can run multiple searches, gather information and aggregate that into an answer to a question or write a report.

Perplexity and ChatGPT Search have both been operating in this space for a while, but by far the most impressive implementation I've seen is Google Gemini's Deep Research tool, which I've had access to for a few weeks.

With Deep Research I can pose a question like this one:

Pillar Point Harbor is one of the largest communal brown pelican roosts on the west coast of North America.

find others

And Gemini will draft a plan, consult dozens of different websites via Google Search and then assemble a report (with all-important citations) describing what it found.

Here's the plan it came up with:

Pillar Point Harbor is one of the largest communal brown pelican roosts on the west coast of North America. Find other large communal brown pelican roosts on the west coast of North America.
(1) Find a list of brown pelican roosts on the west coast of North America.
(2) Find research papers or articles about brown pelican roosts and their size.
(3) Find information from birdwatching organizations or government agencies about brown pelican roosts.
(4) Compare the size of the roosts found in (3) to the size of the Pillar Point Harbor roost.
(5) Find any news articles or recent reports about brown pelican roosts and their populations.

It dug up a whole bunch of details, but the one I cared most about was these PDF results for the 2016-2019 Pacific Brown Pelican Survey conducted by the West Coast Audubon network and partners - a PDF that included this delightful list:

Top 10 Megaroosts (sites that traditionally host >500 pelicans) with average fall count numbers:

Alameda Breakwater, CA (3,183)

Pillar Point Harbor, CA (1,481)

East Sand Island, OR (1,121)

Ano Nuevo State Park, CA (1,068)

Salinas River mouth, CA (762)

Bolinas Lagoon, CA (755)

Morro Rock, CA (725)

Moss landing, CA (570)

Crescent City Harbor, CA (514)

Bird Rock Tomales, CA (514)

My local harbor is the second biggest megaroost!

It makes intuitive sense to me that this kind of research assistant can be built on our current generation of LLMs. They're competent at driving tools, they're capable of coming up with a relatively obvious research plan (look for newspaper articles and research papers) and they can synthesize sensible answers given the right collection of context gathered through search.

Google are particularly well suited to solving this problem: they have the world's largest search index and their Gemini model has a 2 million token context. I expect Deep Research to get a whole lot better, and I expect it to attract plenty of competition.

Three years: Someone wins a Pulitzer for AI-assisted investigative reporting

I went for a bit of a self-serving prediction here: I think within three years someone is going to win a Pulitzer prize for a piece of investigative reporting that was aided by generative AI tools.

Update: after publishing this piece I learned about this May 2024 story from Nieman Lab: For the first time, two Pulitzer winners disclosed using AI in their reporting. I think these were both examples of traditional machine learning as opposed to LLM-based generative AI, but this is yet another example of my predictions being less ambitious than I had thought!

I do not mean that an LLM will write the article! I continue to think that having LLMs write on your behalf is one of the least interesting applications of these tools.

I called this prediction self-serving because I want to help make this happen! My Datasette suite of open source tools for data journalism has been growing AI features, like LLM-powered data enrichments and extracting structured data into tables from unstructured text.

My dream is for those tools - or tools like them - to be used for an award winning piece of investigative reporting.

I picked three years for this because I think that's how long it will take for knowledge of how to responsibly and effectively use these tools to become widespread enough for that to happen.

LLMs are not an obvious fit for journalism: journalists look for the truth, and LLMs are notoriously prone to hallucination and making things up. But journalists are also really good at extracting useful information from potentially untrusted sources - that's a lot of what the craft of journalism is about.

The two areas I think LLMs are particularly relevant to journalism are:

Structured data extraction. If you have 10,000 PDFs from a successful Freedom of Information Act request, someone or something needs to kick off the process of reading through them to find the stories. LLMs are a fantastic way to take a vast amount of information and start making some element of sense from it. They can act as lead generators, helping identify the places to start looking more closely.
Coding assistance. Writing code to help analyze data is a huge part of modern data journalism - from SQL queries through data cleanup scripts, custom web scrapers or visualizations to help find signal among the noise. Most newspapers don't have a team of programmers on staff: I think within three years we'll have robust enough tools built around this pattern that non-programmer journalists will be able to use them as part of their reporting process.

I hope to build some of these tools myself!

So my concrete prediction for three years is that someone wins a Pulitzer with a small amount of assistance from LLMs.

My more general prediction: within three years it won't be surprising at all to see most information professionals use LLMs as part of their daily workflow, in increasingly sophisticated ways. We'll know exactly what patterns work and how best to explain them to people. These skills will become widespread.

Three years part two: privacy laws with teeth

My other three year prediction concerned privacy legislation.

The levels of (often justified) paranoia around both targeted advertising and what happens to the data people paste into these models is a constantly growing problem.

I wrote recently about the inexterminable conspiracy theory that Apple target ads through spying through your phone's microphone. I've written in the past about the AI trust crisis, where people refuse to believe that models are not being trained on their inputs no matter how emphatically the companies behind them deny it.

I think the AI industry itself would benefit enormously from legislation that helps clarify what's going on with training on user-submitted data, and the wider tech industry could really do with harder rules around things like data retention and targeted advertising.

I don't expect the next four years of US federal government to be effective at passing legislation, but I expect we'll see privacy legislation with sharper teeth emerging at the state level or internationally. Let's just hope we don't end up with a new generation of cookie-consent banners as a result!

Six years utopian: amazing art

For six years I decided to go with two rival predictions, one optimistic and one pessimistic.

I think six years is long enough that we'll figure out how to harness this stuff to make some really great art.

I don't think generative AI for art - images, video and music - deserves nearly the same level of respect as a useful tool as text-based LLMs. Generative art tools are a lot of fun to try out but the lack of fine-grained control over the output greatly limits its utility outside of personal amusement or generating slop.

More importantly, they lack social acceptability. The vibes aren't good. Many talented artists have loudly rejected the idea of these tools, to the point that the very term "AI" is developing a distasteful connotation in society at large.

Image and video models are also ground zero for the AI training data ethics debate, and for good reason: no artist wants to see a model trained on their work without their permission that then directly competes with them!

I think six years is long enough for this whole thing to shake out - for society to figure out acceptable ways of using these tools to truly elevate human expression. What excites me is the idea of truly talented, visionary creative artists using whatever these tools have evolved into in six years to make meaningful art that could never have been achieved without them.

On the podcast I talked about Everything Everywhere All at Once, a film that deserved every one of its seven Oscars. The core visual effects team on that film was just five people. Imagine what a team like that could do with the generative AI tools we'll have in six years time!

Since recording the podcast I learned from Swyx that Everything Everywhere All at Once used Runway ML as part of their toolset already:

Evan Halleck was on this team, and he used Runway's AI tools to save time and automate tedious aspects of editing. Specifically in the film’s rock scene, he used Runway’s rotoscoping tool to get a quick, clean cut of the rocks as sand and dust were moving around the shot. This translated days of work to a matter of minutes.

I said I thought a film that had used generative AI tools would win an Oscar within six years. Looks like I was eight years out on that one!

Six years dystopian: AGI/ASI causes mass civil unrest

My pessimistic alternative take for 2031 concerns "AGI" - a term which, like "agents", is constantly being redefined. The Information recently reported (see also The Verge) that Microsoft and OpenAI are now defining AGI as a system capable of generating $100bn in profit!

If we assume AGI is the point at which AI systems are capable of performing almost any job currently reserved for a human being it's hard not to see potentially negative consequences.

Sam Altman may have experimented with Universal Basic Income, but the USA is a country that can't even figure out universal healthcare! I have huge trouble imagining a future economy that works for the majority of people when the majority of jobs are being done by machines.

So my dystopian prediction for 2031 is that if that form of AGI has come to pass it will be accompanied by extraordinarily bad economic outcomes and mass civil unrest.

My version of an AI utopia is tools that augment existing humans. That's what we've had with LLMs so far, and my ideal is that those tools continue to improve and subsequently humans become able to take on more ambitious work.

If there's a version of AGI that results in that kind of utopia, I'm all for it.

My total lack of conviction

There's a reason I haven't made predictions like this before: my confidence in my ability to predict the future is almost non-existent. At least one of my predictions here already proved to be eight years late!

These predictions are in the public record now (I even submitted a pull request).

It's going to be interesting looking back at these in one, three and six years to see how I did.

Tags: data-journalism, predictions, ai, openai, generative-ai, llms, ai-assisted-programming, gemini, code-interpreter, oxide, ai-agents, deep-research, ai-assisted-search, coding-agents, agent-definitions

Quoting Sam Bowman

2023-04-05T03:32:11+00:00

Scaling laws allow us to precisely predict some coarse-but-useful measures of how capable future models will be as we scale them up along three dimensions: the amount of data they are fed, their size (measured in parameters), and the amount of computation used to train them (measured in FLOPs). [...] Our ability to make this kind of precise prediction is unusual in the history of software and unusual even in the history of modern AI research. It is also a powerful tool for driving investment since it allows R&D teams to propose model-training projects costing many millions of dollars, with reasonable confidence that these projects will succeed at producing economically valuable systems.

— Sam Bowman

Tags: predictions, ai, generative-ai, llms

Quoting Tim Bray

2008-01-03T13:08:59+00:00

The strain due to the fact that most business desktops are locked into the Microsoft platform, at a time when both the Apple and GNU/Linux alternatives are qualitatively safer, better, and cheaper to operate, will start to become impossible to ignore.

— Tim Bray

Tags: apple, linux, macos, microsoft, predictions, tim-bray, windows