<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: Project</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/series/project.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-01-23T21:26:10+00:00</updated><author><name>Simon Willison</name></author><entry><title>Wilson Lin on FastRender: a browser built by thousands of parallel agents</title><link href="https://simonwillison.net/2026/Jan/23/fastrender/#atom-series" rel="alternate"/><published>2026-01-23T21:26:10+00:00</published><updated>2026-01-23T21:26:10+00:00</updated><id>https://simonwillison.net/2026/Jan/23/fastrender/#atom-series</id><summary type="html">
    &lt;p&gt;Last week Cursor published &lt;a href="https://cursor.com/blog/scaling-agents"&gt;Scaling long-running autonomous coding&lt;/a&gt;, an article describing their research efforts into coordinating large numbers of autonomous coding agents. One of the projects mentioned in the article was &lt;a href="https://github.com/wilsonzlin/fastrender"&gt;FastRender&lt;/a&gt;, a web browser they built from scratch using their agent swarms. I wanted to learn more so I asked Wilson Lin, the engineer behind FastRender, if we could record a conversation about the project. That 47 minute video is &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4"&gt;now available on YouTube&lt;/a&gt;. I've included some of the highlights below.&lt;/p&gt;

&lt;iframe style="margin-top: 1.5em; margin-bottom: 1.5em;" width="560" height="315" src="https://www.youtube-nocookie.com/embed/bKrAcTf2pL4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;

&lt;p&gt;See my &lt;a href="https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/"&gt;previous post&lt;/a&gt; for my notes and screenshots from trying out FastRender myself.&lt;/p&gt;


&lt;h4 id="what-fastrender-can-do-right-now"&gt;What FastRender can do right now&lt;/h4&gt;
&lt;p&gt;We started the conversation with a demo of FastRender loading different pages (&lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=195s"&gt;03:15&lt;/a&gt;). The JavaScript engine isn't working yet so we instead loaded &lt;a href="https://github.com/wilsonzlin/fastrender"&gt;github.com/wilsonzlin/fastrender&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/"&gt;Wikipedia&lt;/a&gt; and &lt;a href="https://cnn.com"&gt;CNN&lt;/a&gt; - all of which were usable, if a little slow to display.&lt;/p&gt;
&lt;p&gt;JavaScript had been disabled by one of the agents, which decided to add a feature flag! &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=242s"&gt;04:02&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;JavaScript is disabled right now. The agents made a decision as they were currently still implementing the engine and making progress towards other parts... they decided to turn it off or put it behind a feature flag, technically.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="from-side-project-to-core-research"&gt;From side-project to core research&lt;/h4&gt;
&lt;p&gt;Wilson started what became FastRender as a personal side-project to explore the capabilities of the latest generation of frontier models - Claude Opus 4.5, GPT-5.1, and GPT-5.2. &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=56s"&gt;00:56&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;FastRender was a personal project of mine from, I'd say, November. It was an experiment to see how well frontier models like Opus 4.5 and back then GPT-5.1 could do with much more complex, difficult tasks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A browser rendering engine was the ideal choice for this, because it's &lt;em&gt;extremely&lt;/em&gt; ambitious and complex while also being well specified. And you can visually see how well it's working! &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=117s"&gt;01:57&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As that experiment progressed, I was seeing better and better results from single agents that were able to actually make good progress on this project. And at that point, I wanted to see, well, what's the next level? How do I push this even further?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Once it became clear that this was an opportunity to try multiple agents working together, it graduated to an official Cursor research project with considerably more resources behind it.&lt;/p&gt;
&lt;p&gt;The goal of FastRender was never to build a browser to compete with the likes of Chrome. &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=2512s"&gt;41:52&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We never intended for it to be a production software or usable, but we wanted to observe behaviors of this harness of multiple agents, to see how they could work at scale.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The great thing about a browser is that it has such a large scope that it can keep serving experiments in this space for many years to come. JavaScript, then WebAssembly, then WebGPU... it could take many years to run out of new challenges for the agents to tackle.&lt;/p&gt;
&lt;h4 id="running-thousands-of-agents-at-once"&gt;Running thousands of agents at once&lt;/h4&gt;
&lt;p&gt;The most interesting thing about FastRender is the way the project used multiple agents working in parallel to build different parts of the browser. I asked how many agents were running at once: &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=324s"&gt;05:24&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At the peak, when we had the stable system running for one week continuously, there were approximately 2,000 agents running concurrently at one time. And they were making, I believe, thousands of commits per hour.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The project has &lt;a href="https://github.com/wilsonzlin/fastrender/commits/main/"&gt;nearly 30,000 commits&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;How do you run 2,000 agents at once? They used &lt;em&gt;really big machines&lt;/em&gt;. &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=356s"&gt;05:56&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The simple approach we took with the infrastructure was to have a large machine run one of these multi-agent harnesses. Each machine had ample resources, and it would run about 300 agents concurrently on each. This was able to scale and run reasonably well, as agents spend a lot of time thinking, and not just running tools.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At this point we switched to a live demo of the harness running on one of those big machines (&lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=392s"&gt;06:32&lt;/a&gt;). The agents are arranged in a tree structure, with planning agents firing up tasks and worker agents then carrying them out. &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=434s"&gt;07:14&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/wilson-lin-agents.jpg" alt="Terminal window showing a tmux session running &amp;quot;grind-swarm&amp;quot; task manager with RUNNING status. Header shows &amp;quot;grind-swarm – 45:54:15&amp;quot; with stats &amp;quot;planners: 9 (0 done) | tasks: 111 working, 0 pending, 232 done | 12900.9M↑ 514.1M↓&amp;quot;. Task list includes: p1 Root (main), p2 CSS selector matching performance + bloom filter integration, p3 CSS stylesheet parsing semantics &amp;amp; at-rule handling, p4 Custom properties (@property) + var() resolution + incremental recompute/invalidation, p37 CSS at-rule artifact integration, p50 Selector engine correctness &amp;amp; spec coverage, p51 Computed-value + property coverage across css-cascade, p105 Style sharing / computed style caching in fastrender-style, p289 CSS cascade layers (@layer) global ordering, w5 Fix workspace lockfile drift, w7 Implement computed-style snapshot sharing, w15 Fix css-properties namespace handling, w17 (Stretch) Enable bloom fast-reject in HTML quirks mode, w18 Refactor css-properties stylesheet parsing. Activity log shows shell commands including cargo check, git status, git push origin main, and various test runs. Bottom status bar shows &amp;quot;grind-css0:target/release/grind-swarm*&amp;quot; and &amp;quot;streamyard.com is sharing your screen&amp;quot; notification with timestamp &amp;quot;12:02 22-Jan-26&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This cluster of agents is working towards building out the CSS aspects of the browser, whether that's parsing, selector engine, those features. We managed to push this even further by splitting out the browser project into multiple instructions or work streams and have each one run one of these harnesses on their own machine, so that was able to further parallelize and increase throughput.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But don't all of these agents working on the same codebase result in a huge number of merge conflicts? Apparently not: &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=501s"&gt;08:21&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We've noticed that most commits do not have merge conflicts. The reason is the harness itself is able to quite effectively split out and divide the scope and tasks such that it tries to minimize the amount of overlap of work. That's also reflected in the code structure—commits will be made at various times and they don't tend to touch each other at the same time.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This appears to be the key trick for unlocking benefits from parallel agents: if planning agents do a good enough job of breaking up the work into non-overlapping chunks you can bring hundreds or even thousands of agents to bear on a problem at once.&lt;/p&gt;
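&lt;p&gt;A minimal sketch of that idea (not FastRender's actual harness - the names and structure here are invented for illustration): a planner only spawns a task if its file scope doesn't overlap any task already in flight.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical planner/worker split: each task claims an exclusive set of
# file paths, so worker agents rarely touch the same code at the same time.
# This is an illustrative toy, not FastRender's real scheduler.

@dataclass(frozen=True)
class Task:
    name: str
    files: frozenset

class Planner:
    def __init__(self):
        self.claimed = set()   # files owned by tasks currently in flight
        self.tasks = []

    def plan(self, name, files):
        files = frozenset(files)
        if files.intersection(self.claimed):
            return None        # overlapping scope: defer this task
        self.claimed.update(files)
        task = Task(name, files)
        self.tasks.append(task)
        return task

planner = Planner()
assert planner.plan("css-selectors", ["src/style/selector.rs"]) is not None
assert planner.plan("css-cascade", ["src/style/cascade.rs"]) is not None
# A task whose scope collides with a running task is deferred, not spawned:
assert planner.plan("selector-perf", ["src/style/selector.rs"]) is None
```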
&lt;p&gt;Surprisingly, Wilson found that GPT-5.1 and GPT-5.2 were a better fit for this work than the coding specialist GPT-5.1-Codex: &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=1048s"&gt;17:28&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Some initial findings were that the instructions here were more expansive than merely coding. For example, how to operate and interact within a harness, or how to operate autonomously without interacting with the user or having a lot of user feedback. These kinds of instructions we found worked better with the general models.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I asked about the longest they've seen this system run without human intervention: &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=1108s"&gt;18:28&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;So this system, once you give an instruction, there's actually no way to steer it: you can't prompt it to adjust how it goes. The only thing you can do is stop it. So our longest run, all the runs are basically autonomous. We don't alter the trajectory while executing. [...]&lt;/p&gt;
&lt;p&gt;And so the longest at the time of the post was about a week and that's pretty close to the longest. Of course the research project itself was only about three weeks so you know we probably can go longer.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="specifications-and-feedback-loops"&gt;Specifications and feedback loops&lt;/h4&gt;
&lt;p&gt;An interesting aspect of this project design is feedback loops. For agents to work autonomously for long periods of time they need as much useful context about the problem they are solving as possible, combined with effective feedback loops to help them make decisions.&lt;/p&gt;
&lt;p&gt;The FastRender repo &lt;a href="https://github.com/wilsonzlin/fastrender/tree/19bf1036105d4eeb8bf3330678b7cb11c1490bdc/specs"&gt;uses git submodules to include relevant specifications&lt;/a&gt;, including csswg-drafts, tc39-ecma262 for JavaScript, whatwg-dom, whatwg-html and more. &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=846s"&gt;14:06&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Feedback loops to the system are very important. Agents are working for very long periods continuously, and without guardrails and feedback to know whether what they're doing is right or wrong it can have a big impact over a long rollout. Specs are definitely an important part—you can see lots of comments in the code base that AI wrote referring specifically to specs that they found in the specs submodules.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;GPT-5.2 is a vision-capable model, and part of the feedback loop for FastRender included taking screenshots of the rendering results and feeding those back into the model:
&lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=983s"&gt;16:23&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the earlier evolution of this project, when it was just doing the static renderings of screenshots, this was definitely a very explicit thing we taught it to do. And these models are visual models, so they do have that ability. We have progress indicators to tell it to compare the diff against a golden sample.&lt;/p&gt;
&lt;/blockquote&gt;
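&lt;p&gt;The screenshot comparison can be sketched in a few lines. This is a deliberately simplified toy, assuming frames are plain 2D lists of RGB tuples; real image decoding and perceptual thresholds are omitted.&lt;/p&gt;

```python
# Toy version of the golden-sample feedback loop: report what fraction of
# pixels differ between a rendered frame and the golden reference. A harness
# could feed this number (or the diff itself) back to the model as progress.

def diff_ratio(rendered, golden):
    total = mismatched = 0
    for row_r, row_g in zip(rendered, golden):
        for px_r, px_g in zip(row_r, row_g):
            total += 1
            if px_r != px_g:
                mismatched += 1
    return mismatched / total if total else 0.0

golden = [[(255, 255, 255)] * 4 for _ in range(4)]   # 4x4 all-white frame
rendered = [row[:] for row in golden]
rendered[0][0] = (0, 0, 0)                           # one bad pixel of 16

assert diff_ratio(golden, golden) == 0.0
assert diff_ratio(rendered, golden) == 1 / 16
```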
&lt;p&gt;The strictness of the Rust compiler helped provide a feedback loop as well: &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=952s"&gt;15:52&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The nice thing about Rust is you can get a lot of verification just from compilation, and that is not as available in other languages.&lt;/p&gt;
&lt;/blockquote&gt;
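&lt;p&gt;One way a harness could consume that signal, sketched here as an assumption rather than FastRender's documented mechanism: cargo's &lt;code&gt;--message-format=json&lt;/code&gt; mode emits one JSON object per line, so error-level diagnostics are easy to count programmatically. The sample output below is fabricated and heavily trimmed.&lt;/p&gt;

```python
import json

# Count error-level rustc diagnostics from `cargo check --message-format=json`
# output. A harness could use this count as a per-commit feedback score.

def count_errors(cargo_json_lines):
    errors = 0
    for line in cargo_json_lines.splitlines():
        if not line.strip():
            continue
        msg = json.loads(line)
        message = msg.get("message") or {}
        if msg.get("reason") == "compiler-message" and message.get("level") == "error":
            errors += 1
    return errors

# Fabricated sample of the JSON-lines stream cargo emits:
sample = "\n".join([
    json.dumps({"reason": "compiler-message",
                "message": {"level": "error", "rendered": "mismatched types"}}),
    json.dumps({"reason": "compiler-message",
                "message": {"level": "warning", "rendered": "unused variable"}}),
    json.dumps({"reason": "build-finished", "success": False}),
])

assert count_errors(sample) == 1
```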
&lt;h4 id="the-agents-chose-the-dependencies"&gt;The agents chose the dependencies&lt;/h4&gt;
&lt;p&gt;We talked about the &lt;a href="https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4eeb8bf3330678b7cb11c1490bdc/Cargo.toml"&gt;Cargo.toml dependencies&lt;/a&gt; that the project had accumulated, almost all of which had been selected by the agents themselves.&lt;/p&gt;
&lt;p&gt;Some of these, like &lt;a href="https://skia.org/"&gt;Skia&lt;/a&gt; for 2D graphics rendering or &lt;a href="https://github.com/harfbuzz/harfbuzz"&gt;HarfBuzz&lt;/a&gt; for text shaping, were obvious choices. Others such as &lt;a href="https://github.com/DioxusLabs/taffy"&gt;Taffy&lt;/a&gt; felt like they might go against the from-scratch goals of the project, since that library implements CSS flexbox and grid layout algorithms directly. This was not an intended outcome. &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=1673s"&gt;27:53&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Similarly these are dependencies that the agent picked to use for small parts of the engine and perhaps should have actually implemented itself. I think this reflects on the importance of the instructions, because I actually never encoded specifically the level of dependencies we should be implementing ourselves.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The agents vendored in Taffy and &lt;a href="https://github.com/wilsonzlin/fastrender/commits/main/vendor/taffy"&gt;applied a stream of changes&lt;/a&gt; to that vendored copy.
&lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=1878s"&gt;31:18&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It's currently vendored. And as the agents work on it, they do make changes to it. This was actually an artifact from the very early days of the project before it was a fully fledged browser... it's implementing things like the flex and grid layers, but there are other layout methods like inline, block, and table, and in our new experiment, we're removing that completely.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The inclusion of QuickJS despite the presence of a home-grown ecma-rs implementation has a fun origin story:
&lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=2115s"&gt;35:15&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I believe it mentioned that it pulled in the QuickJS because it knew that other agents were working on the JavaScript engine, and it needed to unblock itself quickly. [...]&lt;/p&gt;
&lt;p&gt;It was like, eventually, once that's finished, let's remove it and replace with the proper engine.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I love how similar this is to the dynamics of a large-scale human engineering team, where you could absolutely see one engineer getting frustrated at another team not having delivered yet and unblocking themselves by pulling in a third-party library.&lt;/p&gt;
&lt;h4 id="intermittent-errors-are-ok-actually"&gt;Intermittent errors are OK, actually&lt;/h4&gt;
&lt;p&gt;Here's something I found really surprising: the agents were allowed to introduce small errors into the codebase as they worked! &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=2382s"&gt;39:42&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One of the trade-offs was: if you wanted every single commit to be a hundred percent perfect, make sure it can always compile every time, that might be a synchronization bottleneck. [...]&lt;/p&gt;
&lt;p&gt;Especially as you break up the system into more modularized aspects, you can see that errors get introduced, but small errors, right? An API change or some syntax error, but then they get fixed really quickly after a few commits. So there's a little bit of slack in the system to allow these temporary errors so that the overall system can continue to make progress at a really high throughput. [...]&lt;/p&gt;
&lt;p&gt;People may say, well, that's not correct code. But it's not that the errors are accumulating. It's a stable rate of errors. [...] That seems like a worthwhile trade-off.&lt;/p&gt;
&lt;/blockquote&gt;
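&lt;p&gt;That "stable rate of errors" condition is easy to express as a toy check, assuming a harness records how many build errors are open at each commit. The thresholds here are invented for illustration.&lt;/p&gt;

```python
# Errors are tolerable as long as they stay bounded and every broken stretch
# is fixed within a few commits; only accumulating errors signal trouble.

def errors_are_stable(errors_per_commit, max_open=3, max_streak=5):
    streak = 0
    for n in errors_per_commit:
        if n > max_open:
            return False       # too many errors open at once
        streak = streak + 1 if n else 0
        if streak > max_streak:
            return False       # a breakage lingered for too many commits
    return True

# Small errors that get fixed a few commits later are fine:
assert errors_are_stable([0, 1, 2, 0, 1, 0, 0])
# Errors that keep piling up are not:
assert not errors_are_stable([0, 1, 2, 4, 5])
```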
&lt;p&gt;If you're going to have thousands of agents working in parallel, optimizing for throughput over correctness turns out to be a strategy worth exploring.&lt;/p&gt;
&lt;h4 id="a-single-engineer-plus-a-swarm-of-agents-in-january-2026"&gt;A single engineer plus a swarm of agents in January 2026&lt;/h4&gt;
&lt;p&gt;The thing I find most interesting about FastRender is how it demonstrates the extreme edge of what a single engineer can achieve in early 2026 with the assistance of a swarm of agents.&lt;/p&gt;
&lt;p&gt;FastRender may not be a production-ready browser, but it represents over a million lines of Rust code, written in a few weeks, that can already render real web pages to a usable degree.&lt;/p&gt;
&lt;p&gt;A browser really is the ideal research project to experiment with this new, weirdly shaped form of software engineering.&lt;/p&gt;
&lt;p&gt;I asked Wilson how much mental effort he had invested in browser rendering compared to agent coordination. &lt;a href="https://www.youtube.com/watch?v=bKrAcTf2pL4&amp;amp;t=694s"&gt;11:34&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The browser and this project were co-developed and very symbiotic, only because the browser was a very useful objective for us to measure and iterate the progress of the harness. The goal was to iterate on and research the multi-agent harness—the browser was just the research example or objective.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;FastRender is effectively using a full browser rendering engine as a "hello world" exercise for multi-agent coordination!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browsers"&gt;browsers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cursor"&gt;cursor&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/browser-challenge"&gt;browser-challenge&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="browsers"/><category term="youtube"/><category term="ai"/><category term="rust"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="cursor"/><category term="parallel-agents"/><category term="browser-challenge"/></entry><entry><title>Under the hood of Canada Spends with Brendan Samek</title><link href="https://simonwillison.net/2025/Dec/9/canada-spends/#atom-series" rel="alternate"/><published>2025-12-09T23:52:05+00:00</published><updated>2025-12-09T23:52:05+00:00</updated><id>https://simonwillison.net/2025/Dec/9/canada-spends/#atom-series</id><summary type="html">
    &lt;p&gt;I talked to Brendan Samek about &lt;a href="https://canadaspends.com/"&gt;Canada Spends&lt;/a&gt;, a project from &lt;a href="https://www.buildcanada.com/"&gt;Build Canada&lt;/a&gt; that makes Canadian government financial data accessible and explorable using a combination of Datasette, a neat custom frontend, Ruby ingestion scripts, &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt; and pieces of LLM-powered PDF extraction.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po"&gt;the video on YouTube&lt;/a&gt;.&lt;/p&gt;
&lt;iframe style="margin-bottom: 1.5em;" width="560" height="315" src="https://www.youtube-nocookie.com/embed/T8xiMgmb8po" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;

&lt;p&gt;Sections within that video:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=177s"&gt;02:57&lt;/a&gt; Data sources and the PDF problem&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=351s"&gt;05:51&lt;/a&gt; Crowdsourcing financial data across Canada&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=447s"&gt;07:27&lt;/a&gt; Datasette demo: Search and facets&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=753s"&gt;12:33&lt;/a&gt; Behind the scenes: Ingestion code&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=1044s"&gt;17:24&lt;/a&gt; Data quality horror stories&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=1246s"&gt;20:46&lt;/a&gt; Using Gemini to extract PDF data&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=1524s"&gt;25:24&lt;/a&gt; Why SQLite is perfect for data distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="build-canada-and-canada-spends"&gt;Build Canada and Canada Spends&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.buildcanada.com/"&gt;Build Canada&lt;/a&gt; is a volunteer-driven non-profit that launched in February 2025 - here's &lt;a href="https://www.canadianaffairs.news/2025/09/26/builders-at-the-gate-inside-the-civic-movement-to-jolt-canada-out-of-stagnation/"&gt;some background information&lt;/a&gt; on the organization, which has a strong pro-entrepreneurship and pro-technology angle.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://canadaspends.com/"&gt;Canada Spends&lt;/a&gt; is their project to make Canadian government financial data more accessible and explorable. It includes a tax sources and sinks visualizer and a searchable database of government contracts, plus a collection of tools covering financial data from different levels of government.&lt;/p&gt;
&lt;h4 id="datasette-for-data-exploration"&gt;Datasette for data exploration&lt;/h4&gt;
&lt;p&gt;The project maintains a Datasette instance at &lt;a href="https://api.canadasbuilding.com/"&gt;api.canadasbuilding.com&lt;/a&gt; containing the data they have gathered and processed from multiple data sources - currently more than 2 million rows plus a combined search index across a denormalized copy of that data.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/api-canadasbuilding-com-canada-spends.jpg" alt="  Datasette UI for a canada-spends database.  aggregated-contracts-under-10k:  year, contract_goods_number_of, contracts_goods_original_value, contracts_goods_amendment_value, contract_service_number_of, contracts_service_original_value, contracts_service_amendment_value, contract_construction_number_of, contracts_construction_original_value, contracts_construction_amendment_value, acquisition_card_transactions_number_of, acquisition_card_transactions_total_value, owner_org, owner_org_title  487 rows cihr_grants  external_id, title, project_lead_name, co_researchers, institution, province, country, competition_year, award_amount, program, program_type, theme, research_subject, keywords, abstract, duration, source_url  53,420 rows contracts-over-10k:   reference_number, procurement_id, vendor_name, vendor_postal_code, buyer_name, contract_date, economic_object_code, description_en, description_fr, contract_period_start, delivery_date, contract_value, original_value, amendment_value, comments_en, comments_fr, additional_comments_en, additional_comments_fr, agreement_type_code, trade_agreement, land_claims, commodity_type, commodity_code, country_of_vendor, solicitation_procedure, limited_tendering_reason, trade_agreement_exceptions, indigenous_business, indigenous_business_excluding_psib, intellectual_property, potential_commercial_exploitation, former_public_servant, contracting_entity, standing_offer_number, instrument_type, ministers_office, number_of_bids, article_6_exceptions, award_criteria, socioeconomic_indicator, reporting_period, owner_org, owner_org_title  1,172,575 rows global_affairs_grants:   id, projectNumber, dateModified, title, description, status, start, end, countries, executingAgencyPartner, DACSectors, maximumContribution, ContributingOrganization, expectedResults, resultsAchieved, aidType, collaborationType, financeType, flowType, 
reportingOrganisation, programName, selectionMechanism, policyMarkers, regions, alternameImPositions, budgets, Locations, otherIdentifiers, participatingOrgs, programDataStructure, relatedActivities, transactions  2,378 rows nserc_grants:   title, award_summary, application_id, competition_year, fiscal_year, project_lead_name, institution, department, province, award_amount, installment, program, selection_committee, research_subject, area_of_application, co-researchers, partners, external_id, source_url  701,310 rows sshrc_grants:   id, title, program, fiscal_year, competition_year, applicant, organization, amount, discipline, area_of_research, co_applicant, keywords, source_url  213,085 rows transfers:   FSCL_YR, MINC, MINE, MINF, DepartmentNumber-Numéro-de-Ministère, DEPT_EN_DESC, DEPT_FR_DESC, RCPNT_CLS_EN_DESC, RCPNT_CLS_FR_DESC, RCPNT_NML_EN_DESC, RCPNT_NML_FR_DESC, CTY_EN_NM, CTY_FR_NM, PROVTER_EN, PROVTER_FR, CNTRY_EN_NM, CNTRY_FR_NM, TOT_CY_XPND_AMT, AGRG_PYMT_AMT  357,797 rows  Download SQLite DB: canada-spends.db 2.4 GB Powered by Datasette · Queries took 24.733ms " style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="processing-pdfs"&gt;Processing PDFs&lt;/h4&gt;
&lt;p&gt;The highest quality government financial data comes from the audited financial statements that every Canadian government department is required to publish. As is so often the case with government data, these are usually published as PDFs.&lt;/p&gt;
&lt;p&gt;Brendan has been using Gemini to help extract data from those PDFs. Since this is accounting data the numbers can be summed and cross-checked to help validate the LLM didn't make any obvious mistakes.&lt;/p&gt;
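&lt;p&gt;The cross-checking step can be as simple as summing extracted line items and comparing against the statement's printed total. A hedged sketch, with invented field names and figures:&lt;/p&gt;

```python
# Because audited statements are accounting data, line items extracted by an
# LLM should add up to the stated total; a mismatch flags a likely
# transcription error. All names and numbers here are made up.

def validate_statement(line_items, stated_total, tolerance=0.01):
    computed = round(sum(amount for _, amount in line_items), 2)
    return tolerance >= abs(computed - stated_total), computed

items = [
    ("Salaries", 1_200_000.00),
    ("Transfers", 850_000.50),
    ("Operations", 149_999.50),
]

ok, computed = validate_statement(items, 2_200_000.00)
assert ok and computed == 2_200_000.00

# A single mistranscribed digit fails the check:
ok, _ = validate_statement(items, 2_300_000.00)
assert not ok
```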
&lt;h4 id="further-reading"&gt;Further reading&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://datasette.io/"&gt;datasette.io&lt;/a&gt;, the official website for Datasette&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils.datasette.io&lt;/a&gt; for more on &lt;code&gt;sqlite-utils&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://canadaspends.com/"&gt;Canada Spends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/BuildCanada/CanadaSpends"&gt;BuildCanada/CanadaSpends&lt;/a&gt; on GitHub&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="data-journalism"/><category term="politics"/><category term="sqlite"/><category term="youtube"/><category term="datasette"/><category term="sqlite-utils"/></entry><entry><title>Six short video demos of LLM and Datasette projects</title><link href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#atom-series" rel="alternate"/><published>2025-01-22T02:09:54+00:00</published><updated>2025-01-22T02:09:54+00:00</updated><id>https://simonwillison.net/2025/Jan/22/office-hours-demos/#atom-series</id><summary type="html">
    &lt;p&gt;Last Friday Alex Garcia and I hosted a new kind of Datasette Public Office Hours session, inviting members of the Datasette community to share short demos of projects that they had built. The session lasted just over an hour and featured demos from six different people.&lt;/p&gt;
&lt;p&gt;We broadcast live on YouTube, but I've now edited the session into separate videos. These are listed below, along with project summaries and show notes for each presentation.&lt;/p&gt;
&lt;p&gt;You can also watch all six videos in &lt;a href="https://www.youtube.com/playlist?list=PLSocEbMlNGotyeonEbgFP1_uf9gk1z7zm"&gt;this YouTube playlist&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#llm-logs-feedback-by-matthias-l-bken"&gt;llm-logs-feedback by Matthias Lübken&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#llm-model-gateway-and-llm-consortium-by-thomas-hughes"&gt;llm-model-gateway and llm-consortium by Thomas Hughes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#congressional-travel-explorer-with-derek-willis"&gt;Congressional Travel Explorer with Derek Willis&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#llm-questioncache-with-nat-knight"&gt;llm-questioncache with Nat Knight&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#improvements-to-datasette-enrichments-with-simon-willison"&gt;Improvements to Datasette Enrichments with Simon Willison&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Jan/22/office-hours-demos/#datasette-comments-pins-and-write-ui-with-alex-garcia"&gt;Datasette comments, pins and write UI with Alex Garcia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="llm-logs-feedback-by-matthias-l-bken"&gt;llm-logs-feedback by Matthias Lübken&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="9pEP6auZmvg"
  title="llm-logs-feedback by Matthias Lübken"
  playlabel="Play: llm-logs-feedback by Matthias Lübken"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/luebken/llm-logs-feedback"&gt;llm-logs-feedback&lt;/a&gt; is a plugin by Matthias Lübken for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; which adds the ability to store feedback on prompt responses, using new &lt;code&gt;llm feedback+1&lt;/code&gt; and &lt;code&gt;llm feedback-1&lt;/code&gt; commands. These also accept an optional comment, and the feedback is stored in a &lt;code&gt;feedback&lt;/code&gt; table in SQLite.&lt;/p&gt;
&lt;p&gt;You can install the plugin from PyPI like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-logs-feedback&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The full plugin implementation is in the &lt;a href="https://github.com/luebken/llm-logs-feedback/blob/main/llm_logs_feedback.py"&gt;llm_logs_feedback.py file&lt;/a&gt; in Matthias' GitHub repository.&lt;/p&gt;
&lt;h4 id="llm-model-gateway-and-llm-consortium-by-thomas-hughes"&gt;llm-model-gateway and llm-consortium by Thomas Hughes&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="Th5WOyjuRdk"
  title="llm-model-gateway and llm-consortium by Thomas Hughes"
  playlabel="Play: llm-model-gateway and llm-consortium by Thomas Hughes"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;Tommy Hughes has been developing a whole array of LLM plugins, including his &lt;a href="https://github.com/irthomasthomas/llm-plugin-generator"&gt;llm-plugin-generator&lt;/a&gt; which is a plugin that can help write new plugins!&lt;/p&gt;
&lt;p&gt;He started by demonstrating &lt;a href="https://github.com/irthomasthomas/llm-model-gateway"&gt;llm-model-gateway&lt;/a&gt;, a plugin that adds an &lt;code&gt;llm serve&lt;/code&gt; command which starts a localhost server running an imitation of the OpenAI API against LLM models:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-model-gateway
llm serve&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Tommy's main demo was of &lt;a href="https://github.com/irthomasthomas/llm-consortium"&gt;llm-consortium&lt;/a&gt;, a plugin which can use a consortium of collaborating LLM models working together to solve problems.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-consortium
llm consortium &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Best way to prepare avocado toast&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  --models claude-3.5-sonnet \
  --models gpt-4o \
  --arbiter gemini-1.5-pro-latest \
  --confidence-threshold 0.8 \
  --max-iterations 3 \
  --output results.json&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I ran this and got back:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The best way to prepare avocado toast involves selecting quality ingredients and careful preparation. Start with a thick slice of good quality bread, such as sourdough or whole grain. Toast it to your preferred level of crispness.  While the bread is toasting, prepare a ripe avocado. Cut it in half, remove the pit, and scoop the flesh into a bowl. Mash the avocado with a fork, leaving some chunks for texture. Season with salt, black pepper, and a squeeze of fresh lemon juice to prevent browning.  Optional additions include a touch of red pepper flakes.&lt;/p&gt;
&lt;p&gt;Once the toast is ready, let it cool slightly before spreading the seasoned avocado evenly over it. Consider lightly rubbing a cut garlic clove on the warm toast for an extra layer of flavor (optional).&lt;/p&gt;
&lt;p&gt;Enhance your avocado toast with your favorite toppings. Popular choices include: everything bagel seasoning, sliced tomatoes, radishes, a poached or fried egg (for added protein), microgreens, smoked salmon (for a more savory option), feta cheese crumbles, or a drizzle of hot sauce.  For a finishing touch, drizzle with high-quality olive oil and sprinkle with sesame or chia seeds for added texture.&lt;/p&gt;
&lt;p&gt;Consider dietary needs when choosing toppings. For example, those following a low-carb diet might skip the tomatoes and opt for more protein and healthy fats.&lt;/p&gt;
&lt;p&gt;Finally, pay attention to presentation. Arrange the toppings neatly for a visually appealing toast. Serve immediately to enjoy the fresh flavors and crispy toast.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But the really interesting thing is the full log of the prompts and responses sent to Claude 3.5 Sonnet and GPT-4o, followed by a combined prompt to Gemini 1.5 Pro to have it arbitrate between the two responses. You can see &lt;a href="https://gist.github.com/simonw/425f42f8ec1a963ae13c5b57ba580f56"&gt;the full logged prompts and responses here&lt;/a&gt;. Here's that &lt;a href="https://gist.github.com/simonw/e82370f0e5986a15823c82200c1b77f8"&gt;results.json&lt;/a&gt; output file.&lt;/p&gt;
&lt;h4 id="congressional-travel-explorer-with-derek-willis"&gt;Congressional Travel Explorer with Derek Willis&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="CDilLbFP1DY"
  title="Congressional Travel Explorer with Derek Willis"
  playlabel="Play: Congressional Travel Explorer with Derek Willis"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;Derek Willis teaches data journalism at the Philip Merrill College of Journalism at the University of Maryland. For a recent project his students built a &lt;a href="https://cnsmaryland.org/interactives/fall-2024/congressional_travel_explorer/index.html"&gt;Congressional Travel Explorer&lt;/a&gt; interactive using Datasette, Amazon Textract and Claude 3.5 Sonnet to analyze travel disclosures from members of Congress.&lt;/p&gt;
&lt;p&gt;One of the outcomes from the project was this story in Politico: &lt;a href="https://www.politico.com/news/2024/10/30/israel-aipac-funded-congress-travel-00185167"&gt;Members of Congress have taken hundreds of AIPAC-funded trips to Israel in the past decade&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="llm-questioncache-with-nat-knight"&gt;llm-questioncache with Nat Knight&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="lXwfEYXjsak"
  title="llm-questioncache with Nat Knight"
  playlabel="Play: llm-questioncache with Nat Knight"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/nathanielknight/llm-questioncache"&gt;llm-questioncache&lt;/a&gt; builds on top of &lt;a href="https://llm.datasette.io/"&gt;https://llm.datasette.io/&lt;/a&gt; to cache answers to questions, using embeddings to return similar answers if they have already been stored.&lt;/p&gt;
&lt;p&gt;Using embeddings for de-duplication of similar questions is an interesting way to apply LLM's &lt;a href="https://llm.datasette.io/en/stable/embeddings/python-api.html"&gt;embeddings feature&lt;/a&gt;.&lt;/p&gt;
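&lt;p&gt;As a sketch of the idea (this is not the plugin's actual code, and &lt;code&gt;embed()&lt;/code&gt; here is a toy character-frequency stand-in for a real embedding model), a cache can compare the embedding of an incoming question against stored ones and return the saved answer when cosine similarity clears a threshold:&lt;/p&gt;

```python
import math

def embed(text):
    # Toy embedding: character-frequency vector over a-z.
    # A real system would call an embedding model instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class QuestionCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def lookup(self, question):
        emb = embed(question)
        for stored_emb, answer in self.entries:
            if cosine(emb, stored_emb) >= self.threshold:
                return answer  # cache hit on a similar-enough question
        return None

    def store(self, question, answer):
        self.entries.append((embed(question), answer))

cache = QuestionCache()
cache.store("What is the capital of France?", "Paris")
hit = cache.lookup("what is the capital of france")   # near-identical phrasing
miss = cache.lookup("How tall is Mount Everest?")     # unrelated question
```

&lt;p&gt;The threshold is the interesting tuning knob: too low and unrelated questions collide, too high and trivial rephrasings miss the cache.&lt;/p&gt;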
&lt;h4 id="improvements-to-datasette-enrichments-with-simon-willison"&gt;Improvements to Datasette Enrichments with Simon Willison&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="GumAgaYpda0"
  title="Improvements to Datasette Enrichments with Simon Willison"
  playlabel="Play: Improvements to Datasette Enrichments with Simon Willison"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;I demonstrated improvements I've been making to Datasette's &lt;a href="https://enrichments.datasette.io/"&gt;Enrichments&lt;/a&gt; system over the past few weeks.&lt;/p&gt;
&lt;p&gt;Enrichments allow you to apply an operation - such as geocoding, a QuickJS JavaScript transformation or an LLM prompt - against selected rows within a table.&lt;/p&gt;
&lt;p&gt;The latest release of &lt;a href="https://github.com/datasette/datasette-enrichments/releases/tag/0.5"&gt;datasette-enrichments&lt;/a&gt; adds visible progress bars and the ability to pause, resume and cancel an enrichment job that is running against a table.&lt;/p&gt;
&lt;h4 id="datasette-comments-pins-and-write-ui-with-alex-garcia"&gt;Datasette comments, pins and write UI with Alex Garcia&lt;/h4&gt;
&lt;p&gt;&lt;lite-youtube videoid="i0u4N6g15Zg"
  title="Datasette comments, pins and write UI with Alex Garcia"
  playlabel="Play: Datasette comments, pins and write UI with Alex Garcia"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;
&lt;p&gt;We finished with three plugin demos from Alex, showcasing collaborative features we have been developing for &lt;a href="https://www.datasette.cloud/"&gt;Datasette Cloud&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/datasette/datasette-write-ui"&gt;datasette-write-ui&lt;/a&gt; provides tools for editing and adding data to Datasette tables. A new feature here is the ability to shift-click a row to open the editing interface for that row.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/datasette/datasette-pins"&gt;datasette-pins&lt;/a&gt; allows users to pin tables and databases to their Datasette home page, making them easier to find.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/datasette/datasette-comments"&gt;datasette-comments&lt;/a&gt; adds a commenting interface to Datasette, allowing users to leave comments on individual rows in a table.&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/community"&gt;community&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/demos"&gt;demos&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/enrichments"&gt;enrichments&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/alex-garcia"&gt;alex-garcia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-public-office-hours"&gt;datasette-public-office-hours&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="community"/><category term="derek-willis"/><category term="demos"/><category term="youtube"/><category term="data-journalism"/><category term="enrichments"/><category term="llm"/><category term="datasette"/><category term="llms"/><category term="alex-garcia"/><category term="datasette-public-office-hours"/><category term="ai"/><category term="generative-ai"/></entry><entry><title>Project: Civic Band - scraping and searching PDF meeting minutes from hundreds of municipalities</title><link href="https://simonwillison.net/2024/Nov/16/civic-band/#atom-series" rel="alternate"/><published>2024-11-16T22:14:01+00:00</published><updated>2024-11-16T22:14:01+00:00</updated><id>https://simonwillison.net/2024/Nov/16/civic-band/#atom-series</id><summary type="html">
    &lt;p&gt;I interviewed &lt;a href="https://phildini.dev/"&gt;Philip James&lt;/a&gt; about &lt;a href="https://civic.band/"&gt;Civic Band&lt;/a&gt;, his "slowly growing collection of databases of the minutes from civic governments". Philip demonstrated the site and talked through his pipeline for scraping and indexing meeting minutes from many different local government authorities around the USA.&lt;/p&gt;

&lt;iframe style="margin-top: 1.5em; margin-bottom: 1.5em;" width="560" height="315" src="https://www.youtube-nocookie.com/embed/OziYd7xcGzc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;

&lt;p&gt;We recorded this conversation as part of yesterday's Datasette Public Office Hours session.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/16/civic-band/#civic-band"&gt;Civic Band&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/16/civic-band/#the-technical-stack"&gt;The technical stack&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/16/civic-band/#scale-and-storage"&gt;Scale and storage&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/16/civic-band/#future-plans"&gt;Future plans&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="civic-band"&gt;Civic Band&lt;/h4&gt;
&lt;p&gt;Philip was inspired to start thinking more about local government after the 2016 US election. He realised that there was a huge amount of information about decisions made by local authorities tucked away in their meeting minutes, but that information was scattered across thousands of PDF files on many different websites.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There was this massive backlog of basically every decision that had ever been made by one of these bodies. But it was almost impossible to discover because it lives in these systems where the method of exchange is a PDF.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Philip lives in Alameda, which makes its minutes available &lt;a href="https://alameda.legistar.com/Calendar.aspx"&gt;via this portal&lt;/a&gt; powered by &lt;a href="https://granicus.com/product/legistar-agenda-management/"&gt;Legistar&lt;/a&gt;. It turns out there are a small number of vendors that provide this kind of software tool, so once you've written a scraper for one it's likely to work for many others as well.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://alameda.ca.civic.band/"&gt;the Civic Band portal for Alameda&lt;/a&gt;, powered by &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/civic-band-1.jpg" alt="Datasette instance titled Alameda Civic Data, has search box, a note that says  A fully-searchable database of Alameda, CA civic meeting minutes. Last updated: 2024-11-15T20:27:36. See the full list at Civic Band and a meetings database with tables minutes and agendas." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It's running the &lt;a href="https://github.com/simonw/datasette-search-all"&gt;datasette-search-all&lt;/a&gt; plugin and has both tables configured for full-text search. Here's a &lt;a href="https://alameda.ca.civic.band/-/search?q=housing"&gt;search for housing&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/civic-band-2.jpg" alt="Search all tables - for housing. 43 results in meetings: agendas. Each result shows a meeting, date, page, text and a rendered page image" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="the-technical-stack"&gt;The technical stack&lt;/h4&gt;
&lt;p&gt;The public Civic Band sites all run using Datasette in Docker Containers - one container per municipality. They're hosted on a single &lt;a href="https://www.hetzner.com/"&gt;Hetzner&lt;/a&gt; machine.&lt;/p&gt;
&lt;p&gt;The ingestion pipeline runs separately from the main hosting environment, using a Mac Mini on Philip's desk at home.&lt;/p&gt;
&lt;p&gt;OCR works by breaking each PDF up into images and then running &lt;a href="https://github.com/tesseract-ocr/tesseract"&gt;Tesseract OCR&lt;/a&gt; against them directly on the Mac Mini. This processes on the order of 10,000 new pages of documents a day, often fewer.&lt;/p&gt;
&lt;p&gt;Philip treats PDF as a normalization target, because the pipeline is designed around documents with pages of text. In the rare event that a municipality publishes documents in another format such as &lt;code&gt;.docx&lt;/code&gt; he converts them to PDF before processing.&lt;/p&gt;
&lt;p&gt;PNG images of the PDF pages are served via a CDN, and the OCRd text is written to SQLite database files - one per municipality. &lt;a href="https://sqlite.org/fts5.html"&gt;SQLite FTS&lt;/a&gt; provides full-text search.&lt;/p&gt;
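&lt;p&gt;The storage pattern can be sketched like this (a minimal illustration using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module; the table and column names are illustrative, not Civic Band's actual schema):&lt;/p&gt;

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One row per OCRd page of meeting minutes.
db.execute("""
    CREATE TABLE pages (
        id INTEGER PRIMARY KEY,
        meeting TEXT,
        page INTEGER,
        text TEXT
    )
""")
# An external-content FTS5 table indexes the text column.
db.execute("""
    CREATE VIRTUAL TABLE pages_fts USING fts5(
        text, content='pages', content_rowid='id'
    )
""")
db.execute(
    "INSERT INTO pages (meeting, page, text) VALUES (?, ?, ?)",
    ("City Council 2024-11-05", 3, "Discussion of affordable housing project"),
)
# Populate the index from the content table.
db.execute("INSERT INTO pages_fts (rowid, text) SELECT id, text FROM pages")

# Full-text search for a term, joining back to the source rows.
rows = db.execute("""
    SELECT pages.meeting, pages.page
    FROM pages_fts
    JOIN pages ON pages.id = pages_fts.rowid
    WHERE pages_fts MATCH 'housing'
""").fetchall()
```

&lt;p&gt;Datasette then exposes exactly this kind of FTS-backed query through its search interface.&lt;/p&gt;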
&lt;h4 id="scale-and-storage"&gt;Scale and storage&lt;/h4&gt;
&lt;p&gt;The entire project currently comes to about 265GB on disk.  The PNGs of the pages use about 350GB of CDN storage.&lt;/p&gt;
&lt;p&gt;Most of the individual SQLite databases are very small. The largest is for &lt;a href="https://maui-county.hi.civic.band/"&gt;Maui County&lt;/a&gt; which is around 535MB because that county has professional stenographers taking detailed notes for every one of their meetings.&lt;/p&gt;
&lt;p&gt;Each city adds only a few documents a week so growth is manageable even as the number of cities grows.&lt;/p&gt;
&lt;h4 id="future-plans"&gt;Future plans&lt;/h4&gt;
&lt;p&gt;We talked quite a bit about a goal to allow users to subscribe to updates that match specific search terms.&lt;/p&gt;
&lt;p&gt;Philip has been building out a separate site called Civic Observer to address this need, which will store searches and then execute them periodically using the Datasette JSON API, with a Django app to record state to avoid sending the same alert more than once.&lt;/p&gt;

&lt;p&gt;I've had a long term ambition to build some kind of saved search alerts plugin for Datasette generally, to allow users to subscribe to new results for arbitrary SQL queries. My &lt;a href="https://github.com/simonw/sqlite-chronicle"&gt;sqlite-chronicle&lt;/a&gt; library is part of that effort - it uses SQLite triggers to maintain version numbers for individual rows in a table, allowing you to query just the rows that have been inserted or modified since the version number last time you ran the query.&lt;/p&gt;
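&lt;p&gt;The trigger technique can be sketched like this (a minimal illustration of the idea, not sqlite-chronicle's actual schema): every insert or update bumps a global counter and stamps the row with it, so a poller only needs to remember the highest version it has seen:&lt;/p&gt;

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, version INTEGER);
    CREATE TABLE counter (value INTEGER);
    INSERT INTO counter (value) VALUES (0);

    -- Stamp new rows with the next version number.
    CREATE TRIGGER items_insert AFTER INSERT ON items
    BEGIN
        UPDATE counter SET value = value + 1;
        UPDATE items SET version = (SELECT value FROM counter)
        WHERE id = NEW.id;
    END;

    -- Re-stamp rows whenever their content changes.
    CREATE TRIGGER items_update AFTER UPDATE OF name ON items
    BEGIN
        UPDATE counter SET value = value + 1;
        UPDATE items SET version = (SELECT value FROM counter)
        WHERE id = NEW.id;
    END;
""")
db.execute("INSERT INTO items (name) VALUES ('alpha')")
db.execute("INSERT INTO items (name) VALUES ('beta')")

# A poller remembers the highest version it has processed...
last_seen = db.execute("SELECT MAX(version) FROM items").fetchone()[0]

db.execute("UPDATE items SET name = 'beta (renamed)' WHERE id = 2")

# ...and on the next run fetches only rows changed since then.
changed = db.execute(
    "SELECT name FROM items WHERE version > ?", (last_seen,)
).fetchall()
```

&lt;p&gt;That "only the changed rows" query is what makes re-running a saved search cheap enough to do on a schedule.&lt;/p&gt;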

&lt;p&gt;Philip is keen to talk to anyone who is interested in using Civic Band or helping expand it to even more cities. You can find him on the &lt;a href="https://datasette.io/discord"&gt;Datasette Discord&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/political-hacking"&gt;political-hacking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-public-office-hours"&gt;datasette-public-office-hours&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="data-journalism"/><category term="political-hacking"/><category term="politics"/><category term="sqlite"/><category term="datasette"/><category term="datasette-public-office-hours"/></entry><entry><title>Project: VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5</title><link href="https://simonwillison.net/2024/Nov/7/project-verdad/#atom-series" rel="alternate"/><published>2024-11-07T18:41:51+00:00</published><updated>2024-11-07T18:41:51+00:00</updated><id>https://simonwillison.net/2024/Nov/7/project-verdad/#atom-series</id><summary type="html">
    &lt;p&gt;I'm starting a new interview series called &lt;strong&gt;Project&lt;/strong&gt;. The idea is to interview people who are building interesting data projects and talk about what they've built, how they built it, and what they learned along the way.&lt;/p&gt;
&lt;p&gt;The first episode is a conversation with Rajiv Sinclair from &lt;a href="https://publicdata.works/"&gt;Public Data Works&lt;/a&gt; about &lt;a href="https://verdad.app/"&gt;VERDAD&lt;/a&gt;, a brand new project in collaboration with journalist &lt;a href="https://twitter.com/mguzman_detroit"&gt;Martina Guzmán&lt;/a&gt; that aims to track misinformation in radio broadcasts around the USA.&lt;/p&gt;
&lt;p&gt;VERDAD hits a whole bunch of my interests at once. It's a beautiful example of scrappy data journalism in action, and it attempts something that simply would not have been possible just a year ago by taking advantage of new LLM tools.&lt;/p&gt;
&lt;p&gt;You can watch &lt;a href="https://www.youtube.com/watch?v=t_S-loWDGE0"&gt;the half hour interview&lt;/a&gt; on YouTube. Read on for the shownotes and some highlights from our conversation.&lt;/p&gt;

&lt;iframe style="margin-top: 1.5em; margin-bottom: 1.5em;" width="560" height="315" src="https://www.youtube-nocookie.com/embed/t_S-loWDGE0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;

&lt;h4 id="the-verdad-project"&gt;The VERDAD project&lt;/h4&gt;
&lt;p&gt;VERDAD tracks radio broadcasts from 48 different talk radio stations across the USA, primarily in Spanish. Audio from these stations is archived as MP3s, transcribed and then analyzed to identify potential examples of political misinformation.&lt;/p&gt;
&lt;p&gt;The result is "snippets" of audio accompanied by the transcript, an English translation, categories indicating the type of misinformation that may be present and an LLM-generated explanation of why that snippet was selected.&lt;/p&gt;
&lt;p&gt;These are then presented in an interface for human reviewers, who can listen directly to the audio in question, update the categories and add their own comments as well.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/verdad-1.jpg" alt="Screenshot of a content moderation interface titled VERDAD showing three posts with ratings and tags. Main view shows filters on left including Source Language, State, Source, Label, and Political Spectrum slider. Two users visible in left sidebar: Simon Willison and Rajiv Sinclair. Posts discuss claims about Harris, Walz, and election results, with timestamps and political leaning indicators." /&gt;&lt;/p&gt;
&lt;p&gt;VERDAD processes around a thousand hours of audio content a day - &lt;em&gt;way&lt;/em&gt; more than any team of journalists or researchers could attempt to listen to manually.&lt;/p&gt;
&lt;h4 id="the-technology-stack"&gt;The technology stack&lt;/h4&gt;
&lt;p&gt;VERDAD uses &lt;a href="https://github.com/PrefectHQ/prefect"&gt;Prefect&lt;/a&gt; as a workflow orchestration system to run the different parts of their pipeline.&lt;/p&gt;
&lt;p&gt;There are multiple stages, roughly as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;MP3 audio is recorded from radio station websites and stored in Cloudflare R2&lt;/li&gt;
&lt;li&gt;An initial transcription is performed using the extremely inexpensive Gemini 1.5 Flash&lt;/li&gt;
&lt;li&gt;That transcript is fed to the more powerful Gemini 1.5 Pro with a complex prompt to help identify potential misinformation snippets&lt;/li&gt;
&lt;li&gt;Once identified, audio containing snippets is run through the more expensive Whisper model to generate timestamps for the snippets&lt;/li&gt;
&lt;li&gt;Further prompts then generate things like English translations and summaries of the snippets&lt;/li&gt;
&lt;/ol&gt;
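&lt;p&gt;Schematically, the five stages above chain together like this (every function here is an illustrative stub standing in for the real work - in production these run as Prefect tasks against R2, Gemini and Whisper):&lt;/p&gt;

```python
def record_audio(station):
    # Stage 1: record MP3 audio from the station and upload to Cloudflare R2.
    return f"r2://audio/{station}.mp3"

def transcribe(audio_key):
    # Stage 2: cheap first-pass transcription (Gemini 1.5 Flash in production).
    return "transcript of " + audio_key

def detect_snippets(transcript):
    # Stage 3: a stronger model (Gemini 1.5 Pro) flags candidate
    # misinformation snippets in the transcript.
    return [{"text": transcript, "category": "example"}]

def add_timestamps(snippet):
    # Stage 4: Whisper supplies accurate timestamps for the flagged audio.
    return {**snippet, "start": 0.0, "end": 12.5}

def enrich(snippet):
    # Stage 5: further prompts add English translations and summaries.
    return {**snippet, "translation": "(English translation)"}

def run_pipeline(station):
    audio = record_audio(station)
    transcript = transcribe(audio)
    snippets = [add_timestamps(s) for s in detect_snippets(transcript)]
    return [enrich(s) for s in snippets]

results = run_pipeline("example-station")
```

&lt;p&gt;Splitting the work into small stages like this is what lets the expensive models (Gemini 1.5 Pro, Whisper) run only on the small fraction of audio the cheap first pass flags.&lt;/p&gt;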
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/verdad-2.jpg" alt="Screenshot of a Prefect workflow dashboard showing the apricot-silkworm run execution timeline. Interface displays task runs including audio file transcription and processing tasks with timestamps from 11:05 PM to 11:09 PM. Bottom panel shows detailed logs of task creation and completion." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="developing-the-prompts"&gt;Developing the prompts&lt;/h4&gt;
&lt;p&gt;The prompts used by VERDAD are &lt;a href="https://github.com/PublicDataWorks/verdad/tree/main/prompts"&gt;available in their GitHub repository&lt;/a&gt; and they are &lt;em&gt;fascinating&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Rajiv initially tried to get Gemini 1.5 Flash to do both the transcription and the misinformation detection, but found that asking that model to do two things at once frequently confused it.&lt;/p&gt;
&lt;p&gt;Instead, he switched to a separate prompt running that transcript against Gemini 1.5 Pro. Here's &lt;a href="https://github.com/PublicDataWorks/verdad/blob/main/prompts/Stage_3_analysis_prompt.md"&gt;that more complex prompt&lt;/a&gt; - it's 50KB in size and includes a whole bunch of interesting sections, including plenty of examples and a detailed JSON schema.&lt;/p&gt;
&lt;p&gt;Here's just one of the sections aimed at identifying content about climate change:&lt;/p&gt;
&lt;blockquote&gt;
&lt;h3 id="4-climate-change-and-environmental-policies"&gt;&lt;strong&gt;4. Climate Change and Environmental Policies&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Description&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Disinformation that denies or minimizes human impact on climate change, often to oppose environmental regulations. It may discredit scientific consensus and promote fossil fuel interests.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Common Narratives&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Labeling climate change as a &lt;strong&gt;"hoax"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Arguing that climate variations are natural cycles.&lt;/li&gt;
&lt;li&gt;Claiming environmental policies harm the economy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cultural/Regional Variations&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Spanish-Speaking Communities&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Impact of climate policies on agricultural jobs.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arabic-Speaking Communities&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Reliance on oil economies influencing perceptions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Potential Legitimate Discussions&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Debates on balancing environmental protection with economic growth.&lt;/li&gt;
&lt;li&gt;Discussions about energy independence.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Spanish&lt;/em&gt;: "El 'cambio climático' es una mentira para controlarnos."&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Arabic&lt;/em&gt;: "'تغير المناخ' كذبة للسيطرة علينا."&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Rajiv iterated on these prompts over multiple months - they are the core of the VERDAD project. Here's &lt;a href="https://github.com/PublicDataWorks/verdad/commit/3eac808e77b6d1aadf0de055a1d5287166dbb6d3"&gt;an update from yesterday&lt;/a&gt; informing the model of the US presidential election results so that it wouldn't flag claims of a candidate winning as false!&lt;/p&gt;

&lt;p&gt;Rajiv used both Claude 3.5 Sonnet and OpenAI o1-preview to help develop the prompt itself. Here's &lt;a href="https://gist.github.com/rajivsinclair/8fb0371f6eda25f9e5cc515cd77abd62"&gt;his transcript&lt;/a&gt; of a conversation with Claude used to iterate further on an existing prompt.&lt;/p&gt;

&lt;h4 id="the-human-review-process"&gt;The human review process&lt;/h4&gt;
&lt;p&gt;The final component of VERDAD is the web application itself. Everyone knows that AI makes mistakes, &lt;em&gt;a lot&lt;/em&gt;. Providing as much context as possible for human review is essential.&lt;/p&gt;
&lt;p&gt;The Whisper transcripts provide accurate timestamps (Gemini is sadly unable to provide those on its own), which means the tool can provide the Spanish transcript, the English translation and a play button to listen to the audio at the moment of the captured snippet.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/verdad-3.jpg" alt="Screenshot of VERDAD content moderation interface showing detailed view of a post titled False Claim of Trump Victory from WAXY radio station in Florida. Shows audio player with Spanish/English transcript toggle, green highlighted fact-check box. Post metadata indicates &amp;quot;Right&amp;quot; political leaning and timestamp Nov 6, 2024 23:06 GMT+7." style="max-width: 100%;" /&gt;&lt;/p&gt;

&lt;h4 id="want-to-learn-more-"&gt;Want to learn more?&lt;/h4&gt;
&lt;p&gt;VERDAD is under active development right now. Rajiv and his team are keen to collaborate, and are actively looking forward to conversations with other people working in this space. You can reach him at &lt;code&gt;help@verdad.app&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The technology stack itself is &lt;em&gt;incredibly&lt;/em&gt; promising. Pulling together a project like this even a year ago would have been prohibitively expensive, but new multi-modal LLM tools like Gemini (and Gemini 1.5 Flash in particular) are opening up all sorts of new possibilities.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/digital-literacy"&gt;digital-literacy&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="data-journalism"/><category term="youtube"/><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="gemini"/><category term="digital-literacy"/></entry></feed>