Simon Willison's Weblog: steve-krouse

Quoting Steve Krouse

2025-11-12T17:21:19+00:00

The fact that MCP is a difference surface from your normal API allows you to ship MUCH faster to MCP. This has been unlocked by inference at runtime

Normal APIs are promises to developers, because developer commit code that relies on those APIs, and then walk away. If you break the API, you break the promise, and you break that code. This means a developer gets woken up at 2am to fix the code

But MCP servers are called by LLMs which dynamically read the spec every time, which allow us to constantly change the MCP server. It doesn't matter! We haven't made any promises. The LLM can figure it out afresh every time

— Steve Krouse

Tags: apis, ai, generative-ai, llms, steve-krouse, model-context-protocol

Quoting Steve Krouse

2025-07-30T21:21:16+00:00

When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: It's only legacy code if you have to maintain it! [...]

The worst possible situation is to have a non-programmer vibe code a large project that they intend to maintain. This would be the equivalent of giving a credit card to a child without first explaining the concept of debt. [...]

If you don't understand the code, your only recourse is to ask AI to fix it for you, which is like paying off credit card debt with another credit card.

— Steve Krouse, Vibe code is legacy code

Tags: ai, generative-ai, llms, ai-assisted-programming, technical-debt, steve-krouse, vibe-coding

Quoting Steve Krouse

2025-05-31T14:26:08+00:00

There's a new kind of coding I call "hype coding" where you fully give into the hype, and what's coming right around the corner, that you lose sight of whats' possible today. Everything is changing so fast that nobody has time to learn any tool, but we should aim to use as many as possible. Any limitation in the technology can be chalked up to a 'skill issue' or that it'll be solved in the next AI release next week. Thinking is dead. Turn off your brain and let the computer think for you. Scroll on tiktok while the armies of agents code for you. If it isn't right, tell it to try again. Don't read. Feed outputs back in until it works. If you can't get it to work, wait for the next model or tool release. Maybe you didn't use enough MCP servers? Don't forget to add to the hype cycle by aggrandizing all your successes. Don't read this whole tweet, because it's too long. Get an AI to summarize it for you. Then call it "cope". Most importantly, immediately mischaracterize "hype coding" to mean something different than this definition. Oh the irony! The people who don't care about details don't read the details about not reading the details

— Steve Krouse

Tags: ai, steve-krouse, vibe-coding, model-context-protocol, semantic-diffusion

What we learned copying all the best code assistants

2025-01-04T20:49:29+00:00

Steve Krouse describes Val Town's experience so far building features that use LLMs, starting with completions (powered by Codeium and Val Town's own codemirror-codeium extension) and then rolling through several versions of their Townie code assistant, initially powered by GPT 3.5 but later upgraded to Claude 3.5 Sonnet.

This is a really interesting space to explore right now because there is so much activity in it from larger players. Steve classifies Val Town's approach as "fast following" - trying to spot the patterns that are proven to work and bring them into their own product.

It's challenging from a strategic point of view because Val Town's core differentiator isn't meant to be AI coding assistance: they're trying to build the best possible ecosystem for hosting and iterating lightweight server-side JavaScript applications. Isn't this stuff all a distraction from that larger goal?

Steve concludes:

However, it still feels like there’s a lot to be gained with a fully-integrated web AI code editor experience in Val Town – even if we can only get 80% of the features that the big dogs have, and a couple months later. It doesn’t take that much work to copy the best features we see in other tools. The benefits to a fully integrated experience seems well worth that cost. In short, we’ve had a lot of success fast-following so far, and think it’s worth continuing to do so.

It continues to be wild to me how features like this are easy enough to build now that they can be part-time side features at a small startup, and not the entire project.

Via Hacker News

Tags: ai, prompt-engineering, generative-ai, llms, ai-assisted-programming, val-town, steve-krouse

Cerebras Coder

2024-10-31T22:39:15+00:00

Val Town founder Steve Krouse has been building demos on top of the Cerebras API that runs Llama3.1-70b at 2,000 tokens/second.

Having a capable LLM with that kind of performance turns out to be really interesting. Cerebras Coder is a demo that implements Claude Artifact-style on-demand JavaScript apps, and having it run at that speed means changes you request are visible in less than a second.

Steve's implementation (created with the help of Townie, the Val Town code assistant) demonstrates the simplest possible version of an iframe sandbox:

<iframe
    srcDoc={code}
    sandbox="allow-scripts allow-modals allow-forms allow-popups allow-same-origin allow-top-navigation allow-downloads allow-presentation allow-pointer-lock"
/>

Where code is populated by a setCode(...) call inside a React component.
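
As a rough sketch of the same idea outside React, the helper below builds a sandboxed iframe as an HTML string. The names `escapeAttribute` and `buildSandboxedIframe` are hypothetical, and the default of granting only `allow-scripts` is an assumption for illustration, not Steve's code - the demo above grants a much broader set of permissions.

```typescript
// Hypothetical helper: escape a string for use inside a double-quoted
// HTML attribute (only & and " need escaping there).
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap generated code in an iframe that uses srcdoc.
// The default permission list is an assumption, narrower than the demo's.
function buildSandboxedIframe(
  code: string,
  permissions: string[] = ["allow-scripts"],
): string {
  const sandbox = permissions.join(" ");
  return `<iframe srcdoc="${escapeAttribute(code)}" sandbox="${sandbox}"></iframe>`;
}

const html = buildSandboxedIframe(
  '<script>document.body.textContent = "hi";</script>',
);
console.log(html);
```

Starting from a short permission list and adding tokens only when a generated app needs them is one possible design choice; which tokens are appropriate depends on how much you trust the generated code.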

The most interesting applications of LLMs continue to be where they operate in a tight loop with a human - this can make those review loops potentially much faster and more productive.

Via @stevekrouse

Tags: iframes, sandboxing, ai, react, generative-ai, llama, llms, ai-assisted-programming, val-town, steve-krouse, cerebras, llm-performance

Building search-based RAG using Claude, Datasette and Val Town

2024-06-21T20:44:24+00:00

Retrieval Augmented Generation (RAG) is a technique for adding extra "knowledge" to systems built on LLMs, allowing them to answer questions against custom information not included in their training data. A common way to implement this is to take a question from a user, translate that into a set of search queries, run those against a search engine and then feed the results back into the LLM to generate an answer.

I built a basic version of this pattern against the brand new Claude 3.5 Sonnet language model, using SQLite full-text search running in Datasette as the search backend and Val Town as the prototyping platform.

The implementation took just over an hour, during a live coding session with Val Town founder Steve Krouse. I was the latest guest on Steve's live streaming series where he invites people to hack on projects with his help.

You can watch the video below or on YouTube. Here are my own detailed notes to accompany the session.

Bonus: Claude 3.5 Sonnet artifacts demo

We started the stream by chatting a bit about the new Claude 3.5 Sonnet release. This turned into an unplanned demo of their "artifacts" feature where Claude can now build you an interactive web page on-demand.

At 3m02s I prompted it with:

Build me a web app that teaches me about mandelbrot fractals, with interactive widgets

This worked! Here's the code it wrote - I haven't yet found a good path for turning that into a self-hosted interactive page.

This didn't support panning, so I added:

Again but let me drag on the canvas element to pan around

Which gave me this. Pretty impressive!

Ingredients for the RAG project

RAG is often implemented using vector search against embeddings, but there's an alternative approach where you turn the user's question into some full-text search queries, run those against a traditional search engine, then feed the results back into an LLM and ask it to use them to answer the question.

SQLite includes surprisingly good full-text search, and I've built a lot of tools against that in the past - including sqlite-utils enable-fts and Datasette's FTS features.

My blog has a lot of content, which lives in a Django PostgreSQL database. But I also have a GitHub Actions repository which backs up that data as JSON, and then publishes a SQLite copy of it to datasette.simonwillison.net - which means I have a Datasette-powered JSON API for running searches against my content.

Let's use that API to build a question answering RAG system!

Step one then was to prototype up a SQL query we could use with that API to get back search results. After some iteration I got to this:

select
  blog_entry.id,
  blog_entry.title,
  blog_entry.body,
  blog_entry.created
from
  blog_entry
  join blog_entry_fts on blog_entry_fts.rowid = blog_entry.rowid
where
  blog_entry_fts match :search
order by
  rank
limit
  10

Try that here. The query works by joining the blog_entry table to the SQLite FTS blog_entry_fts virtual table, matched against the ?search= parameter from the URL.

When you join against an FTS table like this, a rank column is exposed with the relevance score for each match.

Adding .json to the above URL turns it into an API call... so now we have a search API we can call from other code.

A plan for the build

We spent the rest of the session writing code in Val Town, which offers a browser editor for a server-side Deno-based environment for executing JavaScript (and TypeScript) code.

The finished code does the following:

Accepts a user's question from the ?question= query string.
Asks Claude 3.5 Sonnet to turn that question into multiple single-word search queries, using a Claude function call to enforce a schema of a JSON list of strings.
Turns that list of keywords into a SQLite FTS query that looks like this: "shot-scraper" OR "screenshot" OR "web" OR "tool" OR "automation" OR "CLI"
Runs that query against Datasette to get back the top 10 results.
Combines the title and body from each of those results into a longer context.
Calls Claude 3 again (originally Haiku, but then we upgraded to 3.5 Sonnet towards the end) with that context and asks it to answer the question.
Returns the results to the user.
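
The steps above can be sketched as a single function with each external call passed in as a parameter, which makes the flow easy to exercise without network access. This is a minimal sketch, not the script from the session: `RagSteps`, `SearchResult` and `answerQuestion` are hypothetical names, and the stubs stand in for Claude and Datasette.

```typescript
// Hypothetical types: each step of the pipeline is injected as a function.
type SearchResult = { title: string; body: string };

interface RagSteps {
  suggestKeywords: (question: string) => Promise<string[]>;
  runSearch: (ftsQuery: string) => Promise<SearchResult[]>;
  answer: (context: string, question: string) => Promise<string>;
}

async function answerQuestion(question: string, steps: RagSteps): Promise<string> {
  // Question -> keywords
  const keywords = await steps.suggestKeywords(question);
  // Keywords -> "word1" OR "word2" OR ...
  const ftsQuery = keywords.map((k) => `"${k}"`).join(" OR ");
  // Query -> search results
  const results = await steps.runSearch(ftsQuery);
  // Results -> one long context string
  const context = results.map((r) => `${r.title} ${r.body}`).join("\n\n");
  // Context + question -> answer
  return steps.answer(context, question);
}

// Stubbed usage: no real model or search engine is called here.
const stubSteps: RagSteps = {
  suggestKeywords: async () => ["shot-scraper", "screenshot"],
  runSearch: async () => [{ title: "shot-scraper", body: "A tool for screenshots." }],
  answer: async (context, question) => `Q: ${question} / context length: ${context.length}`,
};

answerQuestion("What is shot-scraper?", stubSteps).then((text) => console.log(text));
```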

The annotated final script

Here's the final script we ended up with, with inline commentary. Here's the initial setup:

import Anthropic from "npm:@anthropic-ai/sdk@0.24.0";

/* This automatically picks up the API key from the ANTHROPIC_API_KEY
environment variable, which we configured in the Val Town settings */
const anthropic = new Anthropic();

We're using the very latest release of the Anthropic TypeScript SDK, which came out just a few hours prior to recording the livestream.

I set the ANTHROPIC_API_KEY environment variable to my Claude 3 API key in the Val Town settings, making it available to all of my Vals. The Anthropic() constructor picks that up automatically.

Next, the function to suggest keywords for a user's question:

async function suggestKeywords(question) {
  // Takes a question like "What is shot-scraper?" and asks 3.5 Sonnet
  // to suggest individual search keywords to help answer the question.
  const message = await anthropic.messages.create({
    max_tokens: 128,
    model: "claude-3-5-sonnet-20240620",
    // The tools option enforces a JSON schema array of strings
    tools: [{
      name: "suggested_search_keywords",
      description: "Suggest individual search keywords to help answer the question.",
      input_schema: {
        type: "object",
        properties: {
          keywords: {
            type: "array",
            items: {
              type: "string",
            },
            description: "List of suggested single word search keywords",
          },
        },
        required: ["keywords"],
      },
    }],
    // This forces it to always run the suggested_search_keywords tool
    tool_choice: { type: "tool", name: "suggested_search_keywords" },
    messages: [
      { role: "user", content: question },
    ],
  });
  // This helped TypeScript complain less about accessing .input.keywords
  // since it knows this object can be one of two different types
  if (message.content[0].type == "text") {
    throw new Error(message.content[0].text);
  }
  return message.content[0].input.keywords;
}

We're asking Claude 3.5 Sonnet here to suggest individual search keywords to help answer that question. I tried Claude 3 Haiku first but it didn't reliably return single word keywords - Sonnet 3.5 followed the "single word search keywords" instruction better.

This function also uses Claude tools to enforce a response in a JSON schema that specifies an array of strings. More on how I wrote that code (with Claude's assistance) later on.
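
The function above assumes the tool call is the first content block. A slightly more defensive variation, sketched here under the assumption that the response content is a list of `text` and `tool_use` blocks, searches for the tool call by name and validates its input. The local types and the `extractKeywords` helper are hypothetical; the real SDK exports its own, richer types.

```typescript
// Minimal local types for illustration only; not the SDK's own types.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;

// Hypothetical helper: find the named tool call and return its keywords.
function extractKeywords(content: ContentBlock[], toolName: string): string[] {
  const block = content.find(
    (b): b is ToolUseBlock => b.type === "tool_use" && b.name === toolName,
  );
  if (!block) {
    throw new Error(`No ${toolName} tool call in response`);
  }
  // The schema asks for { keywords: string[] }, but check at runtime anyway.
  const input = (block.input ?? {}) as { keywords?: unknown };
  if (!Array.isArray(input.keywords) || !input.keywords.every((k) => typeof k === "string")) {
    throw new Error("Tool input did not contain a keywords array of strings");
  }
  return input.keywords;
}

const sample: ContentBlock[] = [
  {
    type: "tool_use",
    id: "toolu_example",
    name: "suggested_search_keywords",
    input: { keywords: ["shot-scraper", "screenshot"] },
  },
];
console.log(extractKeywords(sample, "suggested_search_keywords"));
```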

Next, the code to run the search itself against Datasette:

// The SQL query from earlier
const sql = `select
  blog_entry.id,
  blog_entry.title,
  blog_entry.body,
  blog_entry.created
from
  blog_entry
  join blog_entry_fts on blog_entry_fts.rowid = blog_entry.rowid
where
  blog_entry_fts match :search
order by
  rank
limit
  10`;

async function runSearch(keywords) {
  // Turn the keywords into "word1" OR "word2" OR "word3"
  const search = keywords.map(s => `"${s}"`).join(" OR ");
  // Compose the JSON API URL to run the query
  const params = new URLSearchParams({
    search,
    sql,
    _shape: "array",
  });
  const url = "https://datasette.simonwillison.net/simonwillisonblog.json?" + params;
  const result = await (await fetch(url)).json();
  return result;
}

Datasette supports read-only SQL queries via its JSON API, which means we can construct the SQL query as a JavaScript string and then encode it as a query string using URLSearchParams().

We also take the list of keywords and turn them into a SQLite FTS search query that looks like "word1" OR "word2" OR "word3".

SQLite's built-in relevance calculations work well with this - you can throw in dozens of words separated by OR and the top ranking results will generally be the ones with the most matches.
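
One thing the quoting step does not handle is a keyword that itself contains a double quote. Here's a minimal sketch of a stricter version, assuming the table uses FTS5, where a double quote inside a quoted string is escaped by doubling it. `toFtsOrQuery` is a hypothetical name, not part of the script from the session.

```typescript
// Hypothetical helper: build "word1" OR "word2" OR ... from a keyword list,
// dropping blank entries and doubling embedded double quotes
// (the FTS5 string escaping rule, assumed here).
function toFtsOrQuery(keywords: string[]): string {
  return keywords
    .map((k) => k.trim())
    .filter((k) => k.length > 0)
    .map((k) => `"${k.replace(/"/g, '""')}"`)
    .join(" OR ");
}

console.log(toFtsOrQuery(["shot-scraper", "screenshot", "CLI"]));
```

An empty keyword list produces an empty string, which is not a valid match expression, so a caller would want to handle that case before running the search.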

Finally, the code that ties this together - suggests keywords, runs the search and then asks Claude to answer the question. I ended up bundling that together in the HTTP handler for the Val Town script - this is the code that is called for every incoming HTTP request:

export default async function(req: Request) {
  // This is the Val Town HTTP handler
  const url = new URL(req.url);
  // get() returns null when ?question= is missing, so default to "" before slicing
  const question = (url.searchParams.get("question") ?? "").slice(0, 40);
  if (!question) {
    return Response.json({ "error": "No question provided" });
  }
  // Turn the question into search terms
  const keywords = await suggestKeywords(question);

  // Run the actual search
  const result = await runSearch(keywords);

  // Strip HTML tags from each body property, modify in-place:
  result.forEach(r => {
    r.body = r.body.replace(/<[^>]*>/g, "");
  });

  // Glue together a string of the title and body properties in one go
  const context = result.map(r => r.title + " " + r.body).join("\n\n");

  // Ask Claude to answer the question
  const message = await anthropic.messages.create({
    max_tokens: 1024,
    model: "claude-3-haiku-20240307",
    messages: [
      { role: "user", content: context },
      { role: "assistant", content: "Thank you for the context, I am ready to answer your question" },
      { role: "user", content: question },
    ],
  });
  return Response.json({answer: message.content[0].text});
}

There are many other ways you could arrange the prompting here. I quite enjoy throwing together a fake conversation like this that feeds in the context and then hints at the agent that it should respond next with its answer, but there are many potential variations on this theme.
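
As one example of such a variation, the sketch below puts the context and the question in a single user message, with the context wrapped in tags. The `buildSingleTurnMessages` helper, the tag name and the instruction wording are all assumptions for illustration; this arrangement was not tried in the session, and whether it works better would need testing.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical variation: one user turn instead of a fake three-turn conversation.
function buildSingleTurnMessages(context: string, question: string): Message[] {
  return [
    {
      role: "user",
      content:
        `<context>\n${context}\n</context>\n\n` +
        `Using only the context above, answer this question: ${question}`,
    },
  ];
}

const messages = buildSingleTurnMessages(
  "shot-scraper A tool for taking screenshots.",
  "What is shot-scraper?",
);
console.log(messages[0].content);
```

The returned array has the same shape as the messages option used in the handler above, so the two arrangements can be swapped and compared.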

This initial version returned the answer as a JSON object, something like this:

{
    "answer": "shot-scraper is a command-line tool that automates the process of taking screenshots of web pages..."
}

We were running out of time, but we wanted to add an HTML interface. Steve suggested getting Claude to write the whole thing! So we tried this:

  const message = await anthropic.messages.create({
    max_tokens: 1024,
    model: "claude-3-5-sonnet-20240620", // "claude-3-haiku-20240307",
    system: "Return a full HTML document as your answer, no markdown, make it pretty with exciting relevant CSS",
    messages: [
      { role: "user", content: context },
      { role: "assistant", content: "Thank you for the context, I am ready to answer your question as HTML" },
      { role: "user", content: question },
    ],
  });
  // Return back whatever HTML Claude gave us
  return new Response(message.content[0].text, {
    status: 200,
    headers: { "Content-Type": "text/html" }
  });

We upgraded to 3.5 Sonnet to see if it had better "taste" than Haiku, and the results were really impressive. Here's what it gave us for "What is Datasette?":

It even styled the page with flexbox to arrange the key features of Datasette in a 2x2 grid! You can see that in the video at 1h13m17s.

There's a full copy of the final TypeScript code available in a Gist.

Some tricks we used along the way

I didn't write all of the above code. Some bits of it were written by pasting things into Claude 3.5 Sonnet, and others used the Codeium integration in the Val Town editor (described here).

One pattern that worked particularly well was getting Sonnet to write the tool-using TypeScript code for us.

The Claude 3 documentation showed how to do that using curl. I pasted that curl example in, added some example TypeScript and then prompted:

Guess the JavaScript for setting up a tool which just returns a list of strings, called suggested_search_keywords

Here's my full prompt:

#!/bin/bash
IMAGE_URL="https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
IMAGE_MEDIA_TYPE="image/jpeg"
IMAGE_BASE64=$(curl "$IMAGE_URL" | base64)
curl https://api.anthropic.com/v1/messages \
     --header "content-type: application/json" \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --data \
'{
    "model": "claude-3-sonnet-20240229",
    "max_tokens": 1024,
    "tools": [{
        "name": "record_summary",
        "description": "Record summary of an image using well-structured JSON.",
        "input_schema": {
            "type": "object",
            "properties": {
                "key_colors": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "r": { "type": "number", "description": "red value [0.0, 1.0]" },
                            "g": { "type": "number", "description": "green value [0.0, 1.0]" },
                            "b": { "type": "number", "description": "blue value [0.0, 1.0]" },
                            "name": { "type": "string", "description": "Human-readable color name in snake_case, e.g. \"olive_green\" or \"turquoise\"" }
                        },
                        "required": [ "r", "g", "b", "name" ]
                    },
                    "description": "Key colors in the image. Limit to less then four."
                },
                "description": {
                    "type": "string",
                    "description": "Image description. One to two sentences max."
                },
                "estimated_year": {
                    "type": "integer",
                    "description": "Estimated year that the images was taken, if is it a photo. Only set this if the image appears to be non-fictional. Rough estimates are okay!"
                }
            },
            "required": [ "key_colors", "description" ]
        }
    }],
    "tool_choice": {"type": "tool", "name": "record_summary"},
    "messages": [
        {"role": "user", "content": [
            {"type": "image", "source": {
                "type": "base64",
                "media_type": "'$IMAGE_MEDIA_TYPE'",
                "data": "'$IMAGE_BASE64'"
            }},
            {"type": "text", "text": "Describe this image."}
        ]}
    ]
}'

Based on that example and this JavaScript code:

const anthropic = new Anthropic();
const message = await anthropic.messages.create({
  max_tokens: 1024,
  system: "Suggest individual search keywords to help answer this question. No yapping.",
  messages: [
    { role: "user", content: question },
  ],
  model: "claude-3-haiku-20240307",
});
console.log(message.content[0].text);

Guess the JavaScript for setting up a tool which just returns a list of strings, called suggested_search_keywords

It gave me back this, which was the exact code I needed to make my tool-enabled API call from Val Town:

const anthropic = new Anthropic();
const message = await anthropic.messages.create({
  max_tokens: 1024,
  model: "claude-3-haiku-20240307",
  tools: [{
    name: "suggested_search_keywords",
    description: "Suggest individual search keywords to help answer the question.",
    input_schema: {
      type: "object",
      properties: {
        keywords: {
          type: "array",
          items: {
            type: "string"
          },
          description: "List of suggested search keywords"
        }
      },
      required: ["keywords"]
    }
  }],
  tool_choice: { type: "tool", name: "suggested_search_keywords" },
  messages: [
    { role: "user", content: question }
  ]
});

console.log(message.content[0].text);

I really like this trick, and I use it often in my own work. Find some example code that illustrates the pattern you need, paste in some additional context and get the LLM to figure the rest out for you.

This is just a prototype

It's important to reiterate that this is just a prototype - it's the version of search-backed RAG I could get working in an hour.

Putting something like this into production requires a whole lot more work. Most importantly, good RAG systems are backed by evals - it's extremely hard to iterate on and improve a system like this if you don't have a good mechanism in place to evaluate if your changes are making things better or not. Your AI Product Needs Evals by Hamel Husain remains my favourite piece of writing on how to go about putting these together.
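
To make the idea concrete, here's a deliberately crude sketch of an eval loop: a list of questions, each with terms the answer is expected to mention, and a pass count to compare between versions of the system. `EvalCase`, `runEvals` and the substring check are assumptions for illustration, not something taken from Hamel's article; real evals usually need much richer checks.

```typescript
// Hypothetical eval case: a question plus terms a good answer should mention.
type EvalCase = { question: string; mustMention: string[] };

async function runEvals(
  cases: EvalCase[],
  answer: (question: string) => Promise<string>,
): Promise<{ passed: number; total: number; failures: string[] }> {
  const failures: string[] = [];
  for (const c of cases) {
    const text = (await answer(c.question)).toLowerCase();
    // Case-insensitive substring check: crude, but cheap to run on every change.
    const ok = c.mustMention.every((term) => text.includes(term.toLowerCase()));
    if (!ok) failures.push(c.question);
  }
  return { passed: cases.length - failures.length, total: cases.length, failures };
}

// Stubbed usage: a real run would call the RAG endpoint instead.
const cases: EvalCase[] = [
  { question: "What is shot-scraper?", mustMention: ["screenshots"] },
  { question: "What is Datasette?", mustMention: ["SQLite"] },
];
runEvals(cases, async () => "A tool for taking screenshots.").then((report) =>
  console.log(report),
);
```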

Additional links from the livestream

Here are some of the other projects and links mentioned during our conversation:

Datasette and its 150+ plugins.
My original idea for a project was to use the Datasette Write API and run scheduled Vals to import data from various sources (my toots, tweets, posts etc) into a single searchable table.
LLM - my command-line utility for working with different language models.
shot-scraper for automating screenshots and scraping websites with JavaScript from the command-line - here's a recent demo where I scraped Google using shot-scraper and fed the results into LLM as a basic form of RAG.
My current list of 277 projects with at least one release on GitHub.
My TIL blog, which runs on a templated version of Datasette - here's the database and here's the GitHub Actions workflow that builds it using the Baked Data pattern.
I have some previous experiments using embeddings with Datasette, including a table of embeddings (encoded like this) for my TIL blog which I use to power related items. That's described in this TIL: Storing and serving related documents with openai-to-sqlite and embeddings.

Tags: claude-3-5-sonnet, my-talks, rag, projects, datasette, claude, anthropic, llms, val-town, steve-krouse, prompt-to-app, claude-artifacts, annotated-talks, prompt-engineering, ai, generative-ai, ai-assisted-search, ai-assisted-programming

Val Town Newsletter 15

2024-02-15T16:26:09+00:00

I really like how Val Town founder Steve Krouse now accompanies their “what’s new” newsletter with a video tour of the new features. I’m seriously considering imitating this for my own projects.

Tags: javascript, video, val-town, steve-krouse