<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: tobias-lutke</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/tobias-lutke.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-03-13T03:44:34+00:00</updated><author><name>Simon Willison</name></author><entry><title>Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations</title><link href="https://simonwillison.net/2026/Mar/13/liquid/#atom-tag" rel="alternate"/><published>2026-03-13T03:44:34+00:00</published><updated>2026-03-13T03:44:34+00:00</updated><id>https://simonwillison.net/2026/Mar/13/liquid/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Shopify/liquid/pull/2056"&gt;Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;PR from Shopify CEO Tobias Lütke against Liquid, Shopify's open source Ruby template engine that was somewhat inspired by Django when Tobi first created it &lt;a href="https://simonwillison.net/2005/Nov/6/liquid/"&gt;back in 2005&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Tobi found dozens of new performance micro-optimizations using a variant of &lt;a href="https://github.com/karpathy/autoresearch"&gt;autoresearch&lt;/a&gt;, Andrej Karpathy's new system for having a coding agent run hundreds of semi-autonomous experiments to find new effective techniques for training &lt;a href="https://github.com/karpathy/nanochat"&gt;nanochat&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Tobi's implementation started two days ago with this &lt;a href="https://github.com/Shopify/liquid/blob/2543fdc1a101f555db208fb0deeb2e3bf1ae9e36/auto/autoresearch.md"&gt;autoresearch.md&lt;/a&gt; prompt file and an &lt;a href="https://github.com/Shopify/liquid/blob/2543fdc1a101f555db208fb0deeb2e3bf1ae9e36/auto/autoresearch.sh"&gt;autoresearch.sh&lt;/a&gt; script for the agent to run to execute the test suite and report on benchmark scores.&lt;/p&gt;
&lt;p&gt;The PR now lists &lt;a href="https://github.com/Shopify/liquid/pull/2056/commits"&gt;93 commits&lt;/a&gt; from around 120 automated experiments. The PR description lists what worked in detail - some examples:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Replaced StringScanner tokenizer with &lt;code&gt;String#byteindex&lt;/code&gt;.&lt;/strong&gt; Single-byte &lt;code&gt;byteindex&lt;/code&gt; searching is ~40% faster than regex-based &lt;code&gt;skip_until&lt;/code&gt;. This alone reduced parse time by ~12%.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pure-byte &lt;code&gt;parse_tag_token&lt;/code&gt;.&lt;/strong&gt; Eliminated the costly &lt;code&gt;StringScanner#string=&lt;/code&gt; reset that was called for every &lt;code&gt;{% %}&lt;/code&gt; token (878 times). Manual byte scanning for tag name + markup extraction is faster than resetting and re-scanning via StringScanner. [...]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cached small integer &lt;code&gt;to_s&lt;/code&gt;.&lt;/strong&gt; Pre-computed frozen strings for 0-999 avoid 267 &lt;code&gt;Integer#to_s&lt;/code&gt; allocations per render.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
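&lt;p&gt;The small-integer cache is easy to picture outside Ruby too. Here's a minimal Python sketch of the same idea - the names are mine for illustration, not from the PR, and the real change lives in Liquid's Ruby rendering path:&lt;/p&gt;

```python
# Python transposition of the "cached small integer to_s" optimization:
# pre-compute the strings for 0-999 once, so rendering reuses one
# immutable object instead of allocating a new string per conversion.
SMALL_INT_STRINGS = tuple(str(i) for i in range(1000))

def int_to_str(n):
    # Cache hit: the exact same string object every time.
    if isinstance(n, int) and n in range(1000):
        return SMALL_INT_STRINGS[n]
    # Cache miss (negative, large, or non-int): allocate as usual.
    return str(n)
```

&lt;p&gt;The win is the same in either language: hot render loops hand back a pre-built immutable string instead of allocating a fresh one on every call.&lt;/p&gt;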
&lt;p&gt;This all added up to a 53% improvement on benchmarks - truly impressive for a codebase that's been tweaked by hundreds of contributors over 20 years.&lt;/p&gt;
&lt;p&gt;I think this illustrates a number of interesting ideas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Having a robust test suite - in this case 974 unit tests - is a &lt;em&gt;massive unlock&lt;/em&gt; for working with coding agents. This kind of research effort would not be possible without first having a tried and tested suite of tests.&lt;/li&gt;
&lt;li&gt;The autoresearch pattern - where an agent brainstorms a multitude of potential improvements and then experiments with them one at a time - is really effective.&lt;/li&gt;
&lt;li&gt;If you provide an agent with a benchmarking script "make it faster" becomes an actionable goal.&lt;/li&gt;
&lt;li&gt;CEOs can code again! Tobi has always been more hands-on than most, but this is a much more significant contribution than anyone would expect from the leader of a company with 7,500+ employees. I've seen this pattern play out a lot over the past few months: coding agents make it feasible for people in high-interruption roles to productively work with code again.&lt;/li&gt;
&lt;/ul&gt;
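&lt;p&gt;Stripped of everything that makes it practical, the experiment loop at the heart of that pattern fits in a few lines of Python. The function names here are hypothetical - the real harness (autoresearch.sh plus the pi-autoresearch plugin) handles brainstorming, logging and JSONL state on top of this:&lt;/p&gt;

```python
def autoresearch(ideas, apply_idea, revert_idea, tests_pass, benchmark):
    # Try candidate ideas one at a time; keep a change only if the test
    # suite still passes AND the benchmark strictly improves.
    best = benchmark()   # baseline score; lower is better (e.g. seconds)
    kept = []
    for idea in ideas:
        apply_idea(idea)
        score = benchmark()
        # i.e. score is strictly lower than the best so far
        improved = min(score, best) == score and score != best
        if tests_pass() and improved:
            best = score          # accept: this is the new baseline to beat
            kept.append(idea)
        else:
            revert_idea(idea)     # reject: undo the change and move on
    return kept
```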
&lt;p&gt;Here's Tobi's &lt;a href="https://github.com/tobi"&gt;GitHub contribution graph&lt;/a&gt; for the past year, showing a significant uptick following that &lt;a href="https://simonwillison.net/tags/november-2025-inflection/"&gt;November 2025 inflection point&lt;/a&gt; when coding agents got really good.&lt;/p&gt;
&lt;p&gt;&lt;img alt="1,658 contributions in the last year - scattered lightly through Jun, Aug, Sep, Oct and Nov and then picking up significantly in Dec, Jan, and Feb." src="https://static.simonwillison.net/static/2026/tobi-contribs.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;He used &lt;a href="https://github.com/badlogic/pi-mono"&gt;Pi&lt;/a&gt; as the coding agent and released a new &lt;a href="https://github.com/davebcn87/pi-autoresearch"&gt;pi-autoresearch&lt;/a&gt; plugin, built in collaboration with David Cortés, which maintains state in an &lt;code&gt;autoresearch.jsonl&lt;/code&gt; file &lt;a href="https://github.com/Shopify/liquid/blob/3182b7c1b3758b0f5fe2d0fcc71a48bbcb11c946/autoresearch.jsonl"&gt;like this one&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://x.com/tobi/status/2032212531846971413"&gt;@tobi&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rails"&gt;rails&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruby"&gt;ruby&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tobias-lutke"&gt;tobias-lutke&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/autoresearch"&gt;autoresearch&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="performance"/><category term="rails"/><category term="ruby"/><category term="ai"/><category term="andrej-karpathy"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="agentic-engineering"/><category term="november-2025-inflection"/><category term="tobias-lutke"/><category term="autoresearch"/></entry><entry><title>Context engineering</title><link href="https://simonwillison.net/2025/Jun/27/context-engineering/#atom-tag" rel="alternate"/><published>2025-06-27T23:42:43+00:00</published><updated>2025-06-27T23:42:43+00:00</updated><id>https://simonwillison.net/2025/Jun/27/context-engineering/#atom-tag</id><summary type="html">
    &lt;p&gt;The term &lt;strong&gt;context engineering&lt;/strong&gt; has recently started to gain traction as a better alternative to prompt engineering. I like it. I think this one may have sticking power.&lt;/p&gt;
&lt;p&gt;Here's an example tweet &lt;a href="https://twitter.com/tobi/status/1935533422589399127"&gt;from Shopify CEO Tobi Lutke&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I really like the term “context engineering” over prompt engineering. &lt;/p&gt;
&lt;p&gt;It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Recently amplified &lt;a href="https://twitter.com/karpathy/status/1937902205765607626"&gt;by Andrej Karpathy&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;+1 for "context engineering" over "prompt engineering".&lt;/p&gt;
&lt;p&gt;People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting [...] Doing this well is highly non-trivial. And art because of the guiding intuition around LLM psychology of people spirits. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've &lt;a href="https://simonwillison.net/2023/Feb/21/in-defense-of-prompt-engineering/"&gt;spoken favorably of prompt engineering&lt;/a&gt; in the past - I hoped that term could capture the inherent complexity of constructing reliable prompts. Unfortunately, most people's inferred definition is that it's a laughably pretentious term for typing things into a chatbot! &lt;/p&gt;
&lt;p&gt;It turns out that inferred definitions are the ones that stick. I think the inferred definition of "context engineering" is likely to be much closer to the intended meaning.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/context-engineering"&gt;context-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tobias-lutke"&gt;tobias-lutke&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="ai"/><category term="andrej-karpathy"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="context-engineering"/><category term="tobias-lutke"/></entry><entry><title>Quoting Tobias Lütke</title><link href="https://simonwillison.net/2025/Apr/7/tobias/#atom-tag" rel="alternate"/><published>2025-04-07T18:32:20+00:00</published><updated>2025-04-07T18:32:20+00:00</updated><id>https://simonwillison.net/2025/Apr/7/tobias/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/tobi/status/1909231499448401946"&gt;&lt;p&gt;&lt;strong&gt;Using Al effectively is now a fundamental expectation of everyone at Shopify&lt;/strong&gt;. It's a tool of all trades today, and will only grow in importance. Frankly, I don't think it's feasible to opt out of learning the skill of applying Al in your craft; you are welcome to try, but I want to be honest I cannot see this working out today, and definitely not tomorrow. Stagnation is almost certain, and stagnation is slow-motion failure. If you're not climbing, you're sliding [...]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We will add Al usage questions to our performance and peer review questionnaire&lt;/strong&gt;. Learning to use Al well is an unobvious skill. My sense is that a lot of people give up after writing a prompt and not getting the ideal thing back immediately. Learning to prompt and load context is important, and getting peers to provide feedback on how this is going will be valuable.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/tobi/status/1909231499448401946"&gt;Tobias Lütke&lt;/a&gt;, CEO of Shopify, self-leaked memo&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/careers"&gt;careers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tobias-lutke"&gt;tobias-lutke&lt;/a&gt;&lt;/p&gt;



</summary><category term="careers"/><category term="ai"/><category term="ai-ethics"/><category term="tobias-lutke"/></entry><entry><title>Could you train a ChatGPT-beating model for $85,000 and run it in a browser?</title><link href="https://simonwillison.net/2023/Mar/17/beat-chatgpt-in-a-browser/#atom-tag" rel="alternate"/><published>2023-03-17T15:43:38+00:00</published><updated>2023-03-17T15:43:38+00:00</updated><id>https://simonwillison.net/2023/Mar/17/beat-chatgpt-in-a-browser/#atom-tag</id><summary type="html">
    &lt;p&gt;I think it's now possible to train a large language model with similar functionality to GPT-3 for $85,000. And I think we might soon be able to run the resulting model entirely in the browser, and give it capabilities that leapfrog it ahead of ChatGPT.&lt;/p&gt;
&lt;p&gt;This is currently wild speculation on my part, but bear with me because I think this is worth exploring further.&lt;/p&gt;
&lt;p&gt;Large language models with GPT-3-like capabilities cost millions of dollars to build, thanks to the cost of running the expensive GPU servers needed to train them. Whether you are renting or buying those machines, there are still enormous energy costs to cover.&lt;/p&gt;
&lt;p&gt;Just one example of this: the &lt;a href="https://huggingface.co/bigscience/bloom-7b1"&gt;BLOOM large language model&lt;/a&gt; was trained in France with the support of the French government. The cost was estimated at $2-5M; it took almost four months to train, and the project boasts about its low carbon footprint because most of the power came from a nuclear reactor!&lt;/p&gt;
&lt;p&gt;[ Fun fact: as of a few days ago you can now &lt;a href="https://github.com/NouamaneTazi/bloomz.cpp"&gt;run the openly licensed BLOOM on your own laptop&lt;/a&gt;, using Nouamane Tazi's adapted copy of the &lt;code&gt;llama.cpp&lt;/code&gt; code that made that possible for LLaMA ]&lt;/p&gt;
&lt;p&gt;Recent developments have made me suspect that these costs could be made dramatically lower. I think a capable language model can now be trained from scratch for around $85,000.&lt;/p&gt;
&lt;h4&gt;It's all about that LLaMA&lt;/h4&gt;
&lt;p&gt;The LLaMA plus Alpaca combination is the key here.&lt;/p&gt;
&lt;p&gt;I wrote about these two projects previously:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2023/Mar/11/llama/"&gt;Large language models are having their Stable Diffusion moment&lt;/a&gt; discusses the significance of LLaMA&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2023/Mar/13/alpaca/"&gt;Stanford Alpaca, and the acceleration of on-device large language model development&lt;/a&gt; describes Alpaca&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To recap: &lt;a href="https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/"&gt;LLaMA&lt;/a&gt; by Meta research provided a GPT-3 class model trained entirely on documented, available public training information, as opposed to OpenAI's continuing practice of not revealing the sources of their training data.&lt;/p&gt;
&lt;p&gt;This makes the model training a whole lot more likely to be replicable by other teams.&lt;/p&gt;
&lt;p&gt;The paper also describes some enormous efficiency improvements they made to the training process.&lt;/p&gt;
&lt;p&gt;The LLaMA research was still extremely expensive though. From the paper:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;... we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My friends at &lt;a href="https://replicate.com/"&gt;Replicate&lt;/a&gt; told me that a simple rule of thumb for A100 cloud costs is $1/hour.&lt;/p&gt;
&lt;p&gt;2048 * 5 * 30 * 24 = $7,372,800&lt;/p&gt;
&lt;p&gt;But... that $7M was the cost to both iterate on the model and to train all four sizes of LLaMA that they tried: 7B, 13B, 33B, and 65B.&lt;/p&gt;
&lt;p&gt;Here's Table 15 from the paper, showing the cost of training each model.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/llama-table-15.jpg" alt="Table 15: Carbon footprint of training different models in the same data center. We follow Wu et al. (2022) to compute carbon emission of training OPT, BLOOM and our models in the same data center. For the power consumption of a A100-80GB, we take the thermal design power for NVLink systems, that is 400W. We take a PUE of 1.1 and a carbon intensity factor set at the national US average of 0.385 kg COze per KWh. Lists 6 models. OPT-175B: 809,472 GPU hours, 356 MWh, 137 tons CO2. BLOOM-175B: 1,082,880 GPU hours, 475 MWh, 183 tons. LLaMA-7B: 82,432 GPU hours, 36 MWh, 14 tons. LLaMA-13B: 135,168 GPU hours, 59 MWh, 23 tons. LLaMA-33B: 530,432 GPU hours, 233 MWh, 90 tons. LLaMA-65B: 1,022,362 GPU hours, 449 MWh, 173 tons." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This shows that the smallest model, LLaMA-7B, took 82,432 hours of A100-80GB GPU time to train, consuming 36 MWh and generating 14 tons of CO2.&lt;/p&gt;
&lt;p&gt;(That's about 28 people flying from London to New York.)&lt;/p&gt;
&lt;p&gt;Going by the $1/hour rule of thumb, this means that provided you get everything right on your first run you can train a LLaMA-7B scale model for around $82,432.&lt;/p&gt;
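&lt;p&gt;The arithmetic behind both figures is simple enough to check directly:&lt;/p&gt;

```python
# Reproducing the cost estimates above from the $1/hour A100 rule of thumb.
A100_HOURLY_USD = 1.0  # rule-of-thumb cloud price for one A100 per hour

# Whole LLaMA project: 2048 GPUs running for roughly 5 months.
project_cost = 2048 * 5 * 30 * 24 * A100_HOURLY_USD
print(project_cost)  # 7372800.0 -- the ~$7M figure

# One clean LLaMA-7B training run, per Table 15: 82,432 GPU-hours.
llama_7b_cost = 82_432 * A100_HOURLY_USD
print(llama_7b_cost)  # 82432.0 -- just under the $85,000 budget
```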
&lt;h4&gt;Upgrading to Alpaca&lt;/h4&gt;
&lt;p&gt;You can run LLaMA 7B &lt;a href="https://til.simonwillison.net/llms/llama-7b-m2"&gt;on your own laptop&lt;/a&gt; (or even &lt;a href="https://twitter.com/ggerganov/status/1635605532726681600"&gt;on a phone&lt;/a&gt;), but you may find it hard to get good results out of it. That's because it hasn't been instruction tuned, so it's not great at answering the kind of prompts that you might send to ChatGPT or GPT-3 or 4.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://crfm.stanford.edu/2023/03/13/alpaca.html"&gt;Alpaca&lt;/a&gt; is the project from Stanford that fixes that. They fine-tuned LLaMA on 52,000 instructions (of &lt;a href="https://simonwillison.net/2023/Mar/13/alpaca/#bonus-training-data"&gt;somewhat dubious origin&lt;/a&gt;) and claim to have gotten ChatGPT-like performance as a result... from that smallest 7B LLaMA model!&lt;/p&gt;
&lt;p&gt;You can &lt;a href="https://crfm.stanford.edu/alpaca/"&gt;try out their demo&lt;/a&gt; (&lt;strong&gt;update:&lt;/strong&gt; no you can't, "Our live demo is suspended until further notice") and see for yourself that it really does capture at least some of that ChatGPT magic.&lt;/p&gt;
&lt;p&gt;The best bit? The Alpaca fine-tuning can be done for less than $100. The Replicate team have repeated the training process and &lt;a href="https://replicate.com/blog/replicate-alpaca"&gt;published a tutorial&lt;/a&gt; about how they did it.&lt;/p&gt;
&lt;p&gt;Other teams have also been able to replicate the Alpaca fine-tuning process, for example &lt;a href="https://github.com/antimatter15/alpaca.cpp"&gt;antimatter15/alpaca.cpp&lt;/a&gt; on GitHub.&lt;/p&gt;
&lt;p&gt;We are still within our $85,000 budget! And Alpaca - or an Alpaca-like model using different fine tuning data - is the ChatGPT on your own device model that we've all been hoping for.&lt;/p&gt;
&lt;h4&gt;Could we run it in a browser?&lt;/h4&gt;
&lt;p&gt;Alpaca is effectively the same size as LLaMA 7B - around 3.9GB (after 4-bit quantization à la &lt;a href="https://github.com/ggerganov/llama.cpp"&gt;llama.cpp&lt;/a&gt;). And LLaMA 7B has already been shown running on a whole bunch of different personal devices: laptops, Raspberry Pis (very slowly) and even a Pixel 5 phone at a decent speed!&lt;/p&gt;
&lt;p&gt;The next frontier: running it in the browser.&lt;/p&gt;
&lt;p&gt;I saw two tech demos yesterday that made me think this may be possible in the near future.&lt;/p&gt;
&lt;p&gt;The first is &lt;a href="https://github.com/xenova/transformers.js"&gt;Transformers.js&lt;/a&gt;. This is a WebAssembly port of the Hugging Face &lt;a href="https://huggingface.co/docs/transformers/index"&gt;Transformers&lt;/a&gt; library of models - previously only available for server-side Python.&lt;/p&gt;
&lt;p&gt;It's worth spending some time with &lt;a href="https://xenova.github.io/transformers.js/"&gt;their demos&lt;/a&gt;, which include some smaller language models and some very impressive image analysis models too.&lt;/p&gt;
&lt;p&gt;The second is &lt;a href="https://github.com/mlc-ai/web-stable-diffusion"&gt;Web Stable Diffusion&lt;/a&gt;. This team managed to get the Stable Diffusion generative image model running entirely in the browser as well!&lt;/p&gt;
&lt;p&gt;Web Stable Diffusion uses WebGPU, a still emerging standard that's currently only working in Chrome Canary. But it does work! It rendered me this image of two raccoons eating a pie in the forest in 38 seconds.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/web-stable-diffusion-raccoons.jpg" alt="mig.ai/web-stable-diffusion/ in a browser. The input prompt is two racoons eating a pie in the woods, with the default 20 step scheduler. After 38 seconds elapsed on the prograss bar a realistic photograph of two raccoons eating a fruit pie appears - although on closer inspection the raccoon holding the pie has three paws!" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;The Stable Diffusion model this loads into the browser is around 1.9GB.&lt;/p&gt;
&lt;p&gt;LLaMA/Alpaca at 4bit quantization is 3.9GB.&lt;/p&gt;
&lt;p&gt;The sizes of these two models are similar enough that I would not be at all surprised to see an Alpaca-like model running in the browser in the not-too-distant future. I wouldn't be surprised if someone is working on that right now.&lt;/p&gt;
&lt;h4 id="react-pattern"&gt;Now give it extra abilities with ReAct&lt;/h4&gt;
&lt;p&gt;A model running in your browser that behaved like a less capable version of ChatGPT would be pretty impressive. But what if it could be MORE capable than ChatGPT?&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://react-lm.github.io/"&gt;ReAct prompt pattern&lt;/a&gt; is a simple, proven way of expanding a language model's abilities by giving it access to extra tools.&lt;/p&gt;
&lt;p&gt;Matt Webb explains the significance of the pattern in &lt;a href="https://interconnected.org/home/2023/03/16/singularity"&gt;The surprising ease and effectiveness of AI in a loop&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I got it working with a few dozen lines of Python myself, which I described in &lt;a href="https://til.simonwillison.net/llms/python-react-pattern"&gt;A simple Python implementation of the ReAct pattern for LLMs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's the short version: you tell the model that it must think out loud and now has access to tools. It can then work through a question like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; Population of Paris, squared?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thought:&lt;/strong&gt; I should look up the population of paris and then multiply it&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Action:&lt;/strong&gt; search_wikipedia: Paris&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then it stops. Your code harness for the model reads that last line, sees the action and goes and executes an API call against Wikipedia. It continues the dialog with the model like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Observation:&lt;/strong&gt; &amp;lt;truncated content from the Wikipedia page, including the 2,248,780 population figure&amp;gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The model continues:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Thought:&lt;/strong&gt; Paris population is 2,248,780 I should square that&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Action:&lt;/strong&gt; calculator: 2248780 ** 2&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Control is handed back to the harness, which passes that to a calculator and returns:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Observation:&lt;/strong&gt; 5057011488400&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The model then provides the answer:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt; The population of Paris squared is 5,057,011,488,400&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Adding new actions to this system is trivial: each one can be a few lines of code.&lt;/p&gt;
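&lt;p&gt;A toy version of that harness loop fits in a couple of dozen lines of Python. This is a fresh sketch, not the implementation from my TIL post - the "model" is scripted so the example runs standalone, and only the tool names come from the transcript above:&lt;/p&gt;

```python
def search_wikipedia(query):
    # Stand-in for a real Wikipedia API call.
    return "Paris is the capital of France. Population: 2,248,780 ..."

def calculator(expression):
    # eval() is acceptable in a sketch; a real harness must sandbox this.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"search_wikipedia": search_wikipedia, "calculator": calculator}

def react_loop(model, question):
    transcript = "Question: " + question
    while True:
        turn = model(transcript)  # model emits Thought + Action, or an Answer
        transcript += "\n" + turn
        if turn.startswith("Answer:"):
            return turn
        # Find the Action line, run the named tool, feed back an Observation.
        action = [line for line in turn.splitlines()
                  if line.startswith("Action:")][-1]
        name, arg = action[len("Action: "):].split(": ", 1)
        transcript += "\nObservation: " + TOOLS[name](arg)
```

&lt;p&gt;Swap the scripted model for a real LLM call and the loop is the whole pattern: the model only ever sees text, and the harness does the actual work.&lt;/p&gt;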
&lt;p&gt;But as &lt;a href="https://react-lm.github.io/"&gt;the ReAct paper&lt;/a&gt; demonstrates, adding these capabilities to even an under-powered model (such as LLaMA 7B) can dramatically improve its abilities, at least according to several common language model benchmarks.&lt;/p&gt;
&lt;p&gt;This is essentially what Bing is! It's GPT-4 with the added ability to run searches against the Bing search index.&lt;/p&gt;
&lt;p&gt;Obviously if you're going to give a language model the ability to execute API calls and evaluate code you need to do it in a safe environment! Like for example... a web browser, which runs code from untrusted sources as a matter of habit and has the most thoroughly tested sandbox mechanism of any piece of software we've ever created.&lt;/p&gt;
&lt;h4 id="llm-conclusion"&gt;Adding it all together&lt;/h4&gt;
&lt;p&gt;There are a lot more groups out there that can afford to spend $85,000 training a model than there are that can spend $2M or more.&lt;/p&gt;
&lt;p&gt;I think LLaMA and Alpaca are going to have a lot of competition soon, from an increasing pool of openly licensed models.&lt;/p&gt;
&lt;p&gt;A fine-tuned LLaMA scale model is leaning in the direction of a ChatGPT competitor already. But... if you hook in some extra capabilities as seen in ReAct and Bing even that little model should be able to way outperform ChatGPT in terms of actual ability to solve problems and do interesting things.&lt;/p&gt;
&lt;p&gt;And we might be able to run such a thing on our phones... or even in our web browsers... sooner than you think.&lt;/p&gt;
&lt;h4 id="llm-cheaper"&gt;And it's only going to get cheaper&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://twitter.com/tobi/status/1636810016140271616"&gt;Tobias Lütke on Twitter:&lt;/a&gt;&lt;/p&gt;

&lt;blockquote class="twitter-tweet" data-conversation="none"&gt;&lt;p lang="en" dir="ltr"&gt;H100s are shipping and you can half this again. Twice (or more) if fp8 works.&lt;/p&gt;- tobi lutke (@tobi) &lt;a href="https://twitter.com/tobi/status/1636810016140271616?ref_src=twsrc%5Etfw"&gt;March 17, 2023&lt;/a&gt;&lt;/blockquote&gt;

&lt;p&gt;The &lt;a href="https://www.nvidia.com/en-us/data-center/h100/"&gt;H100&lt;/a&gt; is the new Tensor Core GPU from NVIDIA, which they claim can offer up to a 30x performance improvement over their current A100s.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bloom"&gt;bloom&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlc"&gt;mlc&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers-js"&gt;transformers-js&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tobias-lutke"&gt;tobias-lutke&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="chatgpt"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="bloom"/><category term="mlc"/><category term="transformers-js"/><category term="llm-tool-use"/><category term="llama-cpp"/><category term="tobias-lutke"/></entry><entry><title>Quoting Tobi Lutke</title><link href="https://simonwillison.net/2019/Dec/26/tobi-lutke/#atom-tag" rel="alternate"/><published>2019-12-26T19:06:35+00:00</published><updated>2019-12-26T19:06:35+00:00</updated><id>https://simonwillison.net/2019/Dec/26/tobi-lutke/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/tobi/status/1210242188870930433"&gt;&lt;p&gt;For creative work, you can't cheat. My believe is that there are 5 creative hours in everyone's day. All I ask of people at Shopify is that 4 of those are channeled into the company.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/tobi/status/1210242188870930433"&gt;Tobi Lutke&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/productivity"&gt;productivity&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/management"&gt;management&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tobias-lutke"&gt;tobias-lutke&lt;/a&gt;&lt;/p&gt;



</summary><category term="productivity"/><category term="management"/><category term="tobias-lutke"/></entry></feed>