<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: webgpu</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/webgpu.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-09-08T20:53:52+00:00</updated><author><name>Simon Willison</name></author><entry><title>Load Llama-3.2 WebGPU in your browser from a local folder</title><link href="https://simonwillison.net/2025/Sep/8/webgpu-local-folder/#atom-tag" rel="alternate"/><published>2025-09-08T20:53:52+00:00</published><updated>2025-09-08T20:53:52+00:00</updated><id>https://simonwillison.net/2025/Sep/8/webgpu-local-folder/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://static.simonwillison.net/static/2025/llama-3.2-webgpu/"&gt;Load Llama-3.2 WebGPU in your browser from a local folder&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Inspired by &lt;a href="https://news.ycombinator.com/item?id=45168953#45169054"&gt;a comment&lt;/a&gt; on Hacker News, I decided to see if it was possible to modify the &lt;a href="https://github.com/huggingface/transformers.js-examples/tree/main/llama-3.2-webgpu"&gt;transformers.js-examples/tree/main/llama-3.2-webgpu&lt;/a&gt; Llama 3.2 chat demo (&lt;a href="https://huggingface.co/spaces/webml-community/llama-3.2-webgpu"&gt;online here&lt;/a&gt;; I &lt;a href="https://simonwillison.net/2024/Sep/30/llama-32-webgpu/"&gt;wrote about it in September 2024&lt;/a&gt;) to add an option to open local model files directly from a folder on disk, rather than waiting for them to download over the network.&lt;/p&gt;
&lt;p&gt;I posed the problem to OpenAI's GPT-5-enabled Codex CLI like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/huggingface/transformers.js-examples
cd transformers.js-examples/llama-3.2-webgpu
codex
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Modify this application such that it offers the user a file browse button for selecting their own local copy of the model file instead of loading it over the network. Provide a "download model" option too.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Codex churned away for several minutes, even running commands like &lt;code&gt;curl -sL https://raw.githubusercontent.com/huggingface/transformers.js/main/src/models.js | sed -n '1,200p'&lt;/code&gt; to inspect the source code of the underlying Transformers.js library.&lt;/p&gt;
&lt;p&gt;After four prompts total (&lt;a href="https://gist.github.com/simonw/3c46c9e609f6ee77367a760b5ca01bd2?permalink_comment_id=5751814#gistcomment-5751814"&gt;shown here&lt;/a&gt;) it built something which worked!&lt;/p&gt;
&lt;p&gt;To try it out you'll need your own local copy of the Llama 3.2 ONNX model. You can get that (a ~1.2GB download) like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git lfs install
git clone https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct-q4f16
&lt;/code&gt;&lt;/pre&gt;
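&lt;p&gt;If Git LFS isn't set up, the clone will contain small text "pointer" files in place of the actual weights. This minimal Python sketch (an illustration, not part of the demo; the default folder name assumes the clone command above) totals the file sizes in the cloned folder and flags any file that still begins with the standard Git LFS pointer header:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import os
import sys

# Every Git LFS pointer file begins with this line
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def check_model_folder(root):
    """Return (total_bytes, pointer_files) for a cloned model folder.

    pointer_files lists files that are still LFS pointers, meaning the
    real weights were never downloaded.
    """
    total = 0
    pointers = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata (including its copy of the LFS objects)
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            total += os.path.getsize(path)
            with open(path, "rb") as f:
                if f.read(len(LFS_HEADER)) == LFS_HEADER:
                    pointers.append(os.path.relpath(path, root))
    return total, sorted(pointers)


if __name__ == "__main__":
    # Defaults to the folder created by the git clone command above
    folder = sys.argv[1] if len(sys.argv) == 2 else "Llama-3.2-1B-Instruct-q4f16"
    size, missing = check_model_folder(folder)
    print(f"{size / 1e9:.2f} GB in the working tree")
    for p in missing:
        print("still an LFS pointer:", p)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A complete clone should report roughly 1.2GB and no pointer files.&lt;/p&gt;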
&lt;p&gt;Then visit my &lt;a href="https://static.simonwillison.net/static/2025/llama-3.2-webgpu/"&gt;llama-3.2-webgpu&lt;/a&gt; page in Chrome or Firefox Nightly (WebGPU is required), click "Browse folder", select the folder you just cloned, agree to the "Upload" confirmation (confusingly, nothing is actually uploaded: the browser just reads the model files locally on your machine) and click "Load local model".&lt;/p&gt;
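&lt;p&gt;For anyone curious how a "Browse folder" loader can work: files picked through a directory input come with a &lt;code&gt;webkitRelativePath&lt;/code&gt; that begins with the selected folder's name, so the loader has to map the paths the library requests onto those entries. Here's that lookup modeled as a small Python sketch. It's a hypothetical illustration, not the code in the branch, and the file names are assumptions about the model repository's layout:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def build_index(relative_paths):
    """Map each path inside the chosen folder to its browser-reported path.

    Browser paths look like "Llama-3.2-1B-Instruct-q4f16/onnx/model_q4f16.onnx";
    the first component is the folder the user picked, so it is dropped.
    """
    index = {}
    for full in relative_paths:
        parts = full.split("/", 1)
        if len(parts) == 2:
            index[parts[1]] = full
    return index


def resolve(index, requested):
    """Find the selected file for a requested path like "onnx/model_q4f16.onnx"."""
    key = requested.lstrip("/")
    if key not in index:
        raise FileNotFoundError("not in the selected folder: " + key)
    return index[key]


# Hypothetical selection; real file names depend on the model repository
selected = [
    "Llama-3.2-1B-Instruct-q4f16/config.json",
    "Llama-3.2-1B-Instruct-q4f16/tokenizer.json",
    "Llama-3.2-1B-Instruct-q4f16/onnx/model_q4f16.onnx",
]
index = build_index(selected)
print(resolve(index, "onnx/model_q4f16.onnx"))
&lt;/code&gt;&lt;/pre&gt;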
&lt;p&gt;Here's an animated demo (recorded in real time; I didn't speed it up):&lt;/p&gt;
&lt;p&gt;&lt;img alt="GIF. I follow the setup instructions, clicking to load a local model and browsing to the correct folder. Once loaded the model shows a chat interface, I run the example about time management which returns tokens at about 10/second." src="https://static.simonwillison.net/static/2025/webgpu-llama-demo-small.gif" /&gt;&lt;/p&gt;
&lt;p&gt;I pushed &lt;a href="https://github.com/simonw/transformers.js-examples/commit/cdebf4128c6e30414d437affd4b13b6c9c79421d"&gt;a branch with those changes here&lt;/a&gt;. The next step would be to modify this to support other models in addition to the Llama 3.2 demo, but I'm pleased to have got to this proof of concept with so little work beyond throwing some prompts at Codex to see if it could figure it out.&lt;/p&gt;
&lt;p&gt;According to the Codex &lt;code&gt;/status&lt;/code&gt; command &lt;a href="https://gist.github.com/simonw/3c46c9e609f6ee77367a760b5ca01bd2?permalink_comment_id=5751807#gistcomment-5751807"&gt;this used&lt;/a&gt; 169,818 input tokens, 17,112 output tokens and 1,176,320 cached input tokens. At current GPT-5 token pricing ($1.25/million input, $0.125/million cached input, $10/million output) that would cost about 53.04 cents, but Codex CLI hooks into my existing $20/month ChatGPT Plus plan, so this was bundled into that.&lt;/p&gt;
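&lt;p&gt;The cost arithmetic, as a short Python sketch. It assumes the cached input tokens are billed separately from (not included in) the uncached input count:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Token counts reported by Codex /status; GPT-5 prices in USD per million tokens
usage = {"input": 169_818, "cached_input": 1_176_320, "output": 17_112}
price_per_million = {"input": 1.25, "cached_input": 0.125, "output": 10.00}

# Assumes cached tokens are billed separately from the "input" count
cost_usd = sum(usage[k] * price_per_million[k] / 1_000_000 for k in usage)

for k in usage:
    print(f"{k:13} ${usage[k] * price_per_million[k] / 1_000_000:.4f}")
print(f"total         ${cost_usd:.4f} = {cost_usd * 100:.2f} cents")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these figures the total works out to about 53.04 cents.&lt;/p&gt;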

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45168953#45173297"&gt;My Hacker News comment&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers-js"&gt;transformers-js&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/onnx"&gt;onnx&lt;/a&gt;&lt;/p&gt;



</summary><category term="javascript"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="ai-assisted-programming"/><category term="transformers-js"/><category term="webgpu"/><category term="llm-pricing"/><category term="vibe-coding"/><category term="gpt-5"/><category term="codex"/><category term="gpt"/><category term="onnx"/></entry><entry><title>Shipping WebGPU on Windows in Firefox 141</title><link href="https://simonwillison.net/2025/Jul/16/webgpu-firefox/#atom-tag" rel="alternate"/><published>2025-07-16T13:51:26+00:00</published><updated>2025-07-16T13:51:26+00:00</updated><id>https://simonwillison.net/2025/Jul/16/webgpu-firefox/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://mozillagfx.wordpress.com/2025/07/15/shipping-webgpu-on-windows-in-firefox-141/"&gt;Shipping WebGPU on Windows in Firefox 141&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Firefox 141 enables WebGPU on Windows, and it's coming to Mac and Linux soon as well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Although Firefox 141 enables WebGPU only on Windows, we plan to ship WebGPU on Mac and Linux in the coming months, and finally on Android. &lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From this article I learned that it's already available in &lt;a href="https://www.mozilla.org/en-US/firefox/channel/desktop/"&gt;Firefox Nightly&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note that WebGPU has been available in Firefox Nightly on all platforms other than Android for quite some time.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I tried the most recent Nightly on my Mac and now the &lt;a href="https://huggingface.co/spaces/reach-vb/github-issue-generator-webgpu"&gt;Github Issue Generator running locally w/ SmolLM2 &amp;amp; WebGPU&lt;/a&gt; demo (&lt;a href="https://simonwillison.net/2024/Nov/29/structured-generation-smollm2-webgpu/"&gt;previously&lt;/a&gt;) works! Firefox stable gives me an error message saying "Error: WebGPU is not supported in your current environment, but it is necessary to run the WebLLM engine."&lt;/p&gt;
&lt;p&gt;The Firefox implementation is based on &lt;a href="https://github.com/gfx-rs/wgpu"&gt;wgpu&lt;/a&gt;, an open source WebGPU implementation written in Rust.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44579317"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browsers"&gt;browsers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/firefox"&gt;firefox&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mozilla"&gt;mozilla&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;&lt;/p&gt;



</summary><category term="browsers"/><category term="firefox"/><category term="mozilla"/><category term="rust"/><category term="webgpu"/></entry><entry><title>Structured Generation w/ SmolLM2 running in browser &amp; WebGPU</title><link href="https://simonwillison.net/2024/Nov/29/structured-generation-smollm2-webgpu/#atom-tag" rel="alternate"/><published>2024-11-29T21:09:11+00:00</published><updated>2024-11-29T21:09:11+00:00</updated><id>https://simonwillison.net/2024/Nov/29/structured-generation-smollm2-webgpu/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/spaces/reach-vb/github-issue-generator-webgpu"&gt;Structured Generation w/ SmolLM2 running in browser &amp;amp; WebGPU&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Extraordinary demo by Vaibhav Srivastav (VB). Here's Hugging Face's &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct"&gt;SmolLM2-1.7B-Instruct&lt;/a&gt; running directly in a web browser (using WebGPU, so it requires Chrome &lt;a href="https://github.com/gpuweb/gpuweb/wiki/Implementation-Status"&gt;for the moment&lt;/a&gt;), demonstrating structured text extraction: it converts a text description of an image into a structured GitHub issue defined using JSON schema.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Interface showing text input, a JSON schema, extracted JSON and a UI that demonstrates the structured resulting GitHub Issue" src="https://static.simonwillison.net/static/2024/github-issue-extract.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The page loads 924.8MB of model data (according to &lt;a href="https://gist.github.com/simonw/3ccba6256e95b59ea6a17509855830b4"&gt;this script to sum up files in window.caches&lt;/a&gt;) and performs everything in-browser. I did not know a model this small could produce such useful results.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/Vaibhavs10/github-issue-generator-webgpu/blob/main/src/index.js"&gt;the source code&lt;/a&gt; for the demo. It's around 200 lines of code, 50 of which are the JSON schema describing the data to be extracted.&lt;/p&gt;
&lt;p&gt;The real secret sauce here is &lt;a href="https://github.com/mlc-ai/web-llm"&gt;web-llm&lt;/a&gt; by MLC. This library has made loading and executing prompts through LLMs in the browser shockingly easy, and it recently incorporated support for MLC's &lt;a href="https://xgrammar.mlc.ai/"&gt;XGrammar&lt;/a&gt; library (also available in Python), which implements both JSON schema and EBNF-based structured output guidance.&lt;/p&gt;
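&lt;p&gt;The core trick behind this kind of structured output is constrained decoding: at each step, tokens that would break the grammar are masked out before the next token is chosen. XGrammar compiles JSON schemas and EBNF grammars into matchers that produce such a token mask efficiently; the toy Python sketch below shows only the underlying idea, using a plain set of allowed strings as the "grammar" and a made-up vocabulary and scoring function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def constrained_greedy(vocab, score, allowed, max_steps=20):
    """Greedy decoding restricted so the output always stays a prefix of
    at least one string in `allowed` (a toy stand-in for a grammar)."""
    out = ""
    for _ in range(max_steps):
        if out in allowed:
            return out
        # The "mask": keep only tokens that can still lead to an allowed string
        legal = [t for t in vocab if any(a.startswith(out + t) for a in allowed)]
        if not legal:
            raise ValueError("no legal continuation after " + repr(out))
        # Among the legal tokens, take the one the model scores highest
        out += max(legal, key=lambda t: score(out, t))
    raise ValueError("hit max_steps before completing an allowed string")


# Hypothetical vocabulary and model preferences for an issue "label" field
vocab = ["bu", "g", "enh", "ancement", "qu", "estion", "banana", "!"]
prefs = {"banana": 9, "!": 8, "enh": 5, "bu": 3, "ancement": 2, "g": 2, "estion": 2, "qu": 1}
label = constrained_greedy(vocab, lambda prefix, tok: prefs[tok], {"bug", "enhancement", "question"})
print(label)  # enhancement
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unconstrained greedy decoding with the same scores would pick &lt;code&gt;banana&lt;/code&gt; first; the mask forces the output to be one of the valid labels.&lt;/p&gt;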

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://bsky.app/profile/reach-vb.hf.co/post/3lc24bmj6fk2j"&gt;@reach-vb.hf.co&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlc"&gt;mlc&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llms"/><category term="mlc"/><category term="hugging-face"/><category term="webgpu"/><category term="smollm"/><category term="structured-extraction"/></entry><entry><title>llama-3.2-webgpu</title><link href="https://simonwillison.net/2024/Sep/30/llama-32-webgpu/#atom-tag" rel="alternate"/><published>2024-09-30T16:27:22+00:00</published><updated>2024-09-30T16:27:22+00:00</updated><id>https://simonwillison.net/2024/Sep/30/llama-32-webgpu/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/spaces/webml-community/llama-3.2-webgpu"&gt;llama-3.2-webgpu&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Llama 3.2 1B is a really interesting model, given its 128,000-token context length and its tiny size (barely more than a GB).&lt;/p&gt;
&lt;p&gt;This page loads a &lt;a href="https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct-q4f16/tree/main/onnx"&gt;1.24GB q4f16 ONNX build&lt;/a&gt; of the Llama-3.2-1B-Instruct model and runs it with a React-powered chat interface directly in the browser, using &lt;a href="https://huggingface.co/docs/transformers.js/en/index"&gt;Transformers.js&lt;/a&gt; and WebGPU. &lt;a href="https://github.com/huggingface/transformers.js-examples/tree/main/llama-3.2-webgpu"&gt;Source code for the demo is here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It worked for me just now in Chrome; in Firefox and Safari I got a “WebGPU is not supported by this browser” error message.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/xenovacom/status/1840767709317046460"&gt;@xenovacom&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers-js"&gt;transformers-js&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/onnx"&gt;onnx&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llama"/><category term="llms"/><category term="transformers-js"/><category term="webgpu"/><category term="onnx"/></entry><entry><title>Quoting Erich Gubler</title><link href="https://simonwillison.net/2024/Aug/5/erich-gubler/#atom-tag" rel="alternate"/><published>2024-08-05T02:26:40+00:00</published><updated>2024-08-05T02:26:40+00:00</updated><id>https://simonwillison.net/2024/Aug/5/erich-gubler/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://news.ycombinator.com/item?id=41156872#41157602"&gt;&lt;p&gt;[On WebGPU in Firefox] There is a &lt;em&gt;lot&lt;/em&gt; of work to do still to make sure we comply with the spec. in a way that's acceptable to ship in a browser. We're 90% of the way there in terms of functionality, but the last 10% of fixing up spec. changes in the last few years + being significantly more resourced-constrained (we have 3 full-time folks, Chrome has/had an order of magnitude more humans working on WebGPU) means we've got our work cut out for us. We're hoping to ship sometime in the next year, but I won't make promises here.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://news.ycombinator.com/item?id=41156872#41157602"&gt;Erich Gubler&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/firefox"&gt;firefox&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;&lt;/p&gt;



</summary><category term="firefox"/><category term="webgpu"/></entry><entry><title>experimental-phi3-webgpu</title><link href="https://simonwillison.net/2024/May/9/experimental-phi3-webgpu/#atom-tag" rel="alternate"/><published>2024-05-09T22:21:48+00:00</published><updated>2024-05-09T22:21:48+00:00</updated><id>https://simonwillison.net/2024/May/9/experimental-phi3-webgpu/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/spaces/Xenova/experimental-phi3-webgpu"&gt;experimental-phi3-webgpu&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Run Microsoft’s excellent Phi-3 model directly in your browser. It uses WebGPU, so it didn’t work in Firefox for me, only in Chrome.&lt;/p&gt;

&lt;p&gt;It fetched around 2.1GB of data into the browser cache on first run, but then gave me decent-quality responses to my prompts at an impressive 21 tokens a second (M2, 64GB).&lt;/p&gt;

&lt;p&gt;I think Phi-3 is the highest-quality model of this size, so it’s a really good fit for running in a browser like this.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/xenovacom/status/1788664184487432679"&gt;@xenovacom&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browsers"&gt;browsers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/phi"&gt;phi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;&lt;/p&gt;



</summary><category term="browsers"/><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="phi"/><category term="webgpu"/></entry><entry><title>WebLLM supports Llama 2 70B now</title><link href="https://simonwillison.net/2023/Aug/30/webllm-supports-llama-2-70b-now/#atom-tag" rel="alternate"/><published>2023-08-30T14:41:26+00:00</published><updated>2023-08-30T14:41:26+00:00</updated><id>https://simonwillison.net/2023/Aug/30/webllm-supports-llama-2-70b-now/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://webllm.mlc.ai/"&gt;WebLLM supports Llama 2 70B now&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The WebLLM project from MLC uses WebGPU to run large language models entirely in the browser. They recently added support for Llama 2, including Llama 2 70B, the largest and most powerful model in that family.&lt;/p&gt;

&lt;p&gt;To my astonishment, this worked! I used an M2 Mac with 64GB of RAM and Chrome Canary, and it downloaded many GBs of data... but it worked, and spat out tokens at a slow but respectable rate of 3.25 tokens/second.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlc"&gt;mlc&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llama"/><category term="llms"/><category term="mlc"/><category term="webgpu"/></entry><entry><title>MLC: Bringing Open Large Language Models to Consumer Devices</title><link href="https://simonwillison.net/2023/May/22/mlc-redpajama/#atom-tag" rel="alternate"/><published>2023-05-22T19:25:13+00:00</published><updated>2023-05-22T19:25:13+00:00</updated><id>https://simonwillison.net/2023/May/22/mlc-redpajama/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://mlc.ai/blog/2023/05/22/bringing-open-large-language-models-to-consumer-devices"&gt;MLC: Bringing Open Large Language Models to Consumer Devices&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;“We bring RedPajama, a permissive open language model to WebGPU, iOS, GPUs, and various other platforms.” I managed to get this running on my Mac (see via link) with a few tweaks to their official instructions.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://til.simonwillison.net/llms/mlc-chat-redpajama"&gt;mlc-chat - RedPajama-INCITE-Chat-3B on macOS&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlc"&gt;mlc&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redpajama"&gt;redpajama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpus"&gt;gpus&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="mlc"/><category term="redpajama"/><category term="webgpu"/><category term="gpus"/></entry><entry><title>Web Stable Diffusion</title><link href="https://simonwillison.net/2023/Mar/17/web-stable-diffusion/#atom-tag" rel="alternate"/><published>2023-03-17T04:46:56+00:00</published><updated>2023-03-17T04:46:56+00:00</updated><id>https://simonwillison.net/2023/Mar/17/web-stable-diffusion/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/mlc-ai/web-stable-diffusion"&gt;Web Stable Diffusion&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I just ran the full Stable Diffusion image generation model entirely in my browser, and used it to generate an image of two raccoons eating pie in the woods. I had to use Google Chrome Canary, since this depends on WebGPU, which still isn't fully rolled out, but it worked perfectly.&lt;/p&gt;
&lt;p&gt;&lt;img alt="mlc.ai/web-stable-diffusion/ in Chrome Canary. Prompt: two raccoons eating a pie in the woods. No negative prompt. Multi-step DPM Solver (20 steps) for the scheduler. Initializing GPU device: WebGPU - apple. A completed progress bar which says it took 38 seconds. And a quite realistic-looking photograph of two raccoons in the woods, one of whom is eating a pie (though on closer inspection he has three paws, two holding the pie and one beneath it). The second raccoon only has two paws." src="https://static.simonwillison.net/static/2023/racoons-eating-pie.jpg" /&gt;&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://fedi.simonwillison.net/@simon/110036800515374711"&gt;@simon on Mastodon&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browsers"&gt;browsers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chrome"&gt;chrome&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stable-diffusion"&gt;stable-diffusion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlc"&gt;mlc&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;&lt;/p&gt;



</summary><category term="browsers"/><category term="chrome"/><category term="javascript"/><category term="ai"/><category term="webassembly"/><category term="stable-diffusion"/><category term="generative-ai"/><category term="mlc"/><category term="webgpu"/><category term="text-to-image"/></entry></feed>