<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: prince-canuma</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/prince-canuma.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-01-22T17:42:34+00:00</updated><author><name>Simon Willison</name></author><entry><title>Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation</title><link href="https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-tag" rel="alternate"/><published>2026-01-22T17:42:34+00:00</published><updated>2026-01-22T17:42:34+00:00</updated><id>https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://qwen.ai/blog?id=qwen3tts-0115"&gt;Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got &lt;em&gt;really good&lt;/em&gt;, so I can't speak for how notable this new release from Qwen is.&lt;/p&gt;
&lt;p&gt;From &lt;a href="https://github.com/QwenLM/Qwen3-TTS/blob/main/assets/Qwen3_TTS.pdf"&gt;the accompanying paper&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS adopts a dual-track LM architecture for real-time synthesis [...]. Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmarks (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set). To facilitate community research and development, we release both tokenizers and models under the Apache 2.0 license.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To give an idea of size, &lt;a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base"&gt;Qwen/Qwen3-TTS-12Hz-1.7B-Base&lt;/a&gt; is 4.54GB on Hugging Face and &lt;a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base"&gt;Qwen/Qwen3-TTS-12Hz-0.6B-Base&lt;/a&gt; is 2.52GB.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://huggingface.co/spaces/Qwen/Qwen3-TTS"&gt;Hugging Face demo&lt;/a&gt; lets you try out the 0.6B and 1.7B models for free in your browser, including voice cloning:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Qwen3-TTS voice cloning web interface with three tabs at top: &amp;quot;Voice Design&amp;quot;, &amp;quot;Voice Clone (Base)&amp;quot; (selected), and &amp;quot;TTS (CustomVoice)&amp;quot;. The page is titled &amp;quot;Clone Voice from Reference Audio&amp;quot; and has two main sections. Left section: &amp;quot;Reference Audio (Upload a voice sample clone)&amp;quot; showing an audio waveform player at 0:00/0:34 with playback controls, upload and microphone icons, followed by &amp;quot;Reference Text (Transcript of the reference audio)&amp;quot; containing three paragraphs: &amp;quot;Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He currently works full-time building open source tools for data journalism, built around Datasette and SQLite. Prior to becoming an independent open source developer, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010. He is a co-creator of the Django Web Framework, and has been blogging about web development and programming since 2002 at simonwillison.net&amp;quot;. Right section: &amp;quot;Target Text (Text to synthesize with cloned voice)&amp;quot; containing text about Qwen3-TTS speech generation capabilities, with &amp;quot;Language&amp;quot; dropdown set to &amp;quot;Auto&amp;quot; and &amp;quot;Model Size&amp;quot; dropdown set to &amp;quot;1.7B&amp;quot;, and a purple &amp;quot;Clone &amp;amp; Generate&amp;quot; button at bottom." src="https://static.simonwillison.net/static/2026/qwen-voice-clone.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I tried this out by recording myself reading &lt;a href="https://simonwillison.net/about/"&gt;my about page&lt;/a&gt; and then having Qwen3-TTS generate audio of me reading the Qwen3-TTS announcement post. Here's the result:&lt;/p&gt;
&lt;p&gt;&lt;audio controls style="width: 100%"&gt;
  &lt;source src="https://static.simonwillison.net/static/2026/qwen-tts-clone.wav" type="audio/wav"&gt;
  Your browser does not support the audio element.
&lt;/audio&gt;&lt;/p&gt;
&lt;p&gt;It's important that everyone understands that voice cloning is now something that's available to anyone with a GPU and a few GBs of VRAM... or in this case a web browser that can access Hugging Face.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Prince Canuma &lt;a href="https://x.com/Prince_Canuma/status/2014453857019904423"&gt;got this working&lt;/a&gt; with his &lt;a href="https://pypi.org/project/mlx-audio/"&gt;mlx-audio&lt;/a&gt; library. I &lt;a href="https://claude.ai/share/2e01ad60-ca38-4e14-ab60-74eaa45b2fbd"&gt;had Claude&lt;/a&gt; turn that into &lt;a href="https://github.com/simonw/tools/blob/main/python/q3_tts.py"&gt;a CLI tool&lt;/a&gt; which you can run with &lt;code&gt;uv&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run https://tools.simonwillison.net/python/q3_tts.py \
  'I am a pirate, give me your gold!' \
  -i 'gruff voice' -o pirate.wav
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-i&lt;/code&gt; option lets you provide a prompt describing the voice it should use. On first run this downloads a 4.5GB model file from Hugging Face.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46719229"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/text-to-speech"&gt;text-to-speech&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="text-to-speech"/><category term="ai"/><category term="generative-ai"/><category term="hugging-face"/><category term="uv"/><category term="qwen"/><category term="mlx"/><category term="prince-canuma"/><category term="ai-in-china"/></entry><entry><title>Introducing Gemma 3n: The developer guide</title><link href="https://simonwillison.net/2025/Jun/26/gemma-3n/#atom-tag" rel="alternate"/><published>2025-06-26T21:08:36+00:00</published><updated>2025-06-26T21:08:36+00:00</updated><id>https://simonwillison.net/2025/Jun/26/gemma-3n/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/"&gt;Introducing Gemma 3n: The developer guide&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Extremely consequential new open weights model release from Google today:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multimodal by design:&lt;/strong&gt; Gemma 3n natively supports image, audio, video, and text inputs and text outputs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimized for on-device:&lt;/strong&gt; Engineered with a focus on efficiency, Gemma 3n models are available in two sizes based on &lt;a href="https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/#per-layer-embeddings-(ple):-unlocking-more-memory-efficiency"&gt;&lt;strong&gt;effective&lt;/strong&gt;&lt;/a&gt; parameters: E2B and E4B. While their raw parameter count is 5B and 8B respectively, architectural innovations allow them to run with a memory footprint comparable to traditional 2B and 4B models, operating with as little as 2GB (E2B) and 3GB (E4B) of memory.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is &lt;strong&gt;very&lt;/strong&gt; exciting: a 2B and 4B model optimized for end-user devices which accepts text, images &lt;em&gt;and&lt;/em&gt; audio as inputs!&lt;/p&gt;
&lt;p&gt;Gemma 3n is also the most comprehensive day one launch I've seen for any model: Google partnered with "AMD, Axolotl, Docker, Hugging Face, llama.cpp, LMStudio, MLX, NVIDIA, Ollama, RedHat, SGLang, Unsloth, and vLLM" so there are dozens of ways to try this out right now.&lt;/p&gt;
&lt;p&gt;So far I've run two variants on my Mac laptop. Ollama offer &lt;a href="https://ollama.com/library/gemma3n"&gt;a 7.5GB version&lt;/a&gt; (full tag &lt;code&gt;gemma3n:e4b-it-q4_K_M&lt;/code&gt;) of the 4B model, which I ran like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ollama pull gemma3n
llm install llm-ollama
llm -m gemma3n:latest "Generate an SVG of a pelican riding a bicycle"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It drew me this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="The pelican looks a bit like a grey pig. It is floating above a bicycle that looks more like a rail cart." src="https://static.simonwillison.net/static/2025/gemma3n-ollama.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The Ollama version doesn't appear to support image or audio input yet.&lt;/p&gt;
&lt;p&gt;... but the &lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;mlx-vlm&lt;/a&gt; version does!&lt;/p&gt;
&lt;p&gt;First I tried that on &lt;a href="https://static.simonwillison.net/static/2025/pelican-joke-request.wav"&gt;this WAV file&lt;/a&gt; like so (using a recipe adapted from &lt;a href="https://www.youtube.com/watch?v=8-8R2UvUBrc"&gt;Prince Canuma's video&lt;/a&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run --with mlx-vlm mlx_vlm.generate \
  --model gg-hf-gm/gemma-3n-E4B-it \
  --max-tokens 100 \
  --temperature 0.7 \
  --prompt "Transcribe the following speech segment in English:" \
  --audio pelican-joke-request.wav
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That downloaded a 15.74 GB bfloat16 version of the model and output the following correct transcription:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Tell me a joke about a pelican.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then I had it draw me a pelican for good measure:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run --with mlx-vlm mlx_vlm.generate \
  --model gg-hf-gm/gemma-3n-E4B-it \
  --max-tokens 100 \
  --temperature 0.7 \
  --prompt "Generate an SVG of a pelican riding a bicycle"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I quite like this one:&lt;/p&gt;
&lt;p&gt;&lt;img alt="The bicycle is a sin wave, but you can make out which bits are the frame and the wheels. The pelican is white with a black triangular bill. It's a much better attempt than the Ollama one." src="https://static.simonwillison.net/static/2025/gemma3n-mlx-vlm.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;It's interesting to see such a striking visual difference between those 7.5GB and 15GB model quantizations.&lt;/p&gt;
&lt;p&gt;Finally, I had it describe the image it had just created:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run --with mlx-vlm mlx_vlm.generate \
  --model gg-hf-gm/gemma-3n-E4B-it \
  --max-tokens 1000 \
  --prompt "Describe image" \
  --image gemma3n-mlx-vlm.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's what I got back - it thought this was a chemical diagram!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The image is a cartoon-style illustration depicting a molecular structure against a light blue background. The structure is composed of several differently colored and shaped elements connected by curved black lines. &lt;/p&gt;
&lt;p&gt;Here's a breakdown of the elements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Large Black Circles:&lt;/strong&gt; There are two large, solid black circles, likely representing larger molecules or components.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cream-Colored Oval:&lt;/strong&gt; A prominent cream-colored oval shape is centrally located and connected to other elements. It has a smaller, darker cream-colored sphere attached to its left side.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Red Oval:&lt;/strong&gt; A smaller, solid red oval is positioned near one of the black circles and connected to the cream-colored oval.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Black Lines:&lt;/strong&gt; These lines act as bonds, connecting the various molecular components. They are curved and dynamic, suggesting movement or interaction.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Triangular Shape:&lt;/strong&gt; A small black triangle is attached to the smaller cream-colored sphere.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Letter "I":&lt;/strong&gt; The letter "I" appears twice, likely labeling specific parts of the molecule. &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The overall impression is of a simplified representation of a biological molecule, possibly a protein or a complex organic compound. The use of different colors helps to distinguish the various components within the structure.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/audio"&gt;audio&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ollama"&gt;ollama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemma"&gt;gemma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;&lt;/p&gt;



</summary><category term="audio"/><category term="google"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="vision-llms"/><category term="mlx"/><category term="ollama"/><category term="pelican-riding-a-bicycle"/><category term="gemma"/><category term="llm-release"/><category term="prince-canuma"/></entry><entry><title>Qwen2.5-VL-32B: Smarter and Lighter</title><link href="https://simonwillison.net/2025/Mar/24/qwen25-vl-32b/#atom-tag" rel="alternate"/><published>2025-03-24T22:43:22+00:00</published><updated>2025-03-24T22:43:22+00:00</updated><id>https://simonwillison.net/2025/Mar/24/qwen25-vl-32b/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://qwenlm.github.io/blog/qwen2.5-vl-32b/"&gt;Qwen2.5-VL-32B: Smarter and Lighter&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The second big open weight LLM release from China today - the first being &lt;a href="https://simonwillison.net/2025/Mar/24/deepseek/"&gt;DeepSeek v3-0324&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Qwen's previous vision model was Qwen2.5 VL, &lt;a href="https://simonwillison.net/2025/Jan/27/qwen25-vl-qwen25-vl-qwen25-vl/"&gt;released in January&lt;/a&gt; in 3B, 7B and 72B sizes.&lt;/p&gt;
&lt;p&gt;Today's Apache 2.0 licensed release is a 32B model, which is quickly becoming my personal favourite model size - large enough to have GPT-4-class capabilities, but small enough that on my 64GB Mac there's still enough RAM for me to run other memory-hungry applications like Firefox and VS Code.&lt;/p&gt;
&lt;p&gt;Qwen claim that the new model (when compared to their previous 2.5 VL family) can "align more closely with human preferences", is better at "mathematical reasoning" and provides "enhanced accuracy and detailed analysis in tasks such as image parsing, content recognition, and visual logic deduction".&lt;/p&gt;
&lt;p&gt;They also offer some presumably carefully selected benchmark results showing it outperforming Gemma 3-27B, Mistral Small 3.1 24B and GPT-4o-0513 (there have been two more recent GPT-4o releases since that one, 2024-08-06 and 2024-11-20).&lt;/p&gt;
&lt;p&gt;As usual, Prince Canuma had MLX versions of the models live within hours of the release, in &lt;a href="https://huggingface.co/mlx-community/Qwen2.5-VL-32B-Instruct-4bit"&gt;4 bit&lt;/a&gt;, &lt;a href="https://huggingface.co/mlx-community/Qwen2.5-VL-32B-Instruct-6bit"&gt;6 bit&lt;/a&gt;, &lt;a href="https://huggingface.co/mlx-community/Qwen2.5-VL-32B-Instruct-8bit"&gt;8 bit&lt;/a&gt;, and &lt;a href="https://huggingface.co/mlx-community/Qwen2.5-VL-32B-Instruct-bf16"&gt;bf16&lt;/a&gt; variants.&lt;/p&gt;
&lt;p&gt;I ran the 4bit version (an 18GB model download) using &lt;code&gt;uv&lt;/code&gt; and Prince's &lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;mlx-vlm&lt;/a&gt; like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;numpy&amp;lt;2&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; --with mlx-vlm \
  python -m mlx_vlm.generate \
    --model mlx-community/Qwen2.5-VL-32B-Instruct-4bit \
    --max-tokens 1000 \
    --temperature 0.0 \
    --prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Describe this image.&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
    --image Mpaboundrycdfw-1.png&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here's the image:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A map image, described below" src="https://static.simonwillison.net/static/2025/Mpaboundrycdfw-1.png" /&gt;&lt;/p&gt;
&lt;p&gt;And here's the resulting description, which I found to be &lt;em&gt;very&lt;/em&gt; impressive:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This image appears to be a map highlighting specific areas along the coastline, likely in California, based on the geographical features and place names. Here is a detailed description of the image:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Geographic Features&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The map shows a section of the coastline, with land on the right and the sea on the left.&lt;/li&gt;
&lt;li&gt;The land area includes parts of SanMateo County, as noted in the top-right corner.&lt;/li&gt;
&lt;li&gt;The coastline features several named locations, including:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Point Montara&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pillar Point&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kings Rock&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;VicenteCreek&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DennistonCreek&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;State Beach&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Highlighted Areas&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The map contains two distinct areas marked with different colors:
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Red Area (Top Section)&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Labeled as "MONTARA" and "MONTARA M R" (possibly referring to a "Marine Reserves" or similar protected area).&lt;/li&gt;
&lt;li&gt;This area is marked with a dark red color and extends from the coastline inland.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blue Area (Bottom Section)&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Labeled as "PILLAR" and "PILLAR P O I N T" along with "SMCA" (Special Management Area).&lt;/li&gt;
&lt;li&gt;This area is marked with a blue color and is located closer to the coastline, including the area around "Pillar Point."&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Depth Contours&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The map includes depth markings in the sea, indicating the bathymetrical features:
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;20 fathom (fm) line&lt;/strong&gt; is marked in the upper-left part of the sea.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 fathom (fm) line&lt;/strong&gt; is marked closer to the coastline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;30 fathom (fm) line&lt;/strong&gt; is marked further out to sea.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Other Features&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;State Beach&lt;/strong&gt;: Marked near the top-right, indicating a protected recreational area.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kings Rock&lt;/strong&gt;: A prominent feature near the coastline, likely a rocky outcropping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creeks&lt;/strong&gt;: The map shows several creeks, including &lt;strong&gt;VicenteCreek&lt;/strong&gt; and &lt;strong&gt;DennistonCreek&lt;/strong&gt;, which flow into the sea.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Protected Areas&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The map highlights specific protected areas:
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Marine Reserves&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;"MONTARA M R" (Marine Reserves) in red.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Special Management Area (SMCA)&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;"PILLAR P O I N T" in blue, indicating a Special Management Area.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Grid and Coordinates&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The map includes a grid with latitude and longitude markings:
&lt;ul&gt;
&lt;li&gt;Latitude ranges from approximately 37°25'N to 37°35'N.&lt;/li&gt;
&lt;li&gt;Longitude ranges from approximately 122°22.5'W to 122°35.5'W.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Topography&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The land area shows topographic features, including elevations and vegetation, with green areas indicating higher elevations or vegetated land.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Other Labels&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"SMR"&lt;/strong&gt;: Likely stands for "State Managed Reserves."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"SMCA"&lt;/strong&gt;: Likely stands for "Special Management Control Area."&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In summary, this map highlights specific protected areas along the coastline, including a red "Marine Reserves" area and a blue "Special Management Area" near "Pillar Point." The map also includes depth markings, geographical features, and place names, providing a detailed view of the region's natural and protected areas.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It included the following runtime statistics:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Prompt: 1051 tokens, 111.985 tokens-per-sec
Generation: 760 tokens, 17.328 tokens-per-sec
Peak memory: 21.110 GB
&lt;/code&gt;&lt;/pre&gt;
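&lt;p&gt;Some quick back-of-the-envelope arithmetic (mine, derived from the figures above) turns those rates into wall-clock time:&lt;/p&gt;

```python
# Back-of-the-envelope timing from the reported statistics above.
prompt_tokens, prompt_tps = 1051, 111.985
gen_tokens, gen_tps = 760, 17.328

prompt_seconds = prompt_tokens / prompt_tps  # ~9.4s to process the prompt
gen_seconds = gen_tokens / gen_tps           # ~43.9s to generate the description

print(f"{prompt_seconds:.1f}s prompt + {gen_seconds:.1f}s generation "
      f"= {prompt_seconds + gen_seconds:.1f}s total")
# 9.4s prompt + 43.9s generation = 53.2s total
```

&lt;p&gt;So the full description took a little under a minute, with the bulk of that spent on generation rather than prompt processing.&lt;/p&gt;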


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="vision-llms"/><category term="uv"/><category term="qwen"/><category term="mlx"/><category term="llm-release"/><category term="prince-canuma"/><category term="ai-in-china"/></entry><entry><title>Run LLMs on macOS using llm-mlx and Apple's MLX framework</title><link href="https://simonwillison.net/2025/Feb/15/llm-mlx/#atom-tag" rel="alternate"/><published>2025-02-15T19:13:29+00:00</published><updated>2025-02-15T19:13:29+00:00</updated><id>https://simonwillison.net/2025/Feb/15/llm-mlx/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://github.com/simonw/llm-mlx"&gt;llm-mlx&lt;/a&gt; is a brand new plugin for my &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; Python Library and CLI utility which builds on top of Apple's excellent &lt;a href="https://github.com/ml-explore/mlx"&gt;MLX array framework library&lt;/a&gt; and &lt;a href="https://pypi.org/project/mlx-lm/"&gt;mlx-lm&lt;/a&gt; package. If you're a terminal user or Python developer with a Mac this may be the new easiest way to start exploring local Large Language Models.&lt;/p&gt;
&lt;h4 id="running-llama-3-2-3b-using-llm-mlx"&gt;Running Llama 3.2 3B using llm-mlx&lt;/h4&gt;
&lt;p&gt;If you haven't already got LLM installed you'll need &lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;to install it&lt;/a&gt; - you can do that in a bunch of different ways - in order of preference I like &lt;code&gt;uv tool install llm&lt;/code&gt; or &lt;code&gt;pipx install llm&lt;/code&gt; or &lt;code&gt;brew install llm&lt;/code&gt; or &lt;code&gt;pip install llm&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Next, install the new plugin (macOS only):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-mlx&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now download and register a model. Llama 3.2 3B is an excellent first choice - it's pretty small (a 1.8GB download) but is a surprisingly capable starter model.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will download 1.8GB of model weights from &lt;a href="https://huggingface.co/mlx-community/Llama-3.2-3B-Instruct-4bit"&gt;mlx-community/Llama-3.2-3B-Instruct-4bit&lt;/a&gt; on Hugging Face and store them here:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;~/.cache/huggingface/hub/models--mlx-community--Llama-3.2-3B-Instruct-4bit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you can start running prompts:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m mlx-community/Llama-3.2-3B-Instruct-4bit &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Python code to traverse a tree, briefly&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Which &lt;a href="https://gist.github.com/simonw/85ce6a0faf1a58a243f9e5f4b2daeb3d"&gt;output this&lt;/a&gt; for me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Here's a brief example of how to traverse a tree in Python:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;Node&lt;/span&gt;:
    &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;__init__&lt;/span&gt;(&lt;span class="pl-s1"&gt;self&lt;/span&gt;, &lt;span class="pl-s1"&gt;value&lt;/span&gt;):
        &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-c1"&gt;value&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;value&lt;/span&gt;
        &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; []

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;traverse_tree&lt;/span&gt;(&lt;span class="pl-s1"&gt;node&lt;/span&gt;):
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;node&lt;/span&gt; &lt;span class="pl-c1"&gt;is&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt;
    
    &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;node&lt;/span&gt;.&lt;span class="pl-c1"&gt;value&lt;/span&gt;)
    
    &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;child&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;node&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt;:
        &lt;span class="pl-en"&gt;traverse_tree&lt;/span&gt;(&lt;span class="pl-s1"&gt;child&lt;/span&gt;)

&lt;span class="pl-c"&gt;# Example usage:&lt;/span&gt;
&lt;span class="pl-s1"&gt;root&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"A"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;root&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; [&lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"B"&lt;/span&gt;), &lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"C"&lt;/span&gt;)]
&lt;span class="pl-s1"&gt;root&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt;[&lt;span class="pl-c1"&gt;0&lt;/span&gt;].&lt;span class="pl-c1"&gt;children&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; [&lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"D"&lt;/span&gt;), &lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"E"&lt;/span&gt;)]
&lt;span class="pl-s1"&gt;root&lt;/span&gt;.&lt;span class="pl-c1"&gt;children&lt;/span&gt;[&lt;span class="pl-c1"&gt;1&lt;/span&gt;].&lt;span class="pl-c1"&gt;children&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; [&lt;span class="pl-en"&gt;Node&lt;/span&gt;(&lt;span class="pl-s"&gt;"F"&lt;/span&gt;)]

&lt;span class="pl-en"&gt;traverse_tree&lt;/span&gt;(&lt;span class="pl-s1"&gt;root&lt;/span&gt;)  &lt;span class="pl-c"&gt;# Output: A, B, D, E, C, F&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;In this example, we define a &lt;code&gt;Node&lt;/code&gt; class to represent each node in the tree, with a &lt;code&gt;value&lt;/code&gt; attribute and a list of &lt;code&gt;children&lt;/code&gt;. The &lt;code&gt;traverse_tree&lt;/code&gt; function recursively visits each node in the tree, printing its value.&lt;/p&gt;
&lt;p&gt;This is a basic example of a tree traversal, and there are many variations and optimizations depending on the specific use case.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That generation ran at an impressive &lt;a href="https://gist.github.com/simonw/85ce6a0faf1a58a243f9e5f4b2daeb3d?permalink_comment_id=5440701#gistcomment-5440701"&gt;152 tokens per second&lt;/a&gt;!&lt;/p&gt;
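&lt;p&gt;The model's closing note about variations is easy to illustrate. Here's a minimal sketch (my own addition, not part of the model's output) of an iterative, stack-based version of the same pre-order traversal, which sidesteps Python's recursion limit on very deep trees:&lt;/p&gt;

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

def traverse_tree_iterative(node):
    """Pre-order traversal using an explicit stack instead of recursion."""
    visited = []
    stack = [node]
    while stack:
        current = stack.pop()
        if current is None:
            continue
        visited.append(current.value)
        # Push children in reverse so the leftmost child is popped first
        stack.extend(reversed(current.children))
    return visited

# The same tree the model built in its example
root = Node("A")
root.children = [Node("B"), Node("C")]
root.children[0].children = [Node("D"), Node("E")]
root.children[1].children = [Node("F")]

print(traverse_tree_iterative(root))  # ['A', 'B', 'D', 'E', 'C', 'F']
```

&lt;p&gt;It visits the nodes in the same A, B, D, E, C, F order as the recursive version.&lt;/p&gt;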
&lt;p&gt;That command was a bit of a mouthful, so let's assign an alias to the model:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm aliases &lt;span class="pl-c1"&gt;set&lt;/span&gt; l32 mlx-community/Llama-3.2-3B-Instruct-4bit&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we can use that shorter alias instead:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m l32 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;a joke about a haggis buying a car&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(The &lt;a href="https://gist.github.com/simonw/af322b399ce448f766d9349d9dc1cb5d#response"&gt;joke&lt;/a&gt; isn't very good.)&lt;/p&gt;
&lt;p&gt;As with other models supported by LLM, you can also pipe things to it. Here's how to get it to explain a piece of Python code (in this case &lt;a href="https://github.com/simonw/llm-mlx/blob/0.2/llm_mlx.py"&gt;itself&lt;/a&gt;):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;cat llm_mlx.py &lt;span class="pl-k"&gt;|&lt;/span&gt; llm -m l32 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;explain this code&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The response started like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This code is a part of a larger project that uses the Hugging Face Transformers library to create a text-to-text conversational AI model. The code defines a custom model class &lt;code&gt;MlxModel&lt;/code&gt; and a set of command-line interface (CLI) commands for working with MLX models. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/a577c820b18591d4825b70679cee3f9c"&gt;the rest of the response&lt;/a&gt;. I'm pretty amazed at how well it did for a tiny 1.8GB model!&lt;/p&gt;

&lt;p&gt;This plugin can only run models that have been converted to work with Apple's MLX framework. Thankfully the &lt;a href="https://huggingface.co/mlx-community"&gt;mlx-community&lt;/a&gt; organization on Hugging Face has published over 1,000 of these. A few I've tried successfully:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/Qwen2.5-0.5B-Instruct-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/Qwen2.5-0.5B-Instruct-4bit"&gt;278 MB&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/Mistral-7B-Instruct-v0.3-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/Mistral-7B-Instruct-v0.3-4bit"&gt;4.08 GB&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/Mistral-Small-24B-Instruct-2501-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/Mistral-Small-24B-Instruct-2501-4bit"&gt;13.26 GB&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit"&gt;18.5 GB&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mlx-community/Llama-3.3-70B-Instruct-4bit&lt;/code&gt; - &lt;a href="https://huggingface.co/mlx-community/Llama-3.3-70B-Instruct-4bit"&gt;40 GB&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mistral-Small-24B really does feel GPT-4 quality despite only needing around 12GB of RAM to run - so it's a good default model if you want to leave space to run other apps.&lt;/p&gt;

&lt;p&gt;DeepSeek-R1-Distill-Qwen-32B-4bit is a great model if you want to try out R1 style "reasoning" which outputs a &lt;code&gt;&amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt;...&lt;/code&gt; block at the start of each response.&lt;/p&gt;
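&lt;p&gt;If you only want the final answer, that reasoning block is easy to strip. A minimal sketch of my own (assuming a single think block at the start of the response):&lt;/p&gt;

```python
import re

def split_reasoning(response):
    # Separate an R1-style <think>...</think> block from the final answer.
    # Assumes at most one think block, at the start of the response.
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", response, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, response.strip()

thinking, answer = split_reasoning(
    "<think>The user wants a joke about haggis...</think>Why did the haggis cross the road?"
)
print(answer)  # Why did the haggis cross the road?
```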

&lt;p&gt;That last Llama 70B one needs a machine with 64GB of RAM to run but is &lt;em&gt;very&lt;/em&gt; capable - I wrote about trying out the Ollama version of Llama 3.3 70B in &lt;a href="https://simonwillison.net/2024/Dec/9/llama-33-70b/"&gt;I can now run a GPT-4 class model on my laptop&lt;/a&gt; back in December.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm mlx download-model mlx-community/Llama-3.3-70B-Instruct-4bit&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/mlx-download-llama-33.jpg" alt="Screenshot of the download-model command running against the Llama 3.3 70B model, it shows fetching 13 files and displays progress bars for 8 different safetensors files each weighing between 2.6 and 3.3GB." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;For larger models like that I recommend using the &lt;code&gt;llm chat&lt;/code&gt; command as that ensures they stay in memory for multiple rounds of the conversation:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm chat -m mlx-community/Llama-3.3-70B-Instruct-4bit&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Activity Monitor shows the process using 37.5GB of RAM while the model is running.&lt;/p&gt;
&lt;p&gt;Some of these jokes are actually pretty good!&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt;Chatting with mlx-community/Llama-3.3-70B-Instruct-4bit
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
&amp;gt; hi
How's your day going so far? Is there something I can help you with or would you like to chat?
&amp;gt; jokes about weird animals please
Here are a few:

1. Why did the axolotl go to therapy? Because it was struggling to come to terms with its identity crisis... and also because it had a lot of "regrowing" to do!
2. What did the aye-aye say to the other aye-aye when they were out on a date? "You've got a finger on the pulse of my heart!"
3. Why did the naked mole rat go to the party? Because it was a "hair-raising" experience and it wanted to get to the "root" of the fun!
4. What do you call a group of flamingos playing instruments? The Pink Floyd... of the avian world, of course!
5. Why did the blobfish go to the doctor? It had a "gut" feeling something was wrong... and also because it was feeling a little "deflated"!

Want more? I've got a whole zoo of 'em!
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;That ran at &lt;a href="https://gist.github.com/simonw/ec99b6cf65ae61256a34f3ac269951dc"&gt;8.8 tokens per second&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As with every LLM plugin, all prompts and responses are logged to &lt;a href="https://llm.datasette.io/en/stable/logging.html"&gt;a SQLite database&lt;/a&gt; that can be accessed using the &lt;code&gt;llm logs&lt;/code&gt; command.&lt;/p&gt;
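&lt;p&gt;You can also query that database directly with Python's &lt;code&gt;sqlite3&lt;/code&gt; module. This sketch uses an in-memory database with a deliberately simplified &lt;code&gt;responses&lt;/code&gt; table - the real schema is richer, and the real database path comes from &lt;code&gt;llm logs path&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

# Illustrative only: LLM's actual logs database has a richer schema -
# this stripped-down "responses" table is an assumption for the demo.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE responses (model TEXT, prompt TEXT, response TEXT)")
db.execute(
    "INSERT INTO responses VALUES (?, ?, ?)",
    ("l32", "a joke about a haggis buying a car", "Why did the haggis..."),
)
rows = db.execute(
    "SELECT model, prompt FROM responses ORDER BY rowid DESC LIMIT 3"
).fetchall()
print(rows)  # [('l32', 'a joke about a haggis buying a car')]
```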

&lt;p&gt;You can use LLM models from your Python code as well. The README includes &lt;a href="https://github.com/simonw/llm-mlx/blob/0.2.1/README.md#using-models-from-python"&gt;notes on how to do this&lt;/a&gt;, as does the &lt;a href="https://llm.datasette.io/en/stable/python-api.html"&gt;LLM Python API documentation&lt;/a&gt;. TLDR:&lt;/p&gt;

&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;
&lt;span class="pl-s1"&gt;model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-c1"&gt;get_model&lt;/span&gt;(&lt;span class="pl-s"&gt;"mlx-community/Llama-3.2-3B-Instruct-4bit"&lt;/span&gt;)
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;model&lt;/span&gt;.&lt;span class="pl-c1"&gt;prompt&lt;/span&gt;(&lt;span class="pl-s"&gt;"hi"&lt;/span&gt;).&lt;span class="pl-c1"&gt;text&lt;/span&gt;())&lt;/pre&gt;

&lt;h4 id="mlx-and-mlx-lm"&gt;MLX and mlx-lm&lt;/h4&gt;
&lt;p&gt;The secret sauce behind this new plugin is Apple's &lt;a href="https://github.com/ml-explore/mlx"&gt;MLX library&lt;/a&gt;, which has been improving at an extraordinary pace over the past year - it first launched on GitHub in November 2023.&lt;/p&gt;
&lt;p&gt;The core library is C++, but the MLX team at Apple also maintain an &lt;a href="https://github.com/ml-explore/mlx-examples"&gt;mlx-examples&lt;/a&gt; repo with a wealth of Python examples, including the &lt;a href="https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md"&gt;mlx-lm&lt;/a&gt; and &lt;a href="https://github.com/ml-explore/mlx-examples/tree/main/whisper"&gt;mlx-whisper&lt;/a&gt; packages.&lt;/p&gt;
&lt;p&gt;The community around it is growing too. I'm particularly impressed by Prince Canuma's &lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;mlx-vlm&lt;/a&gt; library for accessing multi-modal vision LLMs - I'd love to get that working as an LLM plugin once I figure out &lt;a href="https://github.com/Blaizzy/mlx-vlm/issues/68"&gt;how to use it for conversations&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I've used MLX for &lt;a href="https://simonwillison.net/tags/mlx/"&gt;a few experiments&lt;/a&gt; in the past, but &lt;a href="https://twitter.com/awnihannun/status/1890524526821126620"&gt;this tweet&lt;/a&gt; from MLX core developer Awni Hannun finally convinced me to wrap it up as an LLM plugin:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the latest MLX small LLMs are a lot faster.&lt;/p&gt;
&lt;p&gt;On M4 Max 4-bit Qwen 0.5B generates 1k tokens at a whopping 510 toks/sec. And runs at over 150 tok/sec on iPhone 16 pro.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is &lt;em&gt;really&lt;/em&gt; good software. This small team at Apple appear to be almost single-handedly giving NVIDIA's CUDA a run for its money!&lt;/p&gt;
&lt;h4 id="building-the-plugin"&gt;Building the plugin&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/llm-mlx"&gt;llm-mlx&lt;/a&gt; plugin came together pretty quickly. The &lt;a href="https://github.com/simonw/llm-mlx/blob/0.1/llm_mlx.py"&gt;first version&lt;/a&gt; was ~100 lines of Python, much of it repurposed from my existing &lt;a href="https://github.com/simonw/llm-gguf"&gt;llm-gguf&lt;/a&gt; plugin.&lt;/p&gt;
&lt;p&gt;The hardest problem was figuring out how to hide the Hugging Face progress bars that displayed every time it attempted to access a model!&lt;/p&gt;
&lt;p&gt;I eventually found the &lt;code&gt;from huggingface_hub.utils import disable_progress_bars&lt;/code&gt; utility function by piping library code through Gemini 2.0.&lt;/p&gt;
&lt;p&gt;I then added &lt;a href="https://github.com/simonw/llm-mlx/issues/3"&gt;model options support&lt;/a&gt; allowing you to pass options like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m l32 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;a greeting&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -o temperature 1.0 -o seed 2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;So far using a fixed seed &lt;a href="https://github.com/simonw/llm-mlx/issues/3#issuecomment-2661025591"&gt;appears to produce repeatable results&lt;/a&gt;, which is exciting for writing more predictable tests in the future.&lt;/p&gt;
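&lt;p&gt;That's the behaviour you'd hope for from seeded sampling in general: the same seed drives the same sequence of random draws. A toy illustration in plain Python (nothing to do with MLX internals):&lt;/p&gt;

```python
import random

def sample_tokens(seed, vocab=("a", "haggis", "buys", "a", "car"), n=5):
    # Toy sampler: a fixed seed makes the pseudo-random draws repeatable,
    # which is what makes seeded generation useful for predictable tests.
    rng = random.Random(seed)
    return [rng.choice(vocab) for _ in range(n)]

assert sample_tokens(2) == sample_tokens(2)  # same seed, identical output
print(sample_tokens(2))
```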

&lt;p&gt;For the &lt;a href="https://github.com/simonw/llm-mlx/blob/0.2/tests/test_mlx.py"&gt;automated tests&lt;/a&gt; that run in GitHub Actions I decided to use a small model - I went with the tiny 75MB &lt;a href="https://huggingface.co/mlx-community/SmolLM-135M-Instruct-4bit/tree/main"&gt;mlx-community/SmolLM-135M-Instruct-4bit&lt;/a&gt; (explored &lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/"&gt;previously&lt;/a&gt;). I configured GitHub Actions to cache the model in between CI runs by adding the following YAML to my &lt;a href="https://github.com/simonw/llm-mlx/blob/0.2/.github/workflows/test.yml"&gt;.github/workflows/test.yml&lt;/a&gt; file:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Cache models&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/cache@v4&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;path&lt;/span&gt;: &lt;span class="pl-s"&gt;~/.cache/huggingface&lt;/span&gt;
        &lt;span class="pl-ent"&gt;key&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ runner.os }}-huggingface-&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apple"&gt;apple&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="apple"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="github-actions"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="qwen"/><category term="mlx"/><category term="prince-canuma"/><category term="ai-in-china"/></entry><entry><title>Trying out QvQ - Qwen's new visual reasoning model</title><link href="https://simonwillison.net/2024/Dec/24/qvq/#atom-tag" rel="alternate"/><published>2024-12-24T20:49:08+00:00</published><updated>2024-12-24T20:49:08+00:00</updated><id>https://simonwillison.net/2024/Dec/24/qvq/#atom-tag</id><summary type="html">
    &lt;p&gt;I thought we were done for major model releases in 2024, but apparently not: Alibaba's Qwen team just dropped the &lt;s&gt;Apache 2.0 licensed&lt;/s&gt; Qwen licensed (&lt;a href="https://simonwillison.net/2024/Dec/24/qvq/#the-license-changed"&gt;the license changed&lt;/a&gt;) QvQ-72B-Preview, "an experimental research model focusing on enhancing visual reasoning capabilities".&lt;/p&gt;
&lt;p&gt;Their blog post is titled &lt;a href="https://qwenlm.github.io/blog/qvq-72b-preview/"&gt;QvQ: To See the World with Wisdom&lt;/a&gt; - similar flowery language to their QwQ announcement &lt;a href="https://qwenlm.github.io/blog/qwq-32b-preview/"&gt;QwQ: Reflect Deeply on the Boundaries of the Unknown&lt;/a&gt; a few weeks ago in November.&lt;/p&gt;
&lt;p&gt;It's a vision-focused follow-up to QwQ, which I &lt;a href="https://simonwillison.net/2024/Nov/27/qwq/"&gt;wrote about previously&lt;/a&gt;. QwQ is an impressive openly licensed inference-scaling model: give it a prompt and it will think out loud over many tokens while trying to derive a good answer, similar to OpenAI's o1 and o3 models.&lt;/p&gt;
&lt;p&gt;The new QvQ adds vision to the mix. You can try it out &lt;a href="https://huggingface.co/spaces/Qwen/QVQ-72B-preview"&gt;on Hugging Face Spaces&lt;/a&gt; - it accepts an image and a single prompt and then streams out a &lt;em&gt;very&lt;/em&gt; long response where it thinks through the problem you have posed it. There's no option to send a follow-up prompt.&lt;/p&gt;
&lt;p&gt;I've tried it out with a bunch of things, with mixed results - but it's really fun seeing how it works through a problem.&lt;/p&gt;
&lt;p&gt;My most successful prompt was "Count the pelicans" with this image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/count-pelicans-easy.jpg" alt="A photograph of four glorious young California Brown Pelicans perched on a dock. The back end of another pelican is visible at the front of the photo." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#response"&gt;the full response&lt;/a&gt;. Some highlights:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;So I've got this picture of some pelicans, and I need to count them. Let me see... First off, I know that pelicans are those big birds with the massive beaks, right? [...]&lt;/p&gt;
&lt;p&gt;Let me try counting again, more carefully this time. Starting from the left:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The first pelican is fully visible, standing at the front.&lt;/li&gt;
&lt;li&gt;Behind it, slightly to the right, is the second pelican.&lt;/li&gt;
&lt;li&gt;Then, further back, there's a third pelican.&lt;/li&gt;
&lt;li&gt;And next to that one, there's a fourth pelican.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So, that's four pelicans in total. But, I notice that on the far right of the image, there's part of another bird, maybe a pelican's back or tail feathers, but it's not fully in the frame, so I don't think I should count it as a whole pelican.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Something about the tone of voice it uses, "First off, I know that pelicans are those big birds with the massive beaks, right?", tickles me.&lt;/p&gt;
&lt;p&gt;I also tried this on a &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#image-2"&gt;much harder image&lt;/a&gt; - it counted 17 which looks about right to me but honestly I have trouble with that image myself.&lt;/p&gt;
&lt;p&gt;I fed in &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#arc-agi-puzzle-0d87d2a6"&gt;one of the ARC-AGI puzzles&lt;/a&gt; that o3 had failed at.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A puzzle with colored squares, where drawing a line between the single blue squares and turning any intersected rectangles blue is clearly the solution." src="https://static.simonwillison.net/static/2024/arc-agi-task-0d87d2a6.png" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It produced &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#response-1"&gt;a very lengthy chain of thought&lt;/a&gt; that was almost entirely incorrect, but had some interesting "ideas" in it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] Let me try to think in terms of cellular automata rules, like Conway's Game of Life.&lt;/p&gt;
&lt;p&gt;In Game of Life, each cell's state in the next generation is determined by its current state and the states of its eight neighbors.&lt;/p&gt;
&lt;p&gt;Perhaps a similar rule applies here.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Finally, I asked it to "Estimate the height of the dinosaur" against this image (which, as it correctly noted, is actually an inflatable dragon):&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/inflatable-dragon.jpg" alt="A large inflatable Christmas decoration of a reindeer riding a dragon - both reindeer and dragon are wearing Christmas hats" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I'm not sure this task is even possible given the lack of items in the frame of comparable known height. It &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#response-3"&gt;had a good go at it&lt;/a&gt; though! I particularly enjoyed how it began:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;So I've got this inflatable Christmas decoration, and it's a dragon! But not just any dragon—it's a green dragon wearing a Santa hat and a red and white scarf. And get this, there's a reindeer on its back, also wearing a Santa hat and holding onto the dragon's wings. It's pretty festive and colorful, with lots of vibrant greens, reds, and yellows.&lt;/p&gt;
&lt;p&gt;The dragon has these big, friendly eyes and a smiling face, which makes it look really cheerful. Its wings are spread out, and it has white claws on its feet. The reindeer is brown with antlers and seems to be enjoying the ride on the dragon's back.&lt;/p&gt;
&lt;p&gt;Now, the question is, how tall is this inflatable dragon? I need to estimate its height. Let me see. I'm standing next to it, and I'm about 5 feet 10 inches tall. The dragon is definitely taller than me. Maybe around 8 or 9 feet high? But I should try to be more precise.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wonder how it decided that its own height was 5 feet 10 inches!&lt;/p&gt;
&lt;h4 id="running-qvq-locally"&gt;Running QvQ locally&lt;/h4&gt;
&lt;p&gt;All of my experiments so far have used the &lt;a href="https://huggingface.co/spaces/Qwen/QVQ-72B-preview"&gt;hosted Hugging Face demo&lt;/a&gt;. I'm hoping to try it out on my own laptop soon - I think it should just fit in 64GB of M2, maybe with a smaller quantization.&lt;/p&gt;
&lt;p&gt;Right now the options are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview"&gt;Qwen/QVQ-72B-Preview&lt;/a&gt; on Hugging Face has the GPU model weights, for use with Hugging Face Transformers and the &lt;a href="https://pypi.org/project/qwen-vl-utils/"&gt;qwen-vl-utils&lt;/a&gt; Python package.&lt;/li&gt;
&lt;li&gt;Prince Canuma &lt;a href="https://twitter.com/Prince_Canuma/status/1871687356815716672"&gt;converted the model for Apple's MLX framework&lt;/a&gt; as &lt;a href="https://huggingface.co/collections/mlx-community/qvq-72b-preview-676b345a6f93172ba980c0d5"&gt;mlx-community/qvq-72b-preview&lt;/a&gt; and got it working with his excellent &lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;mlx-vlm&lt;/a&gt; package.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a happy user of Ollama's &lt;a href="https://ollama.com/library/qwq"&gt;qwq port&lt;/a&gt; I'm hoping they add a QvQ release at some point soon as well.&lt;/p&gt;
&lt;h4 id="with-mlx-vlm"&gt;Running QvQ on my Mac using mlx-vlm&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I successfully ran QvQ on my 64GB M2 MacBook Pro using Prince Canuma's &lt;code&gt;mlx-vlm&lt;/code&gt; via &lt;code&gt;uv run&lt;/code&gt; with the following command:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;numpy&amp;lt;2.0&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; --with mlx-vlm python \
  -m mlx_vlm.generate \
    --model mlx-community/QVQ-72B-Preview-4bit \
    --max-tokens 10000 \
    --temp 0.0 \
    --prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;describe this&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
    --image pelicans-on-bicycles-veo2.jpg&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;My image input here was taken from my experiment with Google's &lt;a href="https://simonwillison.net/2024/Dec/16/veo-2/"&gt;Veo 2&lt;/a&gt; video generation model:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/pelicans-on-bicycles-veo2.jpg" alt="A screenshot of the Google VideoFX video generation tool" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Here's the full output from running that command, after it first downloaded 38GB to &lt;code&gt;~/.cache/huggingface/hub/models--mlx-community--QVQ-72B-Preview-4bi&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;==========
Image: ['pelicans-on-bicycles-veo2.jpg'] 

Prompt: &amp;lt;|im_start|&amp;gt;system
You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;user
describe this&amp;lt;|vision_start|&amp;gt;&amp;lt;|image_pad|&amp;gt;&amp;lt;|vision_end|&amp;gt;&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;assistant

Alright, I've got this "Text to Video" tool to explore. It seems pretty advanced, allowing me to turn text descriptions into actual videos. The interface looks user-friendly, with a dark theme that's easy on the eyes. On the left side, there's a panel where I can input my text prompt. It already has an example filled in: "A pelican riding a bicycle along a coastal path overlooking a harbor."

That sounds interesting! I can see four different video thumbnails on the right side, each showing variations of this scene. One of them even has the pelican wearing a helmet, which adds a humorous touch. Another shows the pelican standing next to a bicycle, almost as if it's ready to hop on and start pedaling.

Below the text input, there are some style options like "photorealistic," "35mm film," "minimal," "sketchy," "handmade," and "abstract." These probably let me choose the visual style of the video. For now, I'll stick with the default settings.

At the bottom of the left panel, there are two buttons: "Start over" and "Create videos." I guess "Create videos" is the one I need to press to generate the videos based on my text prompt.

Let me think about this. I want to see what kind of videos this tool can produce based on my description. The example prompt is already filled in, so maybe I should just go ahead and click "Create videos" to see what happens.

Wait a minute, the prompt mentions a coastal path overlooking a harbor. In the thumbnails, I can see a background with water and boats, which matches the description. It's cool how the tool has interpreted the text into visual elements.

I should also pay attention to the "Visual Prompt" section, which says "View only." That probably means I can't edit it directly, but I can view the generated videos based on this prompt.

Hmm, maybe I should try changing the prompt to see different results. For example, instead of a pelican riding a bicycle, maybe I could describe a different scene. But for now, I'll stick with the given example to understand how it works.

After clicking "Create videos," I expect to see the videos being generated. There might be a loading process, depending on the complexity of the prompt and the tool's processing speed.

Once the videos are ready, I can preview them to see how well they match my description. If I'm not satisfied, I can always go back and adjust the prompt or try different style options.

Overall, this seems like a powerful tool for creating custom videos based on textual descriptions. It could be really useful for storytelling, presentations, or even just for fun, experimenting with different scenes and styles.

I think I'll spend some time exploring the different style options to see how they affect the final videos. For instance, selecting "35mm film" might give the videos a vintage, cinematic look, while "abstract" could produce more artistic and less realistic visuals.

Also, I wonder if there's a way to customize more specific aspects, like the camera angle, lighting, or movement. Maybe after generating the initial videos, there are additional editing options available.

In summary, I'm excited to use this Text to Video tool and see what kind of creative possibilities it opens up. With a bit of practice, I could probably create some impressive visual content just by describing what I want to see.

**Final Answer**

To proceed with generating videos based on the provided text prompt, I need to click on the "Create videos" button. This action will initiate the process of converting the textual description into visual content, resulting in multiple video thumbnails that can be previewed for further use or adjustments.
==========
Prompt: 0.870 tokens-per-sec
Generation: 7.694 tokens-per-sec
&lt;/code&gt;&lt;/pre&gt;

&lt;h4 id="the-license-changed"&gt;The license changed from Apache 2.0 to Qwen&lt;/h4&gt;
&lt;p&gt;When I wrote this post yesterday the LICENSE file in the Hugging Face repository was Apache 2.0. Just after midnight UTC on 25th December &lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview/commit/53b19b90d67220c896e868a809ef1b93d0c8dab8"&gt;this commit&lt;/a&gt; landed updating the &lt;code&gt;QVQ-72B-Preview&lt;/code&gt; license file to the &lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview/blob/main/LICENSE"&gt;Qwen license&lt;/a&gt; instead.&lt;/p&gt;
&lt;p&gt;This looks to me like they were correcting a mistake, not changing their policy. The &lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview/blob/main/README.md"&gt;README.md&lt;/a&gt; for that repository has this block of YAML:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;license: other
license_name: qwen
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And &lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview/commits/main/README.md"&gt;commits to that README&lt;/a&gt; at one point linked to the &lt;a href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE"&gt;Qwen2.5-72B-Instruct copy of the Qwen license&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The QwQ model repository &lt;a href="https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/LICENSE"&gt;continues to list Apache 2.0&lt;/a&gt;, which matches the YAML in &lt;a href="https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/README.md"&gt;its README&lt;/a&gt; as well.&lt;/p&gt;
&lt;p&gt;So it looks to me like the intention is for QvQ and Qwen2.5-72B-Instruct to be Qwen licensed, while QwQ is Apache 2.0.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="vision-llms"/><category term="uv"/><category term="qwen"/><category term="mlx"/><category term="llm-reasoning"/><category term="llm-release"/><category term="prince-canuma"/><category term="ai-in-china"/></entry><entry><title>SmolVLM - small yet mighty Vision Language Model</title><link href="https://simonwillison.net/2024/Nov/28/smolvlm/#atom-tag" rel="alternate"/><published>2024-11-28T20:29:27+00:00</published><updated>2024-11-28T20:29:27+00:00</updated><id>https://simonwillison.net/2024/Nov/28/smolvlm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/blog/smolvlm"&gt;SmolVLM - small yet mighty Vision Language Model&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I've been having fun playing with this new vision model from the Hugging Face team behind &lt;a href="https://simonwillison.net/2024/Nov/2/smollm2/"&gt;SmolLM&lt;/a&gt;. They describe it as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] a 2B VLM, SOTA for its memory footprint. SmolVLM is small, fast, memory-efficient, and fully open-source. All model checkpoints, VLM datasets, training recipes and tools are released under the Apache 2.0 license.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've tried it in a few flavours but my favourite so far is the &lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;mlx-vlm&lt;/a&gt; approach, via &lt;code&gt;mlx-vlm&lt;/code&gt; author &lt;a href="https://twitter.com/Prince_Canuma/status/1862168514842280401"&gt;Prince Canuma&lt;/a&gt;. Here's the &lt;code&gt;uv&lt;/code&gt; recipe I'm using to run it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run \
  --with mlx-vlm \
  --with torch \
  python -m mlx_vlm.generate \
    --model mlx-community/SmolVLM-Instruct-bf16 \
    --max-tokens 500 \
    --temp 0.5 \
    --prompt "Describe this image in detail" \
    --image IMG_4414.JPG
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you run into an error using Python 3.13 (torch compatibility) try &lt;code&gt;uv run --python 3.11&lt;/code&gt; instead.&lt;/p&gt;
&lt;p&gt;This one-liner installs the necessary dependencies, downloads the model (about 4.2GB, saved to &lt;code&gt;~/.cache/huggingface/hub/models--mlx-community--SmolVLM-Instruct-bf16&lt;/code&gt;) and executes the prompt and displays the result.&lt;/p&gt;
&lt;p&gt;I ran that against &lt;a href="https://static.simonwillison.net/static/2024/IMG_4414.JPG"&gt;this Pelican photo&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A glorious pelican on some rocks, two other pelicans are visible plus some other birds" src="https://static.simonwillison.net/static/2024/IMG_4414.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;The model replied:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the foreground of this photograph, a pelican is perched on a pile of rocks. The pelican’s wings are spread out, and its beak is open. There is a small bird standing on the rocks in front of the pelican. The bird has its head cocked to one side, and it seems to be looking at the pelican. To the left of the pelican is another bird, and behind the pelican are some other birds. The rocks in the background of the image are gray, and they are covered with a variety of textures. The rocks in the background appear to be wet from either rain or sea spray.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There are a few spatial mistakes in that description but the vibes are generally in the right direction.&lt;/p&gt;
&lt;p&gt;On my 64GB M2 MacBook Pro it read the prompt at 7.831 tokens/second and generated that response at an impressive 74.765 tokens/second.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="vision-llms"/><category term="uv"/><category term="mlx"/><category term="smollm"/><category term="llm-release"/><category term="prince-canuma"/></entry><entry><title>mlx-vlm</title><link href="https://simonwillison.net/2024/Sep/29/mlx-vlm/#atom-tag" rel="alternate"/><published>2024-09-29T21:38:46+00:00</published><updated>2024-09-29T21:38:46+00:00</updated><id>https://simonwillison.net/2024/Sep/29/mlx-vlm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;mlx-vlm&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The MLX ecosystem of libraries for running machine learning models on Apple Silicon continues to expand. Prince Canuma is actively developing this library for running vision models such as Qwen-2 VL, Pixtral, and LLaVA using Python on a Mac.&lt;/p&gt;
&lt;p&gt;I used &lt;a href="https://docs.astral.sh/uv/"&gt;uv&lt;/a&gt; to run it against &lt;a href="https://static.simonwillison.net/static/2024/django-roadmap.png"&gt;this image&lt;/a&gt; with this shell one-liner:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run --with mlx-vlm \
  python -m mlx_vlm.generate \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --max-tokens 1000 \
  --temp 0.0 \
  --image https://static.simonwillison.net/static/2024/django-roadmap.png \
  --prompt "Describe image in detail, include all text"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;--image&lt;/code&gt; option works equally well with a URL or a path to a local file on disk.&lt;/p&gt;
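&lt;p&gt;So if you had already downloaded the screenshot, the same invocation with a local file would look like this (the filename here is just for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run --with mlx-vlm \
  python -m mlx_vlm.generate \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --max-tokens 1000 \
  --temp 0.0 \
  --image django-roadmap.png \
  --prompt "Describe image in detail, include all text"
&lt;/code&gt;&lt;/pre&gt;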
&lt;p&gt;&lt;img alt="Django roadmap diagram, described below" src="https://static.simonwillison.net/static/2024/django-roadmap.png" /&gt;&lt;/p&gt;
&lt;p&gt;This first downloaded 4.1GB to my &lt;code&gt;~/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct&lt;/code&gt; folder and then output &lt;a href="https://gist.github.com/simonw/9e02d425cacb902260ec1307e0671e17"&gt;this result&lt;/a&gt;, which starts:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The image is a horizontal timeline chart that represents the release dates of various software versions. The timeline is divided into years from 2023 to 2029, with each year represented by a vertical line. The chart includes a legend at the bottom, which distinguishes between different types of software versions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Legend&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mainstream Support:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;4.2 (2023)&lt;/li&gt;
&lt;li&gt;5.0 (2024)&lt;/li&gt;
&lt;li&gt;5.1 (2025)&lt;/li&gt;
&lt;li&gt;5.2 (2026)&lt;/li&gt;
&lt;li&gt;6.0 (2027) [...]&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://mastodon.social/@zubakskees/113221293869864076"&gt;Chris Zubak-Skees&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apple"&gt;apple&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="apple"/><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="vision-llms"/><category term="uv"/><category term="qwen"/><category term="mlx"/><category term="prince-canuma"/><category term="ai-in-china"/></entry></feed>