<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: llamafile</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/llamafile.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2024-08-19T20:08:59+00:00</updated><author><name>Simon Willison</name></author><entry><title>llamafile v0.8.13 (and whisperfile)</title><link href="https://simonwillison.net/2024/Aug/19/whisperfile/#atom-tag" rel="alternate"/><published>2024-08-19T20:08:59+00:00</published><updated>2024-08-19T20:08:59+00:00</updated><id>https://simonwillison.net/2024/Aug/19/whisperfile/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.13"&gt;llamafile v0.8.13 (and whisperfile)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The latest release of &lt;a href="https://github.com/Mozilla-Ocho/llamafile"&gt;llamafile&lt;/a&gt; (&lt;a href="https://simonwillison.net/2023/Nov/29/llamafile/"&gt;previously&lt;/a&gt;) adds support for &lt;a href="https://blog.google/technology/developers/gemma-open-models/"&gt;Gemma 2B&lt;/a&gt; (pre-bundled &lt;a href="https://huggingface.co/jartine/gemma-2-27b-it-llamafile/tree/main"&gt;llamafiles available here&lt;/a&gt;), significant performance improvements, and new support for the Whisper speech-to-text model, based on &lt;a href="https://github.com/ggerganov/whisper.cpp"&gt;whisper.cpp&lt;/a&gt;, Georgi Gerganov's C++ implementation of Whisper that pre-dates his work on &lt;code&gt;llama.cpp&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I got &lt;code&gt;whisperfile&lt;/code&gt; working locally by first downloading the cross-platform executable attached to &lt;a href="https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.13"&gt;the GitHub release&lt;/a&gt; and then grabbing a &lt;code&gt;whisper-tiny.en-q5_1.bin&lt;/code&gt; model from Hugging Face:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;wget -O whisper-tiny.en-q5_1.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en-q5_1.bin
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I ran &lt;code&gt;chmod 755 whisperfile-0.8.13&lt;/code&gt; to make it executable, and ran it against an example &lt;code&gt;.wav&lt;/code&gt; file like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./whisperfile-0.8.13 -m whisper-tiny.en-q5_1.bin -f raven_poe_64kb.wav --no-prints
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;--no-prints&lt;/code&gt; option suppresses the debug output, so you just get text that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[00:00:00.000 --&amp;gt; 00:00:12.000]   This is a LibraVox recording. All LibraVox recordings are in the public domain. For more information please visit LibraVox.org.
[00:00:12.000 --&amp;gt; 00:00:20.000]   Today's reading The Raven by Edgar Allan Poe, read by Chris Scurringe.
[00:00:20.000 --&amp;gt; 00:00:40.000]   Once upon a midnight dreary, while I pondered weak and weary, over many a quaint and curious volume of forgotten lore. While I nodded nearly napping, suddenly there came a tapping as of someone gently rapping, rapping at my chamber door.
&lt;/code&gt;&lt;/pre&gt;
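&lt;p&gt;If you only want the prose, those timestamp prefixes are easy to strip. Here's a minimal sketch (the &lt;code&gt;strip_timestamps&lt;/code&gt; name is my own):&lt;/p&gt;

```shell
# Remove the "[00:00:00.000 --> 00:00:12.000]" prefix whisperfile
# puts on each line, leaving just the transcribed text
strip_timestamps() {
  sed 's/^\[[0-9:.]* --> [0-9:.]*\] *//'
}

# Usage: ./whisperfile-0.8.13 -m whisper-tiny.en-q5_1.bin \
#          -f raven_poe_64kb.wav --no-prints | strip_timestamps
```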
&lt;p&gt;There are quite a few &lt;a href="https://github.com/Mozilla-Ocho/llamafile/issues/544#issuecomment-2297368432"&gt;undocumented options&lt;/a&gt;. This one writes out JSON to a file called &lt;code&gt;transcript.json&lt;/code&gt; (&lt;a href="https://gist.github.com/simonw/39173ac94e71cb01b749f9256a9408c4"&gt;example output&lt;/a&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./whisperfile-0.8.13 -m whisper-tiny.en-q5_1.bin -f /tmp/raven_poe_64kb.wav --no-prints --output-json --output-file transcript
&lt;/code&gt;&lt;/pre&gt;
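&lt;p&gt;The resulting JSON holds the segments in a &lt;code&gt;transcription&lt;/code&gt; array. A short helper can pull the plain text back out - a sketch, assuming the layout from the example output linked above (the &lt;code&gt;extract_text&lt;/code&gt; name is mine):&lt;/p&gt;

```shell
# Print just the text of each segment from whisperfile's --output-json file
extract_text() {
  python3 -c '
import json, sys

doc = json.load(sys.stdin)
for segment in doc["transcription"]:
    print(segment["text"].strip())
' < "$1"
}

# Usage: extract_text transcript.json
```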
&lt;p&gt;I had to convert my own audio recordings to 16kHz &lt;code&gt;.wav&lt;/code&gt; files in order to use them with &lt;code&gt;whisperfile&lt;/code&gt;. I used &lt;code&gt;ffmpeg&lt;/code&gt; to do this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ffmpeg -i runthrough-26-oct-2023.wav -ar 16000 /tmp/out.wav
&lt;/code&gt;&lt;/pre&gt;
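&lt;p&gt;To convert a whole directory of recordings I could wrap that same invocation in a function - a sketch (the helper names are my own invention):&lt;/p&gt;

```shell
# Name the 16kHz copy after the original: foo.wav -> foo-16k.wav
out_name() {
  printf '%s-16k.wav' "${1%.wav}"
}

# Downsample one file using the same ffmpeg flags as above
to16k() {
  ffmpeg -i "$1" -ar 16000 "$(out_name "$1")"
}

# Usage: for f in *.wav; do to16k "$f"; done
```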
&lt;p&gt;Then I could transcribe that like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./whisperfile-0.8.13 -m whisper-tiny.en-q5_1.bin -f /tmp/out.wav --no-prints
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: &lt;a href="https://twitter.com/JustineTunney/status/1825676741593149949"&gt;Justine says&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I've just uploaded new whisperfiles &lt;a href="https://huggingface.co/Mozilla/whisperfile"&gt;to Hugging Face&lt;/a&gt; which use miniaudio.h to automatically resample and convert your mp3/ogg/flac/wav files to the appropriate format.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;With that &lt;code&gt;whisper-tiny&lt;/code&gt; model this took just 11s to transcribe a 10m41s audio file!&lt;/p&gt;
&lt;p&gt;I also tried the much larger Whisper Medium model, using the 539MB &lt;code&gt;ggml-medium-q5_0.bin&lt;/code&gt; quantized version from &lt;a href="https://huggingface.co/ggerganov/whisper.cpp/tree/main"&gt;huggingface.co/ggerganov/whisper.cpp&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./whisperfile-0.8.13 -m ggml-medium-q5_0.bin -f out.wav --no-prints
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This time it took 1m49s, using 761% of CPU according to Activity Monitor.&lt;/p&gt;
&lt;p&gt;I tried adding &lt;code&gt;--gpu auto&lt;/code&gt; to exercise the GPU on my M2 Max MacBook Pro:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./whisperfile-0.8.13 -m ggml-medium-q5_0.bin -f out.wav --no-prints --gpu auto
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That used just 16.9% of CPU and 93% of GPU according to Activity Monitor, and finished in 1m08s. &lt;/p&gt;
&lt;p&gt;I tried this with the &lt;code&gt;tiny&lt;/code&gt; model too, but the performance difference there was imperceptible.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/JustineTunney/status/1825551821857010143"&gt;@JustineTunney&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ffmpeg"&gt;ffmpeg&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/whisper"&gt;whisper&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llamafile"&gt;llamafile&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/justine-tunney"&gt;justine-tunney&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/speech-to-text"&gt;speech-to-text&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/georgi-gerganov"&gt;georgi-gerganov&lt;/a&gt;&lt;/p&gt;



</summary><category term="ffmpeg"/><category term="ai"/><category term="whisper"/><category term="local-llms"/><category term="llamafile"/><category term="justine-tunney"/><category term="speech-to-text"/><category term="georgi-gerganov"/></entry><entry><title>gemma-2-27b-it-llamafile</title><link href="https://simonwillison.net/2024/Jul/2/gemma-2-27b-it-llamafile/#atom-tag" rel="alternate"/><published>2024-07-02T22:38:06+00:00</published><updated>2024-07-02T22:38:06+00:00</updated><id>https://simonwillison.net/2024/Jul/2/gemma-2-27b-it-llamafile/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/jartine/gemma-2-27b-it-llamafile"&gt;gemma-2-27b-it-llamafile&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Justine Tunney shipped llamafile packages of Google's new openly licensed (though definitely not open source) &lt;a href="https://ai.google.dev/gemma"&gt;Gemma&lt;/a&gt; 2 27b model this morning.&lt;/p&gt;
&lt;p&gt;I downloaded the &lt;code&gt;gemma-2-27b-it.Q5_1.llamafile&lt;/code&gt; version (20.5GB) to my Mac, ran &lt;code&gt;chmod 755 gemma-2-27b-it.Q5_1.llamafile&lt;/code&gt; and then &lt;code&gt;./gemma-2-27b-it.Q5_1.llamafile&lt;/code&gt; and now I'm trying it out through the &lt;code&gt;llama.cpp&lt;/code&gt; default web UI in my browser. It works great.&lt;/p&gt;
&lt;p&gt;It's a &lt;em&gt;very&lt;/em&gt; capable model - currently sitting at position 12 on the &lt;a href="https://chat.lmsys.org/"&gt;LMSYS Arena&lt;/a&gt;, making it the highest-ranked open weights model - one position ahead of Llama-3-70b-Instruct and within striking distance of the GPT-4 class models.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/JustineTunney/status/1808165898743878108"&gt;@JustineTunney&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llamafile"&gt;llamafile&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/justine-tunney"&gt;justine-tunney&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemma"&gt;gemma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatbot-arena"&gt;chatbot-arena&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="llamafile"/><category term="justine-tunney"/><category term="llama-cpp"/><category term="gemma"/><category term="chatbot-arena"/></entry><entry><title>Language models on the command-line</title><link href="https://simonwillison.net/2024/Jun/17/cli-language-models/#atom-tag" rel="alternate"/><published>2024-06-17T16:44:39+00:00</published><updated>2024-06-17T16:44:39+00:00</updated><id>https://simonwillison.net/2024/Jun/17/cli-language-models/#atom-tag</id><summary type="html">
    &lt;p&gt;I gave a talk about accessing Large Language Models from the command-line last week as part of the &lt;a href="https://maven.com/parlance-labs/fine-tuning"&gt;Mastering LLMs: A Conference For Developers &amp;amp; Data Scientists&lt;/a&gt; six-week online conference. The talk focused on my &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; Python command-line utility and ways you can use it (and &lt;a href="https://llm.datasette.io/en/stable/plugins/index.html"&gt;its plugins&lt;/a&gt;) to explore LLMs and use them for useful tasks.&lt;/p&gt;

&lt;p&gt;The talk was recorded and is available &lt;a href="https://www.youtube.com/watch?v=QUXQNi6jQ30"&gt;on YouTube&lt;/a&gt;. Here I've turned it into an &lt;a href="https://simonwillison.net/tags/annotatedtalks/"&gt;annotated presentation&lt;/a&gt;, with detailed notes and screenshots (there were no slides) to accompany the video.&lt;/p&gt;

&lt;iframe style="max-width: 100%" width="560" height="315" src="https://www.youtube-nocookie.com/embed/QUXQNi6jQ30" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;

  &lt;p style="margin-top: 1em"&gt;&lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; is a tool I started building last year to help run LLM prompts directly from a command-line terminal. Instructions for installing it &lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;are here&lt;/a&gt; - you can use &lt;code&gt;pipx install llm&lt;/code&gt; or &lt;code&gt;pip install llm&lt;/code&gt; or &lt;code&gt;brew install llm&lt;/code&gt;.&lt;/p&gt;

&lt;div class="llm-slide" id="frame_000003.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000003.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000003.jpg" alt="The LLM website" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000003.jpg"&gt;#&lt;/a&gt;

&lt;p&gt;Once installed you can use it with OpenAI models by running &lt;code&gt;llm keys set openai&lt;/code&gt; and pasting in your OpenAI key - or &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html#plugin-directory"&gt;install plugins&lt;/a&gt; to use models by other providers, including models you can run locally.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_000259.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000259.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000259.jpg" alt="$ llm &amp;#39;five great names for a pet pelican&amp;#39;
Sure, here are five great names for a pet pelican:
1. **Splash**
2. **Captain Beak**
3. **Seabreeze**
4. **Marina**
5. **Pelicano**
These names incorporate elements of the pelican&amp;#39;s natural habitat and distinctive features, adding a fun and fitting touch for your feathered friend!" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000259.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Basic usage is very simple: once you've configured your OpenAI key, you can run prompts against their models like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm 'five great names for a pet pelican'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output will stream to your terminal, or you can redirect it to a file like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm 'five great names for a pet pelican' &amp;gt; pelicans.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_000341.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000341.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000341.jpg" alt="$ llm -c &amp;#39;now do walruses&amp;#39;
Of course, here are five fitting names for a pet walrus:
1. **Whiskers**
2. **Tusk**
3. **Gustav**
4. **Blubber**
5. **Wally**

$ llm -c &amp;#39;justify those!&amp;#39;

Certainly! Here&amp;#39;s why each name suits a pet walrus:
1. **Whiskers**
- **Justification:** Walruses have distinctive, prominent whiskers (vibrissae) that the
y use to detect food on the ocean floor. This name highlights one of their most characteristic features.
2. **Tusk**
- **Justification:** Walruses are known for their long, formidable tusks, which they use for various purposes" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000341.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;You can use the &lt;code&gt;-c&lt;/code&gt; (for continue) option to send follow-up prompts as part of the same ongoing conversation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -c 'now do walruses'
# ...
llm -c 'justify those!'
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_000506.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000506.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000506.jpg" alt="Screenshot of the list of Remote APIs plugins in the LLM plugins directory." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000506.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;LLM supports additional models via plugins. These are listed in the &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html"&gt;LLM plugins directory&lt;/a&gt;, with dozens of plugins covering both remote API-hosted models and models you can run directly on your own computer.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_000704.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000704.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000704.jpg" alt="$ llm models
Outputs a list of OpenAI models followed by several anthropic models.

$ llm -m claude-3-haiku-20240307 &amp;#39;say hi in spanish with a flourish&amp;#39;
¡Hola, mi amigo! 🌟" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000704.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Here I'm using the &lt;a href="https://github.com/simonw/llm-claude-3"&gt;llm-claude-3&lt;/a&gt; plugin, which provides access to the Anthropic &lt;a href="https://www.anthropic.com/news/claude-3-family"&gt;Claude 3 family&lt;/a&gt; of models.&lt;/p&gt;
&lt;p&gt;I really like these models. Claude 3 Opus is about equivalent to GPT-4o in terms of quality. Claude 3 Haiku is both cheaper and better than GPT-3.5, and can handle 100,000 input tokens including images.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-bash"&gt;llm install llm-claude-3
llm keys set claude
# &amp;lt;Paste key here&amp;gt;
# Now list available models
llm models
# Then run a prompt
llm -m claude-3-haiku-20240307 'say hi in spanish with a flourish'
# Or use the haiku alias
llm -m haiku 'say hi in spanish with a flourish'
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_000818.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000818.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000818.jpg" alt="$ llm logs path
/Users/simon/Library/Application Support/io.datasette.llm/logs.db

$ datasette &amp;quot;$(llm logs path)&amp;quot;
...
Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000818.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Every prompt and response run through the LLM tool is permanently logged to a SQLite database, &lt;a href="https://llm.datasette.io/en/stable/logging.html"&gt;as described here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This command shows the path to that database:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm logs path
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you install &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; you can use it to browse your SQLite database like this, using a terminal trick where the output of one command is passed to another (with double quotes to avoid any problems caused by the space in the directory name):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;datasette "$(llm logs path)"
&lt;/code&gt;&lt;/pre&gt;
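&lt;p&gt;The quoting matters because that path contains a space. Here's a quick pure-shell illustration of the difference (no LLM install needed; the helper names are invented):&lt;/p&gt;

```shell
# Stand-in for `llm logs path`, which returns a path containing a space
fake_logs_path() {
  printf '%s' '/Users/simon/Library/Application Support/io.datasette.llm/logs.db'
}

# Report how many arguments a command received
count_args() { echo "$#"; }

count_args "$(fake_logs_path)"   # quoted: one argument, the full path
count_args $(fake_logs_path)     # unquoted: word-split at the space
```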
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_000832.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000832.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000832.jpg" alt="Datasette showing the 2,434 responses I have logged as a table, with a search interface at the top." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000832.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Here's my searchable database of 2,434 responses I've logged from using LLM on my laptop, running in Datasette.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_000853.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000853.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000853.jpg" alt="Datasette screenshot showing a conversation, with 3 linked responses" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000853.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Earlier we ran a prompt and then sent two follow-up prompts to it using the &lt;code&gt;llm -c&lt;/code&gt; option. Those are stored in the database as three responses that are part of the same conversation.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_000924.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000924.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000924.jpg" alt="This data as json, copyable, CSV (advanced)

Suggested facets: datetime_utc (date)

model &amp;gt;30

- gpt-3.5-turbo 383
- claude-3-opus-20240229 334
- gpt-4 233
- gpt-4-1106-preview 134
- claude-2 117
- mlc-chat-Llama-2-7b-chat-hf-q4f16_1 93
- mistral-7b-instruct-v0 78
- mlc-chat-Llama-2-13b-chat-hf-q4f16_1 74
- claude-3-haiku-20240307 49
- gpt-3.5-turbo-instruct 49
- gpt-4o 37
- mlc-chat-Mistral-7B-Instruct-v0.2-q3f16_1 35
- Phi-3-mini-4k-instruct 33
- gemini-pro 33
- gemini-1.5-pro-latest 32
- gguf 28
- gpt-4-turbo-preview 27
- gpt-3.5-turbo-16k 26
- mistral-medium 26
- llama-2-7b-chat.ggmlv3.g8 0 2
- mistral-tiny 23
- Meta-Llama-3-8B-Instruct 21
- llamafile 21
- llama-2-13b.Q8_0 20
- orca-mini-3b 19
- command-r 14
- llama-2-7b.ggmlv3.q8_0 14
- mistral-small 14
- mistral-7b-instruct-v0.1.Q8_0 13
- claude 12" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000924.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Datasette supports facets, which provide a quick overview of unique value counts within the data. I've used GPT-3.5 Turbo 383 times, Claude 3 Opus 334 times, and a whole bunch of other models.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_000940.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_000940.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_000940.jpg" alt="6 rows where search matches &amp;quot;pelican&amp;quot; and model = &amp;quot;claude-3-opus-20240229&amp;quot; sorted by id descending" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_000940.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Datasette provides search and filtering too - here are the prompts I've run against Claude 3 Opus that match "pelican".&lt;/p&gt;
&lt;p&gt;I have an experimental feature that lets me pass images to some models using the &lt;code&gt;-i filename.png&lt;/code&gt; option. More on that &lt;a href="https://github.com/simonw/llm/issues/331"&gt;in this issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/llm-cmd"&gt;llm-cmd&lt;/a&gt; is an example of a plugin that adds an extra sub-command to LLM - in this case one that takes a description of a task and turns that into a pre-populated shell command. I wrote more about that in &lt;a href="https://simonwillison.net/2024/Mar/26/llm-cmd/"&gt;llm cmd undo last git commit—a new plugin for LLM&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_001335.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_001335.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_001335.jpg" alt="Screenshot of the GPT4All website: A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_001335.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;There are plenty of ways to run local models using LLM. One of my favourites is with the &lt;a href="https://github.com/simonw/llm-gpt4all"&gt;llm-gpt4all&lt;/a&gt; plugin, which builds on top of the excellent &lt;a href="https://gpt4all.io/index.html"&gt;GPT4All&lt;/a&gt; Python library by Nomic AI.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_001450.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_001450.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_001450.jpg" alt="$ llm -m mistral-7b-instruct-v@ &amp;#39;five great names for a pet seagull, with explanations&amp;#39; =3
1. &amp;quot;Sally&amp;quot; - This name is perfect for a pet seagull because it&amp;#39;s short and easy to remember,
yet still has a charming and friendly vibe. It also evokes images of a playful and adventuro
us bird that loves to explore the world around her.
2. &amp;quot;Captain Jack&amp;quot; - For any seafaring seagull who loves adventure and exploration, this name
is perfect. Captain Jack suggests a sense of independence and leadership, which could be fitt
ing for a bird who enjoys flying high and exploring new places.
3. &amp;quot;Scarlett&amp;quot; - This name is bold and eye-catching, much like the vibrant feathers of a seagull. Scarlett also has a fiery energy that could reflect the spirit of your adventurous pet.
4. &amp;quot;Nautica&amp;quot; - Nautica means &amp;quot;sailor&amp;quot; in Latin, making it an excellent choice for a seagull

Activity Monitor is also visible, showing a Python process using 350MB of RAM." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_001450.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;To install that plugin:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-gpt4all
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then run &lt;code&gt;llm models&lt;/code&gt; to list the new models. Each model will be downloaded the first time you try running a prompt through it.&lt;/p&gt;
&lt;p&gt;I used this to run Mistral-7B Instruct - an extremely high-quality small (~4GB) model:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m mistral-7b-instruct-v0 'five great names for a pet seagull, with explanations'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can run Activity Monitor to see the resources the model is using.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_001544.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_001544.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_001544.jpg" alt="$ llm chat -m mistral-7b-instruct-v0
Chatting with mistral-7b-instruct-v0
Type &amp;#39;exit&amp;#39; or &amp;#39;quit&amp;#39; to exit
Type &amp;#39;!multi&amp;#39; to enter multiple lines, then &amp;#39;!end&amp;#39; to finish
&amp;gt; say hello in spanish

Hola! How can I assist you today?
&amp;gt; now in french
Bonjour! Comment puis-je vous aider aujourd&amp;#39;hui?" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_001544.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Running prompts like this is inefficient, because it loads the full model into memory, runs the prompt and then shuts down the program again.&lt;/p&gt;
&lt;p&gt;Instead, you can use the &lt;code&gt;llm chat&lt;/code&gt; command which keeps the model in memory across multiple prompts:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm chat -m mistral-7b-instruct-v0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Another option is to run &lt;a href="https://ollama.com/"&gt;Ollama&lt;/a&gt;, which runs its own local server hosting models. The &lt;a href="https://github.com/taketwo/llm-ollama"&gt;llm-ollama&lt;/a&gt; plugin can then be used to run prompts through Ollama from LLM.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_002013.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_002013.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_002013.jpg" alt="The llamafile website" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_002013.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Another really exciting option for running models is &lt;a href="https://github.com/Mozilla-Ocho/llamafile"&gt;llamafile&lt;/a&gt;, a project sponsored by Mozilla that uses Justine Tunney's &lt;a href="https://justine.lol/cosmo3/"&gt;Cosmopolitan&lt;/a&gt; to compile a binary that bundles both a model and the software needed to serve it, in a single file that can execute on several different operating systems.&lt;/p&gt;
&lt;p&gt;I've written more about that here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2023/Nov/29/llamafile/"&gt;llamafile is the new best way to run a LLM on your own computer
&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/cosmopolitan/ecosystem"&gt;Catching up with the Cosmopolitan ecosystem
&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_002521.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_002521.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_002521.jpg" alt="Screenshot of a browser window titled llama.cpp showing a photograph

User: Describe image

Llama: The image features a person sitting in a chair next to an outdoor setting with a rooster nearby. A white bowl filled with eggs is placed on the ground, and it appears that the person may be preparing or collecting them from their backyard chicken coop. There are several other birds scattered around the scene as well, adding more life to this" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_002521.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Running &lt;a href="https://llava-vl.github.io/"&gt;LLaVA&lt;/a&gt; using a Llamafile is particularly fun - it's an openly licensed model that can accept images as input as well. The results you can get from that, running as a single binary on your laptop, are pretty amazing.&lt;/p&gt;
&lt;p&gt;Grab that from &lt;a href="https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile"&gt;Mozilla/llava-v1.5-7b-llamafile&lt;/a&gt; on Hugging Face.&lt;/p&gt;
&lt;p&gt;LLM can talk to Llamafile instances via the &lt;a href="https://github.com/simonw/llm-llamafile"&gt;llm-llamafile&lt;/a&gt; plugin.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_002636.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_002636.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_002636.jpg" alt="#!/bin/bash

# Validate that the first argument is an integer
if [[ ! $1 =~ ^[0-9]+$ ]]; then
  echo &amp;quot;Please provide a valid integer as the first argument.&amp;quot;
  exit 1
fi

id=&amp;quot;$1&amp;quot;

# Parse the optional -m argument
model=&amp;quot;haiku&amp;quot;
if [[ $2 == &amp;quot;-m&amp;quot; &amp;amp;&amp;amp; -n $3 ]]; then
  model=&amp;quot;$3&amp;quot;
fi

# Make API call, parse and summarize the discussion
curl -s &amp;quot;https://hn.algolia.com/api/v1/items/$id&amp;quot; | \
  jq -r &amp;#39;recurse(.children[]) | .author + &amp;quot;: &amp;quot; + .text&amp;#39; | \
  llm -m &amp;quot;$model&amp;quot; -s &amp;#39;Summarize the themes of the opinions expressed here.
  For each theme, output a markdown header.
  Include direct &amp;quot;quotations&amp;quot; (with author attribution) where appropriate.
  You MUST quote directly from users when crediting them, with double quotes.
  Fix HTML entities. Output markdown. Go long.&amp;#39;" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_002636.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Now that we can run prompts from our terminal, we can start assembling software by writing scripts.&lt;/p&gt;
&lt;p&gt;Here's a Bash script I wrote to summarize conversations on Hacker News, using longer-context models such as Claude 3 Haiku, Google Gemini 1.5 or GPT-4o.&lt;/p&gt;
&lt;p&gt;I wrote more about this in &lt;a href="https://til.simonwillison.net/llms/claude-hacker-news-themes"&gt;Summarizing Hacker News discussion themes with Claude and LLM&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_002850.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_002850.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_002850.jpg" alt="$ hn-summary.sh 40651054 -m 4o

# Dependency and Linking in Software Distribution
Several users discussed issues around dependency management and linking in software distribution:
### Stability and Compatibility Issues
- **e63f67dd-@65b:** &amp;quot;It’s just a classic dependency issue... a combination of ABI instability and just plain version incompatibility from one distro to the next... My opinion is the opposite: I think the old paradigm of distros managing a giant set of system libraries is a bad one.&amp;quot;
- **umanwizard:** &amp;quot;Some distros do actually break out rust dependencies into separate packages (e.g. Guix does this).&amp;quot;
### Dynamic vs Static Linking
- **jiripospisil:** &amp;quot;Until there&amp;#39;s a vulnerability in one of the dependencies and now you have to rebuild all of the packages which use it... For Rust, there&amp;#39;s also the fact that most projects use a lock file.&amp;quot;
- **pglata:** &amp;quot;Static linking is essentially easier for people who don&amp;#39;t want to care. But ignoring security does not mean it solves it" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_002850.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;The script works by hitting the Hacker News Algolia API to return the full, nested JSON structure of the conversation (e.g. &lt;a href="https://hn.algolia.com/api/v1/items/40651054"&gt;this JSON endpoint&lt;/a&gt; for &lt;a href="https://news.ycombinator.com/item?id=40651054"&gt;this conversation&lt;/a&gt;), then runs that through &lt;code&gt;jq&lt;/code&gt; to turn it into text, then pipes that into a model using LLM.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003325.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003325.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003325.jpg" alt="Tutorial: Writing a plugin to support a new model

This tutorial will walk you through developing a new plugin for LLM that adds support for a new Large Language Model.

We will be developing a plugin that implements a simple Markov chain to generate words based on an input string..." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003325.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;If you want to contribute to LLM itself, a great way to do that is to write plugins that support new models. I have &lt;a href="https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html"&gt;an extensive tutorial&lt;/a&gt; describing how to do that.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003440.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003440.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003440.jpg" alt="$ llm --system &amp;#39;You are a sentient cheesecake&amp;#39; -m gpt-4o --save cheesecake

$ llm chat -t cheesecake
Chatting with gpt-4o
Type &amp;#39;exit&amp;#39; or &amp;#39;quit&amp;#39; to exit
Type &amp;#39;!multi&amp;#39; to enter multiple lines, then &amp;#39;!end&amp;#39; to finish
&amp;gt; tell me abotu yourself
Hello! I&amp;#39;m a sentient cheesecake, a delightful fusion of creamy textures and rich flavors. My main ingredients include cream cheese, sugar, eggs, and a touch of vanilla, all resting on a
buttery graham cracker crust. I&amp;#39;m the perfect balance of sweet and tangy, and I take great pride in bringing joy to those who savor a slice of me. Despite my delicate nature, I have a surprisingly strong personality-and apparently, the ability to converse! Is there anything specific you&amp;#39;d like to know about the life of a sentient cheesecake?" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003440.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Another fun trick is to use templates to save and execute parameterized prompts, &lt;a href="https://llm.datasette.io/en/stable/templates.html"&gt;documented here&lt;/a&gt;. The easiest way to create a template is with the &lt;code&gt;--save&lt;/code&gt; option like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm --system 'You are a sentient cheesecake' -m gpt-4o --save cheesecake
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you can chat with a cheesecake:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm chat -t cheesecake
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003532.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003532.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003532.jpg" alt="$ cat demos.md | llm -s &amp;#39;summary, short&amp;#39;
1. **Terminal Basics**
- Quick reminder on basic terminal commands and piping with &amp;quot;|&amp;quot; and &amp;quot;&amp;gt;&amp;quot;, and using &amp;quot;cat&amp;quot;.
2. **Ttok Installation &amp;amp; Demonstration**
- Using &amp;quot;pipx install ttok&amp;quot; to install &amp;quot;ttok&amp;quot; and demonstrating its usage.
3. **Installing LLM**
- Installation of LLM using either &amp;quot;brew&amp;quot; or &amp;quot;pipx&amp;quot;.
4. **Configuring for OpenAI**
- How to configure LLM for OpenAI.
5. **Running a Prompt**
- Example of running a prompt with LLM.
6. **Installing LLM Plugin: llm-claude-3**
- Installing the &amp;quot;llm-claude-3&amp;quot; plugin and running a prompt with it.
7. **Browsing Logs with Datasette**
- Demonstrating browsing logs using Datasette.
8. **Using LLM Command**" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003532.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Being able to pipe content into &lt;code&gt;llm&lt;/code&gt; is a really important feature.&lt;/p&gt;
&lt;p&gt;The simplest way to do this is with &lt;code&gt;cat&lt;/code&gt; to send in files. This command summarizes the content of a provided file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat demos.md | llm -s 'summary, short'
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003552.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003552.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003552.jpg" alt="$ llm -c &amp;#39;no much much much shorter, and in haikus&amp;#39;

Terminal basics,
Commands and piping galore,
Cat reminds us all.

Install ttok first,
Pipx makes it easy-peasy,
Quick demonstration.

LLM install,
Brew or pipx, your choice,
Configured OpenAI." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003552.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Once you've done this you can send follow-up prompts with &lt;code&gt;-c&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -c 'no much much much shorter, and in haikus'
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003619.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003619.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003619.jpg" alt="Output of files-to-prompt - different files are concatenated together with --- between them followed by their filename" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003619.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;My &lt;a href="https://github.com/simonw/files-to-prompt"&gt;files-to-prompt&lt;/a&gt; command can turn a directory of files into a single prompt, suitable for piping into LLM.&lt;/p&gt;
&lt;p&gt;I wrote more about this, including how I developed it, in &lt;a href="https://simonwillison.net/2024/Apr/8/files-to-prompt/"&gt;Building files-to-prompt entirely using Claude 3 Opus&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003715.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003715.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003715.jpg" alt="Screeshot of the shot-scraper website

A command-line utility for taking automated screenshots of websites

Quick start:

pip install shot-scraper
shot-scraper install
shot-scraper https://github.com/simonw/shot-scraper -h 900
Produces this screenshot in a file called github-com-simonw-shot-scraper.png

" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003715.jpg"&gt;#&lt;/a&gt;
 &lt;p&gt;Another tool I frequently use with LLM is &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; - my command-line tool for screenshotting and scraping websites.&lt;/p&gt;  
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003736.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003736.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003736.jpg" alt="shot-scraper JavaScript command documentation 
" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003736.jpg"&gt;#&lt;/a&gt;

&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; can &lt;a href="https://shot-scraper.datasette.io/en/stable/javascript.html"&gt;run JavaScript directly against a page&lt;/a&gt; and output the result back to the terminal, suitable for piping into LLM.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003832.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003832.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003832.jpg" alt="Screenshot of Google search results, with the inspector panel open to show the structure of the HTML." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003832.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Google &lt;em&gt;hate&lt;/em&gt; being scraped. Let's use &lt;code&gt;shot-scraper&lt;/code&gt; to scrape Google.&lt;/p&gt;
&lt;p&gt;Google search results have a structure where each result title is an &lt;code&gt;&amp;lt;h3&amp;gt;&lt;/code&gt; element wrapped in a link.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003844.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003844.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003844.jpg" alt="shot-scraper javascript &amp;#39;https://www.google.com/search?q=nytimes+slop&amp;#39; &amp;#39;
Array.from(
  document.querySelectorAll(&amp;quot;h3&amp;quot;),
  el =&amp;gt; ({href: el.parentNode.href, title: el.innerText})
)&amp;#39;" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003844.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;We can scrape that using the following terminal command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper javascript 'https://www.google.com/search?q=nytimes+slop' '
Array.from(
  document.querySelectorAll("h3"),
  el =&amp;gt; ({href: el.parentNode.href, title: el.innerText})
)'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will load up the search results page in an invisible browser, then execute JavaScript that extracts the results and returns them as JSON.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003856.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003856.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003856.jpg" alt="R T R - A =4 [oBE IR W&amp;#39; R TR |
£ Searc =
= =g _
06_jun_lim — -2sh — 93x24 Rt - .
i 8
&amp;quot;href&amp;quot;: &amp;quot;http://www.nytimes.com/1896/07/31/archives/to-slop.html&amp;quot;, i
“title&amp;quot;: &amp;quot;To Slop.&amp;quot;
i
{
&amp;quot;href&amp;quot;: &amp;quot;https://www.nytimes.com/section/technology”,
&amp;quot;title&amp;quot;: &amp;quot;Technology&amp;quot;
i
1
&amp;quot;href&amp;quot;: &amp;quot;https://www.linkedin.com/posts/luigiraymontanez_slop-is-the-new-name-for-unw
anted-ai-generated-activity-7195821255621574658-MQWK&amp;quot;,
&amp;quot;title&amp;quot;: &amp;quot;Slop is the new name for unwanted AI-generated content”
i
{ e
&amp;quot;href&amp;quot;: &amp;quot;https://www.google.com/search?g=nytimes+slop&amp;amp;sca_esv=a22b124cb26f8e73&amp;amp;ei=F_B
0ZpLQHfet@PEPttSubA4&amp;amp;start=10&amp;amp;sa=N&amp;quot;,
&amp;quot;title&amp;quot;: &amp;quot;More results&amp;quot;
i
o
BTG
&amp;quot;title&amp;quot;: &amp;quot;Try again&amp;quot;
}
1
3
= 7,‘. 54 function findParentWithHveid(element) {
L85 4 __while (element 8§ lelement hasAttribute =
®0A0®17 WO  sSpaces:4 UTF-8 LF Markdown &amp;amp; A17Spell [
" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003856.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;The results as JSON include the &lt;code&gt;href&lt;/code&gt; and &lt;code&gt;title&lt;/code&gt; of each of those search results.&lt;/p&gt;
&lt;p&gt;We could send that to LLM, but I'd like to grab the search snippets as well.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003914.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003914.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003914.jpg" alt="Screenshot showing the command with the more complex JavaScript." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003914.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Here's the more complex recipe that also fetches the search snippets visible on the page:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper javascript 'https://www.google.com/search?q=nytimes+slop' '
() =&amp;gt; {
    function findParentWithHveid(element) {
        while (element &amp;amp;&amp;amp; !element.hasAttribute("data-hveid")) {
            element = element.parentElement;
        }
        return element;
    }
    return Array.from(
        document.querySelectorAll("h3"),
        el =&amp;gt; findParentWithHveid(el).innerText
    );
}' | llm -s 'describe slop'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At the end it pipes them into LLM with instructions to use that context to "describe slop".&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_003931.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_003931.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_003931.jpg" alt="The term &amp;quot;slop&amp;quot; has recently emerged in the context of artificial intelligence to describe questionable or low-quality AI-generated material. This term appears to be gaining traction among tech and media commentators, reflecting growing concerns about the proliferation of such content." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_003931.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;... and it works! We get back an answer from the LLM that summarizes the search results that we just scraped.&lt;/p&gt;
&lt;p&gt;We have implemented basic RAG - Retrieval Augmented Generation, where search results are used to answer a question - using a terminal script that scrapes search results from Google and pipes them into an LLM.&lt;/p&gt;
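&lt;p&gt;The shape of that pattern is easy to sketch in a few lines of Python. This toy version scores snippets by naive word overlap rather than a real search engine or embeddings, and just builds the prompt string instead of calling a model - it illustrates the plumbing, not the actual pipeline above:&lt;/p&gt;

```python
# Toy RAG sketch: retrieve the snippets most relevant to a question,
# then combine them into a prompt for a model. Scoring here is naive
# word overlap, purely for illustration.
def score(question, snippet):
    q = set(question.lower().split())
    return len(q.intersection(snippet.lower().split()))

def build_prompt(question, snippets, k=2):
    top = sorted(snippets, key=lambda s: score(question, s), reverse=True)[:k]
    return "\n".join(top) + "\n\nUsing the context above: " + question

# Invented snippets standing in for scraped search results
snippets = [
    "Slop is the new name for unwanted AI-generated content",
    "The weather today is sunny",
    "AI slop is flooding search results",
]
prompt = build_prompt("describe slop", snippets)
```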
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004133.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004133.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004133.jpg" alt="Output of llm --help showing a list of commands that includes cmd, embed, embed-models and embed-multi." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004133.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;Speaking of RAG... a common technique for implementing that pattern is to take advantage of embeddings and vector search to find content that is semantically similar to the user's question, without necessarily matching on exact keywords.&lt;/p&gt;
&lt;p&gt;I wrote an extensive introduction to embeddings in &lt;a href="https://simonwillison.net/2023/Oct/23/embeddings/"&gt;Embeddings: What they are and why they matter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;LLM includes support for calculating, storing and searching embeddings through its &lt;code&gt;llm embed-models&lt;/code&gt;, &lt;code&gt;llm embed&lt;/code&gt; and &lt;code&gt;llm embed-multi&lt;/code&gt; commands, &lt;a href="https://llm.datasette.io/en/stable/embeddings/index.html"&gt;documented here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;llm embed-models&lt;/code&gt; command lists currently available embedding models - the OpenAI models plus any that have been added by plugins.&lt;/p&gt;
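&lt;p&gt;The core idea behind embedding search fits in a few lines of code. Each piece of text becomes a vector of floats, and cosine similarity measures how close two vectors are. The vectors below are invented three-dimensional toys - real models such as OpenAI's &lt;code&gt;3-small&lt;/code&gt; produce 1,536 dimensions:&lt;/p&gt;

```python
import math

# Cosine similarity between two vectors: dot product divided by the
# product of their magnitudes. Higher means more semantically similar.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    mag = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / mag

# Invented toy vectors - real embeddings come from a model
angry_doc = [0.9, 0.1, 0.2]
happy_doc = [0.1, 0.9, 0.3]
query = [0.8, 0.2, 0.1]
```

&lt;p&gt;Comparing the query against each document vector picks out the semantically closer one, which is exactly the comparison the similarity commands perform against thousands of stored vectors at once.&lt;/p&gt;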
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004239.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004239.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004239.jpg" alt="A huge JSON array of floating point numbers, followed by:

$ llm embed -m 3-small -c &amp;#39;hello there&amp;#39; | jq length

1536" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004239.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;The &lt;code&gt;llm embed&lt;/code&gt; command can embed content. This will output a list of floating point numbers for the specified content, using the OpenAI 3-small embedding model.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm embed -m 3-small -c "hello there"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Add &lt;code&gt;-f hex&lt;/code&gt; to get that out as hexadecimal. Neither of these formats is particularly useful on its own!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004332.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004332.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004332.jpg" alt="Screenshot of the llm embed-multi documentation." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004332.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;The &lt;a href="https://llm.datasette.io/en/stable/embeddings/cli.html#llm-embed-multi"&gt;llm embed-multi&lt;/a&gt; command is much more useful. It can run embeddings against content in bulk - from a CSV or JSON file, from a directory full of content or even from a SQLite database. Those embedding vectors will be stored in SQLite ready to be used for search or similarity queries.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004418.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004418.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004418.jpg" alt="Screenshot of my blog&amp;#39;s collection of 7178 blogmarks" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004418.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;I collect bookmarks (I call them "blogmarks") on my blog - links with short descriptions. I have &lt;a href="https://simonwillison.net/search/?type=blogmark"&gt;7,178 of them&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let's create embeddings for all of those using LLM.&lt;/p&gt;
&lt;p&gt;I used the SQLite database version of my blog available from &lt;code&gt;https://datasette.simonwillison.net/simonwillisonblog.db&lt;/code&gt; (a 90MB file).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004424.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004424.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004424.jpg" alt="Screenshot of the blog_blogmark table in Datasette" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004424.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;The content lives in the &lt;a href="https://datasette.simonwillison.net/simonwillisonblog/blog_blogmark"&gt;blog_blogmark&lt;/a&gt; SQLite table, which I can explore using Datasette (in this case the &lt;a href="https://datasette.io/desktop"&gt;Datasette Desktop&lt;/a&gt; macOS Electron app).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004444.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004444.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004444.jpg" alt="Custom SQL query returning more than 2,000 rows 

select id, link_url, link_title, commentary from blog_blogmark" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004444.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;First step is to compose a SQL query returning the data we want to embed. The first column will be treated as a unique identifier to store along with the embedding vector, and any subsequent columns will be used as input to the embedding model.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;select id, link_url, link_title, commentary from blog_blogmark
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004502.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004502.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004502.jpg" alt="R I R S P Y @ B e A C m 7 Qg
06_jun_lim — -zsh — 93x24 R ERCTT

$ llm embed-multi links \
-d simonwillisonblog.db \
--sql &amp;#39;select id, link_url, link_title, commentary from blog_blogmark&amp;#39; \
-m 3-small --store" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004502.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;I can run this command to embed all of the content:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm embed-multi links \
  -d simonwillisonblog.db \
  --sql 'select id, link_url, link_title, commentary from blog_blogmark' \
  -m 3-small --store
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will create an embedding collection called "links". It will run the SQL query we created before, using the OpenAI &lt;code&gt;3-small&lt;/code&gt; model. The &lt;code&gt;--store&lt;/code&gt; option means it will store a copy of the text in the database as well - without that it would just store identifiers and we would need to use those to look up the text later on when running queries.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004521.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004521.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004521.jpg" alt="Progress bar - 4% done, 00:01:18" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004521.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;The &lt;code&gt;llm embed-multi&lt;/code&gt; command shows a progress bar for how far it has got.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004539.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004539.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004539.jpg" alt="Screenshot of the embeddings table in Datasette, with a bunch of binary data visible." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004539.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;While the command is running we can use Datasette to watch as the &lt;code&gt;embeddings&lt;/code&gt; table is filled with data - one row for each of the items we are embedding, each storing a big ugly binary blob of data representing the embedding vector (in &lt;a href="https://llm.datasette.io/en/stable/embeddings/storage.html"&gt;this storage format&lt;/a&gt;).&lt;/p&gt;
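&lt;p&gt;A compact way to store a vector of floats as a single blob is to pack them as 32-bit values, four bytes per number. The sketch below, using Python's standard library, shows the general idea - the exact on-disk format is defined by LLM itself:&lt;/p&gt;

```python
from array import array

# Pack a list of floats into a compact binary blob (4 bytes per value)
# and unpack it again. An illustration of the general idea, not
# necessarily the exact format LLM uses on disk.
def encode(values):
    return array("f", values).tobytes()

def decode(blob):
    return list(array("f", blob))

vector = [0.25, -1.5, 3.0]
blob = encode(vector)  # 12 bytes instead of a JSON list of floats
```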
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004720.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004720.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004720.jpg" alt="$ llm similar links -c &amp;#39;things that make me angry&amp;#39; -d simonwillisonblog.db
{&amp;quot;id&amp;quot;: &amp;quot;448&amp;quot;, &amp;quot;score&amp;quot;: 0.31105587659133327, &amp;quot;content&amp;quot;: &amp;quot;http://www.russellbeattie.com/notebook/1006697.html Overreaction Absolutely storming rant from Russell about America&amp;#39;s culture of fear.&amp;quot;, &amp;quot;metadata&amp;quot;: null}
..." /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004720.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;And now we can run searches! This command finds everything in the "links" collection that is most similar to the search term "things that make me angry":&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm similar links -c 'things that make me angry' -d simonwillisonblog.db
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_004824.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_004824.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_004824.jpg" alt="cat ~/.local/bin/blog-answer.sh

#!/bin/bash

# Check if a query was provided
if [ &amp;quot;$#&amp;quot; -ne 1 ]; then
    echo &amp;quot;Usage: $0 &amp;#39;Your query&amp;#39;&amp;quot;
    exit 1
fi

llm similar blog-paragraphs -c &amp;quot;query: $1&amp;quot; \
  | jq &amp;#39;.content | sub(&amp;quot;passage: &amp;quot;; &amp;quot;&amp;quot;)&amp;#39; -r \
  | llm -m llamafile \
  &amp;quot;$1&amp;quot; -s &amp;#39;You answer questions as a single paragraph&amp;#39;

# | llm -m mlc-chat-Llama-2-7b-chat-hf-q4f16_1 \
# /Users/simon/.local/share/virtualenvs/llm-mlc-SwKbovmI/bin/llm -m mlc-chat-Llama-2-7b-chat-hf-q4f16_1
" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_004824.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;We can implement another version of RAG on top of this as well, by finding similar documents to our search term and then piping those results back into LLM to execute a prompt.&lt;/p&gt;
&lt;p&gt;I wrote more about this in &lt;a href="https://til.simonwillison.net/llms/embed-paragraphs"&gt;Embedding paragraphs from my blog with E5-large-v2&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_005209.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_005209.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_005209.jpg" alt="LLM documentation: OpenAl-compatible models" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_005209.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;During the Q&amp;amp;A I mentioned that LLM can talk to anything that provides an OpenAI-compatible API endpoint using just configuration, no extra code. That's &lt;a href="https://llm.datasette.io/en/stable/other-models.html#openai-compatible-models"&gt;described in the documentation here&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_005333.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_005333.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_005333.jpg" alt="Screenshot of the Python API documentation" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_005333.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;I also showed LLM's &lt;a href="https://llm.datasette.io/en/stable/python-api.html"&gt;Python API documentation&lt;/a&gt;, but warned that this is not yet considered stable as I'm not 100% happy with how this API works yet.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="llm-slide" id="frame_010311.jpg"&gt;
  &lt;a class="slide-link" href="https://static.simonwillison.net/static/2024/llm/frame_010311.jpg"&gt;&lt;img src="https://static.simonwillison.net/static/2024/llm/frame_010311.jpg" alt="GitHub Issue with 77 comments: Figure out how to serve an AWS Lambda function with a Function URL from a custom subdomain" /&gt;&lt;/a&gt;
  &lt;a style="float: right; padding-left: 1em; border: none" href="https://simonwillison.net/2024/Jun/17/cli-language-models/#frame_010311.jpg"&gt;#&lt;/a&gt;
  &lt;p&gt;The final question from the audience was about my personal productivity.&lt;/p&gt;
&lt;p&gt;I wrote a bit about this a few years ago in &lt;a href="https://simonwillison.net/2022/Nov/26/productivity/"&gt;Coping strategies for the serial project hoarder&lt;/a&gt; - how I use extensive documentation and unit tests to allow me to work on way more projects at once by ensuring I don't have to remember the details of any of them.&lt;/p&gt;
&lt;p&gt;My other trick is that I tend to pick projects that fit my unique combination of previous experiences. I built LLM because I already had experience with LLM APIs, Python CLI tools (using &lt;a href="https://click.palletsprojects.com/"&gt;Click&lt;/a&gt;) and plugin systems (using &lt;a href="https://pluggy.readthedocs.io/"&gt;Pluggy&lt;/a&gt;). As a result I happened to be one of the best positioned people in the world to build a plugin-based CLI tool for working with LLMs!&lt;/p&gt;
&lt;/div&gt;

&lt;h4 id="llm-colophon"&gt;Colophon&lt;/h4&gt;
&lt;p&gt;Here's how I turned the YouTube video of this talk into an annotated presentation:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;I downloaded a &lt;code&gt;.mp4&lt;/code&gt; version of the talk from YouTube using &lt;a href="https://github.com/yt-dlp/yt-dlp"&gt;yt-dlp&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;I ran that through &lt;a href="https://goodsnooze.gumroad.com/l/macwhisper"&gt;MacWhisper&lt;/a&gt; to create my own transcript for copying extracts from into my write-up - although this time I didn't end up using any of the transcript text.&lt;/li&gt;
&lt;li&gt;I played the video (at 2x speed) in QuickTime Player and used the &lt;code&gt;capture.sh&lt;/code&gt; script &lt;a href="https://til.simonwillison.net/macos/quicktime-capture-script"&gt;described here&lt;/a&gt; to grab screenshots of the individual interesting frames that I wanted to use for my post.&lt;/li&gt;
&lt;li&gt;I loaded those screenshots into &lt;a href="https://til.simonwillison.net/tools/annotated-presentations"&gt;my annotated presentation tool&lt;/a&gt; (which I &lt;a href="https://simonwillison.net/2023/Aug/6/annotated-presentations/"&gt;described in this post&lt;/a&gt;) and used that to run OCR against them for alt text and to add commentary to accompany each screenshot.&lt;/li&gt;
&lt;li&gt;I assembled the result into this finished blog entry, adding intro text and the YouTube embed as HTML.&lt;/li&gt;
&lt;/ol&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-talks"&gt;annotated-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llamafile"&gt;llamafile&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ollama"&gt;ollama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/macwhisper"&gt;macwhisper&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="projects"/><category term="my-talks"/><category term="ai"/><category term="datasette"/><category term="openai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="anthropic"/><category term="annotated-talks"/><category term="llamafile"/><category term="ollama"/><category term="files-to-prompt"/><category term="macwhisper"/></entry><entry><title>Options for accessing Llama 3 from the terminal using LLM</title><link href="https://simonwillison.net/2024/Apr/22/llama-3/#atom-tag" rel="alternate"/><published>2024-04-22T13:38:09+00:00</published><updated>2024-04-22T13:38:09+00:00</updated><id>https://simonwillison.net/2024/Apr/22/llama-3/#atom-tag</id><summary type="html">
    &lt;p&gt;Llama 3 was released &lt;a href="https://llama.meta.com/llama3/"&gt;on Thursday&lt;/a&gt;. Early indications are that it's now the best available openly licensed model - Llama 3 70b Instruct has taken joint 5th place on the &lt;a href="https://chat.lmsys.org/?leaderboard"&gt;LMSYS arena leaderboard&lt;/a&gt;, sharing that position with Gemini Pro and Claude 3 Sonnet and sitting behind only Claude 3 Opus and some GPT-4s. But unlike those other models, Llama 3 70b is weights available and can even be run on a (high end) laptop!&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; command-line tool and Python library provide access to dozens of models via plugins. Here are several ways you can use them to access Llama 3, both hosted versions and running locally on your own hardware.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/22/llama-3/#llama-3-8b-instruct-locally-with-llm-gpt4all"&gt;Llama-3-8B-Instruct locally with llm-gpt4all&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/22/llama-3/#fast-api-access-via-groq"&gt;Fast API access via Groq&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/22/llama-3/#local-llama-3-70b-instruct-with-llamafile"&gt;Local Llama 3 70b Instruct with llamafile&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Apr/22/llama-3/#paid-access-via-other-api-providers"&gt;Paid access via other API providers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="llama-3-8b-instruct-locally-with-llm-gpt4all"&gt;Llama-3-8B-Instruct locally with llm-gpt4all&lt;/h4&gt;
&lt;p&gt;If you want to run Llama 3 locally, the easiest way to do that with LLM is using the &lt;a href="https://github.com/simonw/llm-gpt4all"&gt;llm-gpt4all&lt;/a&gt; plugin. This plugin builds on the excellent &lt;a href="https://gpt4all.io/index.html"&gt;gpt4all&lt;/a&gt; project by Nomic AI, providing a quantized (q4) Llama 3 8B instruct model that needs 8GB of RAM and requires a 4.34GB download.&lt;/p&gt;
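&lt;p&gt;As a rough back-of-envelope check on those numbers (an illustration only, not the exact gpt4all packaging): quantizing roughly 8 billion parameters down to around 4.3 bits per weight - a plausible average for a q4 format once scales and metadata are included - lands close to that 4.34GB download:&lt;/p&gt;

```python
# Rough size estimate for a quantized model download.
# Assumptions (illustrative, not exact gpt4all numbers):
# - Llama 3 8B has about 8.03 billion parameters
# - a q4 quantization averages roughly 4.3 bits per weight
#   once scales and other metadata are included

def quantized_size_gb(params, bits_per_weight):
    """Approximate model file size in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

estimate = quantized_size_gb(8.03e9, 4.3)
print(round(estimate, 2))  # prints 4.32 - close to the 4.34GB download
```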
&lt;p&gt;If you don't have LLM installed yet, install it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;brew install llm&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;pipx install llm&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Install the plugin:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-gpt4all&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The first time you run a prompt it will download the model:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m Meta-Llama-3-8B-Instruct &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Three great names for a pet emu&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;What an interesting question! Here are three name suggestions for your pet emu:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Emmet&lt;/strong&gt;: A playful take on the word "emu" that suits a friendly and curious bird.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gizmo&lt;/strong&gt;: Emus are known for their quirky behavior, so Gizmo is a fitting name to capture their unique personality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aussie&lt;/strong&gt;: As an Australian native species, naming your emu Aussie would be a nod to its heritage.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Remember to choose a name that suits your pet's temperament and characteristics!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Each time you run a command like this the model will be loaded into memory from scratch. To keep it resident in memory, use the &lt;code&gt;llm chat&lt;/code&gt; command:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm chat -m Meta-Llama-3-8B-Instruct&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's an example session:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Chatting with Meta-Llama-3-8B-Instruct
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
&amp;gt; javascript function to reverse a string
%
Here is the JavaScript code that reverses a given string:
```
function reverseString(str) {
  return str.split("").reverse().join("");
}
```

You can use this function by passing your desired input string as an argument, like so:

```javascript
let reversedStr = reverseString("Hello World");
console.log(reversedStr); // Output: "dlroW olleH"
```
In the code above:
- `str.split("")` splits the original string into individual characters.
- `.reverse()` reverses the order of those characters.
- `.join("")` combines them back together as a single string.

This function works for any input string, not just "Hello World". Try it out with different strings to see how it handles punctuation and spaces!
&amp;gt; capitalize it too
%
Here's an updated JavaScript function that not only reverses the input string but also capitalizes each character:
```javascript
function reverseAndCapitalize(str) {
  return str.split("").reverse().map(function(char) { 
    if (char.match(/[a-z]/i)) {
      return char.toUpperCase();
    } else {
      return char;
    }
  }).join("");
}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(I'm not sure I'd write &lt;code&gt;reverseAndCapitalize()&lt;/code&gt; quite like that, but still a fun example!)&lt;/p&gt;
&lt;p&gt;Consult &lt;a href="https://llm.datasette.io/en/stable/usage.html"&gt;the LLM documentation&lt;/a&gt; for more details on how to use the command-line tool.&lt;/p&gt;
&lt;h4 id="fast-api-access-via-groq"&gt;Fast API access via Groq&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://groq.com/"&gt;Groq&lt;/a&gt; serve openly licensed LLMs at ludicrous speeds using their own custom LPU (Language Processing Unit) Inference Engine. They currently offer a free preview of their API: you can sign up and &lt;a href="https://console.groq.com/keys"&gt;obtain an API key&lt;/a&gt; to start using it.&lt;/p&gt;
&lt;p&gt;You can run prompts against Groq using their &lt;a href="https://console.groq.com/docs/openai"&gt;OpenAI compatible API endpoint&lt;/a&gt;.&lt;/p&gt;
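&lt;p&gt;If you're curious what a raw call to that endpoint looks like, here's a minimal sketch using just the Python standard library - the model name is one of the Groq model IDs configured below, and the API key placeholder is obviously something you'd replace with your own:&lt;/p&gt;

```python
import json

def build_chat_request(model, prompt):
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_chat_request("llama3-70b-8192", "Three great names for a pet emu")
print(json.dumps(body))

# To actually call Groq (needs a real API key in the Authorization header):
# import urllib.request
# req = urllib.request.Request(
#     "https://api.groq.com/openai/v1/chat/completions",
#     data=json.dumps(body).encode("utf-8"),
#     headers={
#         "Content-Type": "application/json",
#         "Authorization": "Bearer YOUR_GROQ_API_KEY",
#     },
# )
# response = json.load(urllib.request.urlopen(req))
# print(response["choices"][0]["message"]["content"])
```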
&lt;p&gt;Edit the file &lt;code&gt; ~/Library/Application Support/io.datasette.llm/extra-openai-models.yaml&lt;/code&gt; - creating it if it doesn't exist - and add the following lines to it:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;model_id&lt;/span&gt;: &lt;span class="pl-s"&gt;groq-openai-llama3&lt;/span&gt;
  &lt;span class="pl-ent"&gt;model_name&lt;/span&gt;: &lt;span class="pl-s"&gt;llama3-70b-8192&lt;/span&gt;
  &lt;span class="pl-ent"&gt;api_base&lt;/span&gt;: &lt;span class="pl-s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;
  &lt;span class="pl-ent"&gt;api_key_name&lt;/span&gt;: &lt;span class="pl-s"&gt;groq&lt;/span&gt;
- &lt;span class="pl-ent"&gt;model_id&lt;/span&gt;: &lt;span class="pl-s"&gt;groq-openai-llama3-8b&lt;/span&gt;
  &lt;span class="pl-ent"&gt;model_name&lt;/span&gt;: &lt;span class="pl-s"&gt;llama3-8b-8192&lt;/span&gt;
  &lt;span class="pl-ent"&gt;api_base&lt;/span&gt;: &lt;span class="pl-s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;
  &lt;span class="pl-ent"&gt;api_key_name&lt;/span&gt;: &lt;span class="pl-s"&gt;groq&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This tells LLM about those models, and makes them accessible via those configured &lt;code&gt;model_id&lt;/code&gt; values.&lt;/p&gt;
&lt;p&gt;Run this command to confirm that the models were registered correctly:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm models &lt;span class="pl-k"&gt;|&lt;/span&gt; grep groq&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You should see this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;OpenAI Chat: groq-openai-llama3
OpenAI Chat: groq-openai-llama3-8b
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Set your Groq API key like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; groq
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; &amp;lt;Paste your API key here&amp;gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now you should be able to run prompts through the models like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m groq-openai-llama3 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;A righteous sonnet about a brave owl&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/groq-sonnet.gif" alt="Animated demo. The sonnet appears in less than a second: Here is a sonnet about a brave owl:  In moonlit skies, a silhouette is seen, A wingspan wide, a watchful, piercing gaze. The owl, a sentinel of secrets keen, Patrols the night, with valor in her ways.  Her feathers soft, a camouflage gray, She glides unseen, a phantom of the night. Her eyes, like lanterns, shining bright and far, Illuminate the darkness, banishing all fright.  Her talons sharp, a grasping, deadly sway, She swoops upon her prey, with silent might. Yet in her heart, a wisdom, old and gray, A fierce devotion to the darkness of the night.  And thus, the owl, a symbol of courage true, Inspires us all, with brave and noble pursuit.  I hope you enjoy this sonnet!" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Groq is &lt;em&gt;fast&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;There's also a &lt;a href="https://github.com/angerman/llm-groq"&gt;llm-groq&lt;/a&gt; plugin but it hasn't shipped support for the new models just yet - though there's &lt;a href="https://github.com/angerman/llm-groq/pull/5"&gt;a PR for that by Lex Herbert here&lt;/a&gt; and you can install the plugin directly from that PR like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install https://github.com/lexh/llm-groq/archive/ba9d7de74b3057b074a85fe99fe873b75519bd78.zip
llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; groq
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; paste API key here&lt;/span&gt;
llm -m groq-llama3-70b &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;say hi in spanish five ways&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="local-llama-3-70b-instruct-with-llamafile"&gt;Local Llama 3 70b Instruct with llamafile&lt;/h4&gt;
&lt;p&gt;The Llama 3 8b model is easy to run on a laptop, but it's pretty limited in capability. The 70b model is the one that's starting to get competitive with GPT-4. Can we run that on a laptop?&lt;/p&gt;
&lt;p&gt;I managed to run the 70b model on my 64GB MacBook Pro M2 using &lt;a href="https://github.com/Mozilla-Ocho/llamafile"&gt;llamafile&lt;/a&gt; (&lt;a href="https://simonwillison.net/2023/Nov/29/llamafile/"&gt;previously on this blog&lt;/a&gt;) - after quitting most other applications to make sure the 37GB of RAM it needed was available.&lt;/p&gt;
&lt;p&gt;I used the &lt;code&gt;Meta-Llama-3-70B-Instruct.Q4_0.llamafile&lt;/code&gt; Q4 version from &lt;a href="https://huggingface.co/jartine/Meta-Llama-3-70B-Instruct-llamafile/tree/main"&gt;jartine/Meta-Llama-3-70B-Instruct-llamafile&lt;/a&gt; - a 37GB download. I have a dedicated external hard disk (a Samsung T7 Shield) for this kind of thing.&lt;/p&gt;
&lt;p&gt;Here's how I got it working:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -L -o Meta-Llama-3-70B-Instruct.Q4_0.llamafile &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://huggingface.co/jartine/Meta-Llama-3-70B-Instruct-llamafile/resolve/main/Meta-Llama-3-70B-Instruct.Q4_0.llamafile?download=true&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; That downloads 37GB - now make it executable&lt;/span&gt;
chmod 755 Meta-Llama-3-70B-Instruct.Q4_0.llamafile
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; And start it running:&lt;/span&gt;
./Meta-Llama-3-70B-Instruct.Q4_0.llamafile&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A &lt;code&gt;llamafile&lt;/code&gt; is an executable that runs on virtually &lt;em&gt;any platform&lt;/em&gt; - see &lt;a href="https://til.simonwillison.net/cosmopolitan/ecosystem"&gt;my previous notes&lt;/a&gt; on Cosmopolitan and Actually Portable Executable for more on how that works.&lt;/p&gt;
&lt;p&gt;This will take quite a while to start, because it needs to load that full 37GB of binary content into memory. Once it's finished loading, a local web server becomes available at &lt;a href="http://127.0.0.1:8080/"&gt;http://127.0.0.1:8080/&lt;/a&gt; - this serves a web UI you can use to interact with the model, and also exposes an OpenAI-compatible API endpoint.&lt;/p&gt;
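&lt;p&gt;Because of that slow startup, it can be handy to poll the server until it's ready before pointing other tools at it. Here's a small sketch - the attempt count and delay are arbitrary assumptions, tune them to taste:&lt;/p&gt;

```python
import time
import urllib.request

# Poll the llamafile server until it answers, since loading 37GB takes
# a while. The attempt count and delay are assumptions - tune them.
def wait_for_server(url, attempts=60, delay=5):
    """Return True once the server responds, False if it never does."""
    for _ in range(attempts):
        try:
            urllib.request.urlopen(url, timeout=2)
            return True
        except OSError:
            time.sleep(delay)
    return False

# With nothing listening yet this returns False quickly; once the
# llamafile server is up it returns True.
print(wait_for_server("http://127.0.0.1:8080/", attempts=1, delay=0))
```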
&lt;p&gt;The easiest way to access this from LLM is to install the &lt;a href="https://github.com/simonw/llm-llamafile"&gt;llm-llamafile&lt;/a&gt; plugin:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-llamafile&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;All this plugin does is configure a model called &lt;code&gt;llamafile&lt;/code&gt; that attempts to access the model hosted on port 8080. You can run prompts like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m llamafile &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;3 neat characteristics of a pelican&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;Here are three neat characteristics of a pelican:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unique Beak&lt;/strong&gt;: Pelicans have a distinctive beak that is shaped like a scoop or a basket. This beak is specially designed to catch fish, and it can hold up to 3 gallons of water! The beak is also very sensitive, which helps pelicans detect the presence of fish in the water.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Waterproof Feathers&lt;/strong&gt;: Pelicans have a special coating on their feathers that makes them waterproof. This is essential for their fishing lifestyle, as they need to be able to dive into the water without getting weighed down by wet feathers. The coating is made up of a waxy substance that helps to repel water.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pouch-Like Throat&lt;/strong&gt;: Pelicans have a unique throat pouch that allows them to catch and store fish. When they dive into the water, they use their beak to scoop up fish, and then they store them in their throat pouch. The pouch can expand to hold multiple fish, and the pelican can then swallow the fish whole or regurgitate them to feed their young. This pouch is a key adaptation that helps pelicans thrive in their aquatic environment.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you don't want to install another plugin, you can instead configure the model by adding this to your &lt;code&gt;extra-openai-models.yaml&lt;/code&gt; file:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;model_id&lt;/span&gt;: &lt;span class="pl-s"&gt;llamafile&lt;/span&gt;
  &lt;span class="pl-ent"&gt;model_name&lt;/span&gt;: &lt;span class="pl-s"&gt;llamafile&lt;/span&gt;
  &lt;span class="pl-ent"&gt;api_base&lt;/span&gt;: &lt;span class="pl-s"&gt;http://localhost:8080/v1&lt;/span&gt;
  &lt;span class="pl-ent"&gt;api_key&lt;/span&gt;: &lt;span class="pl-s"&gt;x&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;One warning about this approach: if you use LLM like this then every prompt you run through &lt;code&gt;llamafile&lt;/code&gt; will be stored under the same model name in your &lt;a href="https://llm.datasette.io/en/stable/logging.html"&gt;SQLite logs&lt;/a&gt;, even if you try out different &lt;code&gt;llamafile&lt;/code&gt; models at different times. You could work around this by registering them with different &lt;code&gt;model_id&lt;/code&gt; values in the YAML file.&lt;/p&gt;
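&lt;p&gt;You can see the effect directly by querying the log database (find it with &lt;code&gt;llm logs path&lt;/code&gt;). This sketch assumes the &lt;code&gt;responses&lt;/code&gt; table and its &lt;code&gt;model&lt;/code&gt; column from LLM's logging schema, and demos against an in-memory stand-in rather than the real file:&lt;/p&gt;

```python
import sqlite3

# Count logged prompts per model name. Illustrative sketch: it assumes
# LLM's logging schema has a "responses" table with a "model" column
# (locate the real database with: llm logs path).
def prompts_per_model(conn):
    rows = conn.execute(
        "select model, count(*) from responses group by model order by model"
    ).fetchall()
    return dict(rows)

# Demo against an in-memory database mimicking that assumed schema.
# Note how two different llamafile models collapse into one name:
conn = sqlite3.connect(":memory:")
conn.execute("create table responses (model text, prompt text)")
conn.executemany(
    "insert into responses values (?, ?)",
    [
        ("llamafile", "prompt run against Llama 3 70b"),
        ("llamafile", "prompt run against a different llamafile model"),
        ("groq-openai-llama3", "prompt run against Groq"),
    ],
)
print(prompts_per_model(conn))  # {'groq-openai-llama3': 1, 'llamafile': 2}
```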
&lt;h4 id="paid-access-via-other-api-providers"&gt;Paid access via other API providers&lt;/h4&gt;
&lt;p&gt;A neat thing about open weight models is that multiple API providers can offer them, encouraging them to aggressively compete on price.&lt;/p&gt;
&lt;p&gt;Groq is currently free to use, but only for a limited number of requests.&lt;/p&gt;
&lt;p&gt;A number of other providers are now hosting Llama 3, and many of them have plugins available for LLM. Here are a few examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.perplexity.ai/"&gt;Perplexity Labs&lt;/a&gt; are offering &lt;code&gt;llama-3-8b-instruct&lt;/code&gt; and &lt;code&gt;llama-3-70b-instruct&lt;/code&gt;. The &lt;a href="https://github.com/hex/llm-perplexity"&gt;llm-perplexity&lt;/a&gt; plugin provides access - &lt;code&gt;llm install llm-perplexity&lt;/code&gt; to install, &lt;code&gt;llm keys set perplexity&lt;/code&gt; to set an &lt;a href="https://www.perplexity.ai/settings/api"&gt;API key&lt;/a&gt; and then run prompts against those two model IDs. Current &lt;a href="https://docs.perplexity.ai/docs/pricing"&gt;price&lt;/a&gt; for 8b is $0.20 per million tokens, for 80b is $1.00.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.anyscale.com/endpoints"&gt;Anyscale Endpoints&lt;/a&gt; have &lt;code&gt;meta-llama/Llama-3-8b-chat-hf&lt;/code&gt; ($0.15/million tokens) and &lt;code&gt;meta-llama/Llama-3-70b-chat-hf&lt;/code&gt; ($1.0/million tokens) (&lt;a href="https://docs.endpoints.anyscale.com/pricing/"&gt;pricing&lt;/a&gt;). &lt;code&gt;llm install llm-anyscale-endpoints&lt;/code&gt;, then &lt;code&gt;llm keys set anyscale-endpoints&lt;/code&gt; to set the &lt;a href="https://app.endpoints.anyscale.com/"&gt;API key&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://fireworks.ai/"&gt;Fireworks AI&lt;/a&gt; have &lt;code&gt;fireworks/models/llama-v3-8b-instruct&lt;/code&gt; for $0.20/million and &lt;code&gt;fireworks/models/llama-v3-70b-instruct&lt;/code&gt; for $0.90/million (&lt;a href="https://fireworks.ai/pricing"&gt;pricing&lt;/a&gt;). &lt;code&gt;llm install llm-fireworks&lt;/code&gt;, then &lt;code&gt;llm keys set fireworks&lt;/code&gt; to set the &lt;a href="https://fireworks.ai/api-keys"&gt;API key&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://openrouter.ai/"&gt;OpenRouter&lt;/a&gt; provide proxied accessed to Llama 3 from a number of different providers at different prices, documented on their &lt;a href="https://openrouter.ai/models/meta-llama/llama-3-70b-instruct"&gt;meta-llama/llama-3-70b-instruct&lt;/a&gt; and &lt;a href="https://openrouter.ai/models/meta-llama/llama-3-8b-instruct"&gt;meta-llama/llama-3-8b-instruct&lt;/a&gt; pages (&lt;a href="https://openrouter.ai/models?q=llama%203"&gt;and more&lt;/a&gt;). Use the &lt;a href="https://github.com/simonw/llm-openrouter"&gt;llm-openrouter&lt;/a&gt; plugin for those.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.together.ai/"&gt;Together AI&lt;/a&gt; has both models as well. The &lt;a href="https://github.com/wearedevx/llm-together"&gt;llm-together&lt;/a&gt; plugin provides access to &lt;code&gt;meta-llama/Llama-3-8b-chat-hf&lt;/code&gt; and &lt;code&gt;meta-llama/Llama-3-70b-chat-hf&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
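&lt;p&gt;Those per-million-token prices are easy to compare with a couple of lines of arithmetic - an illustration using the 70b prices quoted above (prices change often, so treat these as a snapshot):&lt;/p&gt;

```python
# Cost of a prompt at a given per-million-token price.
def cost_usd(total_tokens, price_per_million):
    return total_tokens * price_per_million / 1_000_000

# 70b-class prices quoted above (USD per million tokens):
prices = {
    "Perplexity Labs": 1.00,
    "Anyscale Endpoints": 1.00,
    "Fireworks AI": 0.90,
}

# What would a million tokens a day for a month cost at each provider?
for provider, price in sorted(prices.items()):
    monthly = cost_usd(30 * 1_000_000, price)
    print(provider, round(monthly, 2))  # Fireworks AI comes to $27/month
```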
&lt;p&gt;I'm sure there are more - these are just the ones I've tried out myself. Check the &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html"&gt;LLM plugin directory&lt;/a&gt; for other providers, or if a provider emulates the OpenAI API you can configure with the YAML file as shown above or &lt;a href="https://llm.datasette.io/en/stable/other-models.html#openai-compatible-models"&gt;described in the LLM documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="that-s-a-lot-of-options"&gt;That's a lot of options&lt;/h4&gt;
&lt;p&gt;One key idea behind LLM is to use plugins to provide access to as many different models as possible. Above I've listed two ways to run Llama 3 locally and six different API vendors that LLM can access as well.&lt;/p&gt;
&lt;p&gt;If you're inspired to write your own plugin it's pretty simple: each of the above plugins is open source, and there's a detailed tutorial on &lt;a href="https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html"&gt;Writing a plugin to support a new model&lt;/a&gt; on the LLM website.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llamafile"&gt;llamafile&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/groq"&gt;groq&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openrouter"&gt;openrouter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatbot-arena"&gt;chatbot-arena&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="llamafile"/><category term="groq"/><category term="llm-release"/><category term="openrouter"/><category term="chatbot-arena"/></entry><entry><title>Many options for running Mistral models in your terminal using LLM</title><link href="https://simonwillison.net/2023/Dec/18/mistral/#atom-tag" rel="alternate"/><published>2023-12-18T18:18:44+00:00</published><updated>2023-12-18T18:18:44+00:00</updated><id>https://simonwillison.net/2023/Dec/18/mistral/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://mistral.ai/"&gt;Mistral AI&lt;/a&gt; is the most exciting AI research lab at the moment. They've now released two extremely powerful smaller Large Language Models under an Apache 2 license, and have a third much larger one that's available via their API.&lt;/p&gt;
&lt;p&gt;I've been trying out their models using my &lt;a href="https://llm.datasette.io/"&gt;LLM command-line tool&lt;/a&gt;. Here's what I've figured out so far.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2023/Dec/18/mistral/#mixtral-llama-cpp"&gt;Mixtral 8x7B via llama.cpp and llm-llama-cpp&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2023/Dec/18/mistral/#mistral-7b-local"&gt;Mistral 7B via llm-llama-cpp or llm-gpt4all or llm-mlc&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2023/Dec/18/mistral/#mistral-api"&gt;Using the Mistral API, which includes the new Mistral-medium&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2023/Dec/18/mistral/#mistral-other-apis"&gt;Mistral via other API providers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2023/Dec/18/mistral/#llamafile-openai"&gt;Using Llamafile's OpenAI API endpoint&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="mixtral-llama-cpp"&gt;Mixtral 8x7B via llama.cpp and llm-llama-cpp&lt;/h4&gt;
&lt;p&gt;On Friday 8th December Mistral AI &lt;a href="https://twitter.com/MistralAI/status/1733150512395038967"&gt;tweeted a mysterious magnet&lt;/a&gt; (BitTorrent) link. This is the second time they've done this, the first was on September 26th when &lt;a href="https://twitter.com/MistralAI/status/1706877320844509405"&gt;they released&lt;/a&gt; their excellent Mistral 7B model, also as a magnet link.&lt;/p&gt;
&lt;p&gt;The new release was an 87GB file containing Mixtral 8x7B - "a high-quality sparse mixture of experts model (SMoE) with open weights", according to &lt;a href="https://mistral.ai/news/mixtral-of-experts/"&gt;the article&lt;/a&gt; they released three days later.&lt;/p&gt;
&lt;p&gt;Mixtral is a &lt;em&gt;very&lt;/em&gt; impressive model. GPT-4 has long been rumored to use a mixture of experts architecture, and Mixtral is the first truly convincing openly licensed implementation of this architecture I've seen. It's already showing impressive benchmark scores.&lt;/p&gt;
&lt;p&gt;This &lt;a href="https://github.com/ggerganov/llama.cpp/pull/4406"&gt;PR for llama.cpp&lt;/a&gt; added support for the new model. &lt;a href="https://github.com/abetlen/llama-cpp-python"&gt;llama-cpp-python&lt;/a&gt; updated to land that patch shortly afterwards.&lt;/p&gt;
&lt;p&gt;Which means... you can now run Mixtral on a Mac (and other platforms too, though I haven't tested them myself yet) using my &lt;a href="https://github.com/simonw/llm-llama-cpp"&gt;llm-llama-cpp plugin&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's how to do that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;Install LLM&lt;/a&gt;:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;pipx install llm&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Install the plugin:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-llama-cpp&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Install &lt;code&gt;llama-cpp-python&lt;/code&gt; - this needs to be done manually because the best approach differs for different platforms. On an Apple Silicon Mac I recommend running:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;CMAKE_ARGS=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;-DLLAMA_METAL=on&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; FORCE_CMAKE=1 llm install llama-cpp-python&lt;/pre&gt;&lt;/div&gt;
More details &lt;a href="https://github.com/simonw/llm-llama-cpp"&gt;in the llm-llama-cpp README&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Download a GGUF file for Mixtral 8X7B Instruct v0.1. TheBloke has &lt;a href="https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/tree/main"&gt;8 different options&lt;/a&gt; for this - I picked the 36GB &lt;code&gt;mixtral-8x7b-instruct-v0.1.Q6_K.gguf&lt;/code&gt;:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -LO &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q6_K.gguf?download=true&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Run the model like this:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m gguf \
  -o path mixtral-8x7b-instruct-v0.1.Q6_K.gguf \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;[INST] Write a Python function that downloads a file from a URL[/INST]&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;code&gt;-m gguf&lt;/code&gt; here says to use the &lt;code&gt;gguf&lt;/code&gt; model that was installed by the &lt;code&gt;llm-llama-cpp&lt;/code&gt; plugin - this is a model which requires you to pass a &lt;code&gt;path&lt;/code&gt; option - with &lt;code&gt;-o path filename&lt;/code&gt; - specifying which model you would like to run.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;[INST]&lt;/code&gt; part there is the format of prompt expected by the Instruct model. I need to update the &lt;code&gt;llm-llama-cpp&lt;/code&gt; plugin to make it easier to automatically construct the correct prompt format, see &lt;a href="https://github.com/simonw/llm-llama-cpp/issues/28"&gt;this issue&lt;/a&gt;.&lt;/p&gt;
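&lt;p&gt;In the meantime, a tiny helper makes the wrapping harder to get wrong - this is my own illustrative function, and it only covers the single-turn case, not multi-turn conversations:&lt;/p&gt;

```python
# Wrap a plain prompt in the single-turn Mistral/Mixtral instruct format.
# The helper name is illustrative; it only handles the one-shot case.
def instruct_prompt(text):
    return "[INST] " + text.strip() + " [/INST]"

print(instruct_prompt("Write a Python function that downloads a file from a URL"))
# [INST] Write a Python function that downloads a file from a URL [/INST]
```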
&lt;p&gt;Here's what I got for that prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sure, here's an example Python function that downloads a file from a URL using the &lt;code&gt;requests&lt;/code&gt; library:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;requests&lt;/span&gt;

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;download_file&lt;/span&gt;(&lt;span class="pl-s1"&gt;url&lt;/span&gt;, &lt;span class="pl-s1"&gt;save_path&lt;/span&gt;):
    &lt;span class="pl-s"&gt;"""&lt;/span&gt;
&lt;span class="pl-s"&gt;    Download a file from a URL and save it to a local path.&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;    :param url: The URL of the file to download.&lt;/span&gt;
&lt;span class="pl-s"&gt;    :param save_path: The local path to save the file.&lt;/span&gt;
&lt;span class="pl-s"&gt;    """&lt;/span&gt;
    &lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;requests&lt;/span&gt;.&lt;span class="pl-en"&gt;get&lt;/span&gt;(&lt;span class="pl-s1"&gt;url&lt;/span&gt;, &lt;span class="pl-s1"&gt;stream&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;)
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-s1"&gt;status_code&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-c1"&gt;200&lt;/span&gt;:
        &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;open&lt;/span&gt;(&lt;span class="pl-s1"&gt;save_path&lt;/span&gt;, &lt;span class="pl-s"&gt;'wb'&lt;/span&gt;) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;f&lt;/span&gt;:
            &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;chunk&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-en"&gt;iter_content&lt;/span&gt;(&lt;span class="pl-c1"&gt;1024&lt;/span&gt;):
                &lt;span class="pl-s1"&gt;f&lt;/span&gt;.&lt;span class="pl-en"&gt;write&lt;/span&gt;(&lt;span class="pl-s1"&gt;chunk&lt;/span&gt;)
        &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;f"File downloaded successfully at &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;save_path&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;else&lt;/span&gt;:
        &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;f"Failed to download file from &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;url&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;. Status code: &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-s1"&gt;status_code&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;"&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;To use this function, simply call it with the URL of the file you want to download and the local path where you want to save it. For example:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-en"&gt;download_file&lt;/span&gt;(&lt;span class="pl-s"&gt;'https://example.com/myfile.txt'&lt;/span&gt;, &lt;span class="pl-s"&gt;'/path/to/save/myfile.txt'&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;Note that this function uses the &lt;code&gt;stream=True&lt;/code&gt; parameter in the &lt;code&gt;requests.get()&lt;/code&gt; method to download the file in chunks, which can help prevent memory issues when downloading large files. It also checks the HTTP status code of the response before saving the file, so you can handle any errors that might occur during the download.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That's a very solid reply!&lt;/p&gt;
&lt;h4 id="mistral-7b-local"&gt;Mistral 7B via llm-llama-cpp or llm-gpt4all or llm-mlc&lt;/h4&gt;
&lt;p&gt;The smaller Mistral 7B model dropped back in September. It's since established itself as the most capable model family of that size - a size which is very convenient for running on personal devices.&lt;/p&gt;
&lt;p&gt;I'm even running Mistral 7B on my iPhone now, thanks to an update to the &lt;a href="https://apps.apple.com/us/app/mlc-chat/id6448482937"&gt;MLC Chat iOS app&lt;/a&gt; from a few days ago.&lt;/p&gt;
&lt;p&gt;There are a bunch of different options for running this model and its variants locally using LLM on a Mac - and probably other platforms too, though I've not tested these options myself on Linux or Windows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using &lt;a href="https://github.com/simonw/llm-llama-cpp"&gt;llm-llama-cpp&lt;/a&gt;: download one of &lt;a href="https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF"&gt;these Mistral-7B-Instruct GGUF files&lt;/a&gt; for the chat-tuned version, or &lt;a href="https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/tree/main"&gt;one of these&lt;/a&gt; for base Mistral, then follow the steps listed above&lt;/li&gt;
&lt;li&gt;Using &lt;a href="https://github.com/simonw/llm-gpt4all"&gt;llm-gpt4all&lt;/a&gt;. This is the easiest plugin to install:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-gpt4all&lt;/pre&gt;&lt;/div&gt;
The model will be downloaded the first time you try to use it:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m mistral-7b-instruct-v0 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Introduce yourself&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Using &lt;a href="https://github.com/simonw/llm-mlc"&gt;llm-mlc&lt;/a&gt;. Follow the instructions in the README to install it, then:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Download the model:&lt;/span&gt;
llm mlc download-model https://huggingface.co/mlc-ai/mlc-chat-Mistral-7B-Instruct-v0.2-q3f16_1
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Run it like this:&lt;/span&gt;
llm -m mlc-chat-Mistral-7B-Instruct-v0.2-q3f16_1 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Introduce yourself&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these options works, but I've not yet spent time comparing them in terms of output quality or performance.&lt;/p&gt;
&lt;h4 id="mistral-api"&gt;Using the Mistral API, which includes the new Mistral-medium&lt;/h4&gt;
&lt;p&gt;Mistral also recently announced &lt;a href="https://mistral.ai/news/la-plateforme/"&gt;La plateforme&lt;/a&gt;, their early access API for calling hosted versions of their models.&lt;/p&gt;
&lt;p&gt;Their new API renames the Mistral 7B model to "Mistral-tiny" and the new Mixtral model to "Mistral-small"... and offers something called &lt;strong&gt;Mistral-medium&lt;/strong&gt; as well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our highest-quality endpoint currently serves a prototype model, that is currently among the top serviced models available based on standard benchmarks. It masters English/French/Italian/German/Spanish and code and obtains a score of 8.6 on MT-Bench.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I got access to their API and used it to build a new plugin, &lt;a href="https://github.com/simonw/llm-mistral"&gt;llm-mistral&lt;/a&gt;. Here's how to use that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install it:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-mistral&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Set your Mistral API key:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; mistral
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; &amp;lt;paste key here&amp;gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Run the models like this:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m mistral-tiny &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Say hi&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Or mistral-small or mistral-medium&lt;/span&gt;
cat mycode.py &lt;span class="pl-k"&gt;|&lt;/span&gt; llm -m mistral-medium -s &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Explain this code&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
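&lt;p&gt;Under the hood this is a regular HTTPS API. As a rough sketch of the kind of request involved (assuming Mistral's OpenAI-style chat completions endpoint at &lt;code&gt;https://api.mistral.ai/v1/chat/completions&lt;/code&gt; - the helper name here is my own, not part of any plugin), you can build it with just the standard library:&lt;/p&gt;

```python
import json

# Assumed endpoint - check Mistral's API docs for the current URL.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt, model="mistral-tiny", api_key="YOUR-KEY"):
    """Hypothetical helper: return (url, headers, body) for a
    single-turn OpenAI-style chat completion request."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    body = json.dumps({
        "model": model,  # or mistral-small / mistral-medium
        "messages": [{"role": "user", "content": prompt}],
    })
    return API_URL, headers, body

url, headers, body = build_request("Say hi")
print(url)
print(body)
# Actually sending it needs a real API key, e.g. via
# urllib.request.Request(url, data=body.encode(), headers=headers)
```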
&lt;p&gt;Here's their comparison table pitching Mistral Small and Medium against GPT-3.5:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/mistral-table.jpg" alt="MMLU (MCQ in 57 subjects): GPT - 3.5 scored 70%, Mistral Small scored 70.6%, Mistral Medium scored 75.3%. HellaSwag (10-shot): GPT - 3.5 scored 85.5%, Mistral Small scored 86.7%, Mistral Medium scored 88%. ARC Challenge (25-shot): GPT - 3.5 scored 85.2%, Mistral Small scored 85.8%, Mistral Medium scored 89.9%. WinoGrande (5-shot): GPT - 3.5 scored 81.6%, Mistral Small scored 81.2%, Mistral Medium scored 88%. MBPP (pass@1): GPT - 3.5 scored 52.2%, Mistral Small scored 60.7%, Mistral Medium scored 62.3%. GSM-8K (5-shot): GPT - 3.5 scored 57.1%, Mistral Small scored 58.4%, Mistral Medium scored 66.7%. MT Bench (for Instruct models): GPT - 3.5 scored 8.32, Mistral Small scored 8.30, Mistral Medium scored 8.61." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;These may well be cherry-picked, but note that Small beats GPT-3.5 on almost every metric, and Medium beats it on everything by a wider margin.&lt;/p&gt;
&lt;p&gt;Here's the &lt;a href="https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard"&gt;MT Bench leaderboard&lt;/a&gt; which includes scores for GPT-4 and Claude 2.1:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/mt-bench.jpg" alt="GPT-4-Turbo: Arena Elo rating 1217, MT-bench score 9.32. GPT-4-0613: Arena Elo rating 1152, MT-bench score 9.18. GPT-4-0314: Arena Elo rating 1201, MT-bench score 8.96. GPT-3.5-turbo-0613: Arena Elo rating 1112, MT-bench score 8.39. GPT-3.5-Turbo-1106: Arena Elo rating 1074, MT-bench score 8.32. Claude-2.1: Arena Elo rating 1118, MT-bench score 8.18." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;That 8.61 score for Medium puts it roughly halfway between GPT-3.5 and GPT-4.&lt;/p&gt;
&lt;p&gt;Benchmark scores are no replacement for spending time with a model to get a feel for how well it behaves across a wide spectrum of tasks, but these scores are extremely promising. GPT-4 may not hold the best model crown for much longer.&lt;/p&gt;
&lt;h4 id="mistral-other-apis"&gt;Mistral via other API providers&lt;/h4&gt;
&lt;p&gt;Since both Mistral 7B and Mixtral 8x7B are available under an Apache 2 license, there's been something of a race to the bottom in terms of pricing from other LLM hosting providers.&lt;/p&gt;
&lt;p&gt;This trend makes me a little nervous, since it actively disincentivizes future open model releases from Mistral and from other providers who are hoping to offer their own hosted versions.&lt;/p&gt;
&lt;p&gt;LLM has plugins for a bunch of these providers already. The three that I've tried so far are Replicate, Anyscale Endpoints and OpenRouter.&lt;/p&gt;
&lt;p&gt;For &lt;a href="https://replicate.com/"&gt;Replicate&lt;/a&gt; using &lt;a href="https://github.com/simonw/llm-replicate"&gt;llm-replicate&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-replicate
llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; replicate
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; &amp;lt;paste API key here&amp;gt;&lt;/span&gt;
llm replicate add mistralai/mistral-7b-v0.1&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then run prompts like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m replicate-mistralai-mistral-7b-v0.1 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;3 reasons to get a pet weasel:&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This example is the non-instruct tuned model, so the prompt needs to be shaped such that the model can complete it.&lt;/p&gt;
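&lt;p&gt;To illustrate the difference (these prompt strings are just examples of mine):&lt;/p&gt;

```python
# A base model continues text rather than following instructions,
# so shape the prompt as the beginning of the output you want.
instruct_prompt = "Give me 3 reasons to get a pet weasel."  # for chat-tuned models
completion_prompt = "3 reasons to get a pet weasel:\n1."    # for base models

# Given completion_prompt, a base model will keep writing the list;
# given instruct_prompt, it may simply continue the sentence instead.
print(completion_prompt)
```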
&lt;p&gt;For &lt;a href="https://www.anyscale.com/endpoints"&gt;Anyscale Endpoints&lt;/a&gt; using &lt;a href="https://github.com/simonw/llm-anyscale-endpoints"&gt;llm-anyscale-endpoints&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-anyscale-endpoints
llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; anyscale-endpoints
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; &amp;lt;paste API key here&amp;gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now you can run both the 7B and the Mixtral 8x7B models:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m mistralai/Mixtral-8x7B-Instruct-v0.1 \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;3 reasons to get a pet weasel&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
llm -m mistralai/Mistral-7B-Instruct-v0.1 \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;3 reasons to get a pet weasel&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And for &lt;a href="https://openrouter.ai/"&gt;OpenRouter&lt;/a&gt; using &lt;a href="https://github.com/simonw/llm-openrouter"&gt;llm-openrouter&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-openrouter
llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; openrouter
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; &amp;lt;paste API key here&amp;gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then run the models like so:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m openrouter/mistralai/mistral-7b-instruct \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;2 reasons to get a pet dragon&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
llm -m openrouter/mistralai/mixtral-8x7b-instruct \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;2 reasons to get a pet dragon&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;OpenRouter are currently offering Mistral and Mixtral via their API for $0.00/1M input tokens - it's free! Obviously not sustainable, so don't rely on that continuing, but that does make them a great platform for running some initial experiments with these models.&lt;/p&gt;
&lt;h4 id="llamafile-openai"&gt;Using Llamafile's OpenAI API endpoint&lt;/h4&gt;
&lt;p&gt;I &lt;a href="https://simonwillison.net/2023/Nov/29/llamafile/"&gt;wrote about Llamafile&lt;/a&gt; recently: a fascinating option for running LLMs, where the model is bundled up in an executable that includes everything needed to run it, on multiple platforms.&lt;/p&gt;
&lt;p&gt;Justine Tunney released &lt;a href="https://huggingface.co/jartine/Mixtral-8x7B-v0.1.llamafile/tree/main"&gt;llamafiles for Mixtral&lt;/a&gt; a few days ago.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://huggingface.co/jartine/Mixtral-8x7B-v0.1.llamafile/blob/main/mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile"&gt;mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile&lt;/a&gt; one runs an OpenAI-compatible API endpoint which LLM can talk to.&lt;/p&gt;
&lt;p&gt;Here's how to use that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Download the llamafile:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -LO https://huggingface.co/jartine/Mixtral-8x7B-v0.1.llamafile/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Start that running:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile&lt;/pre&gt;&lt;/div&gt;
You may need to run &lt;code&gt;chmod 755 mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile&lt;/code&gt; first, though I found I didn't need to.&lt;/li&gt;
&lt;li&gt;Configure LLM to know about that endpoint, by adding the following to a file at &lt;code&gt;~/Library/Application Support/io.datasette.llm/extra-openai-models.yaml&lt;/code&gt;:
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;model_id&lt;/span&gt;: &lt;span class="pl-s"&gt;llamafile&lt;/span&gt;
  &lt;span class="pl-ent"&gt;model_name&lt;/span&gt;: &lt;span class="pl-s"&gt;llamafile&lt;/span&gt;
  &lt;span class="pl-ent"&gt;api_base&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;http://127.0.0.1:8080/v1&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
This registers a model called &lt;code&gt;llamafile&lt;/code&gt; which you can now call like this:
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m llamafile &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Say hello to the world&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Setting up that &lt;code&gt;llamafile&lt;/code&gt; alias means you'll be able to use the same CLI invocation for any llamafile models you run on that default 8080 port.&lt;/p&gt;
&lt;p&gt;The same exact approach should work for other model hosting options that provide an endpoint that imitates the OpenAI API.&lt;/p&gt;
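&lt;p&gt;Since the payload shape is the standard OpenAI chat completions format, you can also hit the llamafile server directly without going through LLM at all. Here's a minimal stdlib-only sketch (assuming the server from step 2 is running on port 8080; the helper name is my own):&lt;/p&gt;

```python
import json
import urllib.request

# Assumes a llamafile server listening on its default port 8080.
API_BASE = "http://127.0.0.1:8080/v1"

def build_chat_request(prompt, model="llamafile"):
    """Hypothetical helper: return (url, body_bytes) for an
    OpenAI-style chat completion against the local server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return API_BASE + "/chat/completions", body

url, body = build_chat_request("Say hello to the world")
print(url)
# With the server running, send it like this:
#   req = urllib.request.Request(
#       url, data=body, headers={"Content-Type": "application/json"})
#   reply = json.loads(urllib.request.urlopen(req).read())
#   print(reply["choices"][0]["message"]["content"])
```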
&lt;h4&gt;This is LLM plugins working as intended&lt;/h4&gt;
&lt;p&gt;When I &lt;a href="https://simonwillison.net/2023/Jul/12/llm/"&gt;added plugin support to LLM&lt;/a&gt; this was exactly what I had in mind: I want it to be as easy as possible to add support for new models, both local and remotely hosted.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://llm.datasette.io/en/stable/plugins/directory.html"&gt;LLM plugin directory&lt;/a&gt; lists 19 plugins in total now.&lt;/p&gt;
&lt;p&gt;If you want to build your own plugin - for a locally hosted model or for one exposed via a remote API - the &lt;a href="https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html"&gt;plugin author tutorial&lt;/a&gt; (plus reviewing code from the existing plugins) should hopefully provide everything you need.&lt;/p&gt;
&lt;p&gt;You're also welcome to join us in the &lt;a href="https://datasette.io/discord-llm"&gt;#llm Discord channel&lt;/a&gt; to talk about your plans for your project.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llamafile"&gt;llamafile&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openrouter"&gt;openrouter&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="mistral"/><category term="llamafile"/><category term="llama-cpp"/><category term="openrouter"/></entry><entry><title>llamafile is the new best way to run an LLM on your own computer</title><link href="https://simonwillison.net/2023/Nov/29/llamafile/#atom-tag" rel="alternate"/><published>2023-11-29T20:54:16+00:00</published><updated>2023-11-29T20:54:16+00:00</updated><id>https://simonwillison.net/2023/Nov/29/llamafile/#atom-tag</id><summary type="html">
    &lt;p&gt;Mozilla’s innovation group and Justine Tunney &lt;a href="https://hacks.mozilla.org/2023/11/introducing-llamafile/"&gt;just released llamafile&lt;/a&gt;, and I think it's now the single best way to get started running Large Language Models (think your own local copy of ChatGPT) on your own computer.&lt;/p&gt;
&lt;p&gt;A llamafile is a single multi-GB file that contains both the model weights for an LLM and the code needed to run that model - in some cases a full local server with a web UI for interacting with it.&lt;/p&gt;
&lt;p&gt;The executable is compiled using &lt;a href="https://justine.lol/cosmopolitan/index.html"&gt;Cosmopolitan Libc&lt;/a&gt;, Justine's incredible project that supports compiling a single binary that works, unmodified, on multiple different operating systems and hardware architectures.&lt;/p&gt;
&lt;p&gt;Here's how to get started with &lt;a href="https://llava-vl.github.io/"&gt;LLaVA 1.5&lt;/a&gt;, a large multimodal model (which means text and image inputs, like GPT-4 Vision) fine-tuned on top of Llama 2. I've tested this process on an M2 Mac, but it should work on other platforms as well (though be sure to &lt;a href="https://github.com/mozilla-Ocho/llamafile#gotchas"&gt;read the Gotchas&lt;/a&gt; section of the README, and take a look at Justine's &lt;a href="https://news.ycombinator.com/item?id=38465916"&gt;list of supported platforms&lt;/a&gt; in a comment on Hacker News).&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Download the 4.29GB &lt;code&gt;llava-v1.5-7b-q4.llamafile&lt;/code&gt; file &lt;a href="https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main"&gt;from Justine's repository on Hugging Face&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;curl -LO https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Make that binary executable, by running this in a terminal:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;chmod 755 llava-v1.5-7b-q4.llamafile&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run your new executable, which will start a web server on port 8080:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;./llava-v1.5-7b-q4.llamafile&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Navigate to &lt;code&gt;http://127.0.0.1:8080/&lt;/code&gt; to start interacting with the model in your browser.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That's all there is to it. On my M2 Mac it runs at around 55 tokens a second, which is &lt;em&gt;really&lt;/em&gt; fast. And it can analyze images - here's what I got when I uploaded a photograph and asked "Describe this plant":&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/llamafile.jpg" alt="Screenshot. llama.cpp - then a photo I took of a plant

User: Describe this plant

Llama: The image features a large, green plant with numerous thin branches and leaves. Among the many stems of this plant, there is an orange flower visible near its center. This beautifully decorated plant stands out in the scene due to its vibrant colors and intricate structure.

18ms per token, 54.24 tokens per second
Powered by llama.cpp, ggml.ai, and llamafile" style="max-width: 100%" /&gt;&lt;/p&gt;
&lt;h4 id="llamafile-how-this-works"&gt;How this works&lt;/h4&gt;
&lt;p&gt;There are a number of different components working together here to make this work.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The LLaVA 1.5 model by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee is &lt;a href="https://arxiv.org/abs/2310.03744"&gt;described in this paper&lt;/a&gt;, with further details on &lt;a href="https://llava-vl.github.io/"&gt;llava-vl.github.io&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The models are executed using &lt;a href="https://github.com/ggerganov/llama.cpp"&gt;llama.cpp&lt;/a&gt;, and in the above demo also use the &lt;code&gt;llama.cpp&lt;/code&gt; server example to provide the UI.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://justine.lol/cosmopolitan/index.html"&gt;Cosmopolitan Libc&lt;/a&gt; is the magic that makes one binary work on multiple platforms. I wrote more about that in a TIL a few months ago, &lt;a href="https://til.simonwillison.net/cosmopolitan/ecosystem"&gt;Catching up with the Cosmopolitan ecosystem&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="llamafile-trying-other-models"&gt;Trying more models&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://github.com/mozilla-Ocho/llamafile"&gt;llamafile README&lt;/a&gt; currently links to binaries for &lt;code&gt;Mistral-7B-Instruct&lt;/code&gt;, &lt;code&gt;LLaVA 1.5&lt;/code&gt; and &lt;code&gt;WizardCoder-Python-13B&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You can also download a much smaller &lt;code&gt;llamafile&lt;/code&gt; binary from &lt;a href="https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.1"&gt;their releases&lt;/a&gt;, which can then execute any model that has been compiled to GGUF format:&lt;/p&gt;
&lt;p&gt;I grabbed &lt;code&gt;llamafile-server-0.1&lt;/code&gt; (4.45MB) like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -LO https://github.com/Mozilla-Ocho/llamafile/releases/download/0.1/llamafile-server-0.1
chmod 755 llamafile-server-0.1&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then ran it against a 13GB &lt;code&gt;llama-2-13b.Q8_0.gguf&lt;/code&gt; file I had &lt;a href="https://huggingface.co/TheBloke/Llama-2-13B-GGUF/tree/main"&gt;previously downloaded&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./llamafile-server-0.1 -m llama-2-13b.Q8_0.gguf&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This gave me the same interface at &lt;code&gt;http://127.0.0.1:8080/&lt;/code&gt; (without the image upload) and let me talk with the model at 24 tokens per second.&lt;/p&gt;
&lt;h4 id="llamafile-one-file"&gt;One file is all you need&lt;/h4&gt;
&lt;p&gt;I think my favourite thing about llamafile is what it represents. This is a single binary file which you can download and then use, forever, on (almost) any computer.&lt;/p&gt;

&lt;p&gt;You don't need a network connection, and you don't need to keep track of more than one file.&lt;/p&gt;

&lt;p&gt;Stick that file on a USB stick and stash it in a drawer as insurance against a future apocalypse. You'll never be without a language model ever again.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mozilla"&gt;mozilla&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cosmopolitan"&gt;cosmopolitan&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llamafile"&gt;llamafile&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/justine-tunney"&gt;justine-tunney&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="mozilla"/><category term="ai"/><category term="generative-ai"/><category term="cosmopolitan"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="mistral"/><category term="llamafile"/><category term="justine-tunney"/><category term="llama-cpp"/></entry></feed>