<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: jina</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/jina.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-03-04T17:25:16+00:00</updated><author><name>Simon Willison</name></author><entry><title>A Practical Guide to Implementing DeepSearch / DeepResearch</title><link href="https://simonwillison.net/2025/Mar/4/deepsearch-deepresearch/#atom-tag" rel="alternate"/><published>2025-03-04T17:25:16+00:00</published><updated>2025-03-04T17:25:16+00:00</updated><id>https://simonwillison.net/2025/Mar/4/deepsearch-deepresearch/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jina.ai/news/a-practical-guide-to-implementing-deepsearch-deepresearch/"&gt;A Practical Guide to Implementing DeepSearch / DeepResearch&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I really like the definitions Han Xiao from Jina AI proposes for the terms DeepSearch and DeepResearch in this piece:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;DeepSearch&lt;/strong&gt; runs through an iterative loop of searching, reading, and reasoning until it finds the optimal answer.  [...]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;DeepResearch&lt;/strong&gt; builds upon DeepSearch by adding a structured framework for generating long research reports.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've recently found myself cooling a little on the classic RAG pattern of finding relevant documents and dumping them into the context for a single call to an LLM.&lt;/p&gt;
&lt;p&gt;I think this definition of DeepSearch helps explain why. RAG is about answering questions that fall outside of the knowledge baked into a model. The DeepSearch pattern offers a tools-based alternative to classic RAG: we give the model extra tools for running multiple searches (which could be vector-based, or FTS, or even systems like ripgrep) and run it for several steps in a loop to try to find an answer.&lt;/p&gt;
&lt;p&gt;I think DeepSearch is a lot more interesting than DeepResearch, which feels to me more like a presentation layer thing. Pulling together the results from multiple searches into a "report" looks more impressive, but I &lt;a href="https://simonwillison.net/2025/Feb/25/deep-research-system-card/"&gt;still worry&lt;/a&gt; that the report format provides a misleading impression of the quality of the "research" that took place.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rag"&gt;rag&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-search"&gt;ai-assisted-search&lt;/a&gt;&lt;/p&gt;



</summary><category term="search"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="rag"/><category term="llm-tool-use"/><category term="jina"/><category term="ai-assisted-search"/></entry><entry><title>q and qv zsh functions for asking questions of websites and YouTube videos with LLM</title><link href="https://simonwillison.net/2024/Dec/19/q-and-qv-zsh-functions/#atom-tag" rel="alternate"/><published>2024-12-19T15:42:34+00:00</published><updated>2024-12-19T15:42:34+00:00</updated><id>https://simonwillison.net/2024/Dec/19/q-and-qv-zsh-functions/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/davidgasquez/dotfiles/blob/bb9df4a369dbaef95ca0c35642de491c7dd41269/shell/zshrc#L50-L99"&gt;q and qv zsh functions for asking questions of websites and YouTube videos with LLM&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Spotted these in David Gasquez's &lt;code&gt;zshrc&lt;/code&gt; dotfiles: two shell functions that use my &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; tool to answer questions about a website or YouTube video.&lt;/p&gt;
&lt;p&gt;Here's how to ask a question of a website:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;q https://simonwillison.net/ 'What has Simon written about recently?'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I got back:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Recently, Simon Willison has written about various topics including:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Building Python Tools&lt;/strong&gt; - Exploring one-shot applications using Claude and dependency management with &lt;code&gt;uv&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Modern Java Usage&lt;/strong&gt; - Discussing recent developments in Java that simplify coding.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GitHub Copilot Updates&lt;/strong&gt; - New free tier and features in GitHub Copilot for Vue and VS Code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI Engagement on Bluesky&lt;/strong&gt; - Investigating the use of bots to create artificially polite disagreements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenAI WebRTC Audio&lt;/strong&gt; - Demonstrating a new API for real-time audio conversation with models.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;p&gt;It works by constructing a &lt;a href="https://simonwillison.net/2024/Jun/16/jina-ai-reader/"&gt;Jina Reader URL&lt;/a&gt; to convert that URL to Markdown, then piping that content into LLM along with the question.&lt;/p&gt;
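&lt;p&gt;A minimal version of that pattern can be sketched as a shell function - a simplification of David's actual code, assuming &lt;code&gt;curl&lt;/code&gt; and the &lt;code&gt;llm&lt;/code&gt; CLI are installed:&lt;/p&gt;

```shell
# Sketch of a q-style helper: fetch a page as LLM-friendly Markdown
# via the Jina Reader URL prefix, then pipe that content into llm
# with the question as the system prompt.
q() {
  local url="$1"
  local question="$2"
  curl -s "https://r.jina.ai/$url" | llm -s "$question"
}
```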
&lt;p&gt;The YouTube one is even more fun:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;qv 'https://www.youtube.com/watch?v=uRuLgar5XZw' 'what does Simon say about open source?'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It said (about &lt;a href="https://www.youtube.com/watch?v=uRuLgar5XZw"&gt;this 72-minute video&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Simon emphasizes that open source has significantly increased productivity in software development. He points out that before open source, developers often had to recreate existing solutions or purchase proprietary software, which often limited customization. The availability of open source projects has made it easier to find and utilize existing code, which he believes is one of the primary reasons for more efficient software development today.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The secret sauce behind that one is the way it uses &lt;code&gt;yt-dlp&lt;/code&gt; to extract just the subtitles for the video:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;local subtitle_url=$(yt-dlp -q --skip-download --convert-subs srt --write-sub --sub-langs "en" --write-auto-sub --print "requested_subtitles.en.url" "$url")
local content=$(curl -s "$subtitle_url" | sed '/^$/d' | grep -v '^[0-9]*$' | grep -v '\--&amp;gt;' | sed 's/&amp;lt;[^&amp;gt;]*&amp;gt;//g' | tr '\n' ' ')
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That first line retrieves a URL to the subtitles in WEBVTT format - I &lt;a href="https://gist.github.com/simonw/7f07837cf8adcee23fd5cd5394170f27"&gt;saved a copy of that here&lt;/a&gt;. The second line then uses &lt;code&gt;curl&lt;/code&gt; to fetch them, then &lt;code&gt;sed&lt;/code&gt; and &lt;code&gt;grep&lt;/code&gt; to remove the timestamp information, producing &lt;a href="https://gist.github.com/simonw/7f07837cf8adcee23fd5cd5394170f27?permalink_comment_id=5350044#gistcomment-5350044"&gt;this&lt;/a&gt;.&lt;/p&gt;
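&lt;p&gt;Putting those pieces together, a simplified sketch of the &lt;code&gt;qv&lt;/code&gt; function (assuming &lt;code&gt;yt-dlp&lt;/code&gt;, &lt;code&gt;curl&lt;/code&gt; and the &lt;code&gt;llm&lt;/code&gt; CLI are installed) looks like this:&lt;/p&gt;

```shell
# Sketch of a qv-style helper: grab the English subtitles for a
# YouTube video, strip the blank lines, cue numbers, timestamp
# arrows and inline tags, then feed the flattened transcript to
# llm with the question as the system prompt.
qv() {
  local url="$1"
  local question="$2"
  local subtitle_url=$(yt-dlp -q --skip-download --convert-subs srt \
    --write-sub --sub-langs "en" --write-auto-sub \
    --print "requested_subtitles.en.url" "$url")
  local content=$(curl -s "$subtitle_url" | sed '/^$/d' \
    | grep -v '^[0-9]*$' | grep -v '\-->' \
    | sed 's/<[^>]*>//g' | tr '\n' ' ')
  echo "$content" | llm -s "$question"
}
```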

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://davidgasquez.com/useful-llm-tools-2024/"&gt;Useful LLM tools (2024 Edition)&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/zsh"&gt;zsh&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;&lt;/p&gt;



</summary><category term="youtube"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="zsh"/><category term="jina"/></entry><entry><title>docs.jina.ai - the Jina meta-prompt</title><link href="https://simonwillison.net/2024/Oct/30/jina-meta-prompt/#atom-tag" rel="alternate"/><published>2024-10-30T17:07:42+00:00</published><updated>2024-10-30T17:07:42+00:00</updated><id>https://simonwillison.net/2024/Oct/30/jina-meta-prompt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://docs.jina.ai/"&gt;docs.jina.ai - the Jina meta-prompt&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From &lt;a href="https://twitter.com/jinaai_/status/1851651702635847729"&gt;Jina AI on Twitter&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;curl docs.jina.ai&lt;/code&gt; - This is our &lt;strong&gt;Meta-Prompt&lt;/strong&gt;. It allows LLMs to understand our Reader, Embeddings, Reranker, and Classifier APIs for improved codegen. Using the meta-prompt is straightforward. Just copy the prompt into your preferred LLM interface like ChatGPT, Claude, or whatever works for you, add your instructions, and you're set.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The page is served using content negotiation. If you hit it with &lt;code&gt;curl&lt;/code&gt; you get plain text, but a browser with &lt;code&gt;text/html&lt;/code&gt; in the &lt;code&gt;accept:&lt;/code&gt; header gets an explanation along with a convenient copy-to-clipboard button.&lt;/p&gt;
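&lt;p&gt;You can see the negotiation in action from the command line - a quick illustration, and the exact response bodies may change over time:&lt;/p&gt;

```shell
# curl sends "Accept: */*" by default, so the server responds with
# the plain-text meta-prompt:
curl -s https://docs.jina.ai | head -n 3

# Asking for text/html instead returns the human-facing HTML page:
curl -s -H 'Accept: text/html' https://docs.jina.ai | head -n 3
```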
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/jina-docs.jpg" alt="Screenshot of an API documentation page for Jina AI with warning message, access instructions, and code sample. Contains text: Note: This content is specifically designed for LLMs and not intended for human reading. For human-readable content, please visit Jina AI. For LLMs/programmatic access, you can fetch this content directly: curl docs.jina.ai/v2 # or wget docs.jina.ai/v2 # or fetch docs.jina.ai/v2 You only see this as a HTML when you access docs.jina.ai via browser. If you access it via code/program, you will get a text/plain response as below. You are an AI engineer designed to help users use Jina AI Search Foundation API's for their specific use case. # Core principles..." style="max-width:90%;" class="blogmark-image"&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="jina"/></entry><entry><title>My Jina Reader tool</title><link href="https://simonwillison.net/2024/Oct/14/my-jina-reader-tool/#atom-tag" rel="alternate"/><published>2024-10-14T16:47:56+00:00</published><updated>2024-10-14T16:47:56+00:00</updated><id>https://simonwillison.net/2024/Oct/14/my-jina-reader-tool/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/jina-reader"&gt;My Jina Reader tool&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I wanted to feed the &lt;a href="https://developers.cloudflare.com/durable-objects/api/storage-api/"&gt;Cloudflare Durable Objects SQLite&lt;/a&gt; documentation into Claude, but I was on my iPhone so copying and pasting was inconvenient. Jina offer a &lt;a href="https://jina.ai/reader/"&gt;Reader API&lt;/a&gt; which can turn any URL into LLM-friendly Markdown and it turns out it supports CORS, so I &lt;a href="https://gist.github.com/simonw/053b271e023ed1b834529e2fbd0efc3b"&gt;got Claude to build me this tool&lt;/a&gt; (&lt;a href="https://gist.github.com/simonw/e56d55e6a87a547faac7070eb912b32d"&gt;second iteration&lt;/a&gt;, &lt;a href="https://gist.github.com/simonw/e0a841a580038d15c7bf22bd7d104ce3"&gt;third iteration&lt;/a&gt;, &lt;a href="https://github.com/simonw/tools/blob/main/jina-reader.html"&gt;final source code&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Paste in a URL to get the Jina Markdown version, along with an all important "Copy to clipboard" button.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/jina-reader.jpg" class="blogmark-image" style="max-width: 90%"&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/markdown"&gt;markdown&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-3-5-sonnet"&gt;claude-3-5-sonnet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cors"&gt;cors&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;&lt;/p&gt;



</summary><category term="projects"/><category term="markdown"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="claude"/><category term="claude-3-5-sonnet"/><category term="cors"/><category term="jina"/></entry><entry><title>Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning</title><link href="https://simonwillison.net/2024/Oct/10/bridging-language-gaps-in-multilingual-embeddings-via-contrastiv/#atom-tag" rel="alternate"/><published>2024-10-10T16:00:35+00:00</published><updated>2024-10-10T16:00:35+00:00</updated><id>https://simonwillison.net/2024/Oct/10/bridging-language-gaps-in-multilingual-embeddings-via-contrastiv/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jina.ai/news/bridging-language-gaps-in-multilingual-embeddings-via-contrastive-learning/"&gt;Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Most text embedding models suffer from a "language gap", where phrases in different languages with the same semantic meaning end up with embedding vectors that aren't clustered together.&lt;/p&gt;
&lt;p&gt;Jina claim their new &lt;a href="https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model"&gt;jina-embeddings-v3&lt;/a&gt; (CC BY-NC 4.0, which means you need to license it for commercial use if you're not using &lt;a href="https://jina.ai/embeddings/"&gt;their API&lt;/a&gt;) is much better on this front, thanks to a training technique called "contrastive learning".&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There are 30 languages represented in our contrastive learning dataset, but 97% of pairs and triplets are in just one language, with only 3% involving cross-language pairs or triplets. But this 3% is enough to produce a dramatic result: Embeddings show very little language clustering and semantically similar texts produce close embeddings regardless of their language&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img alt="Scatter plot diagram, titled Desired Outcome: Clustering by Meaning. My dog is blue and Mein Hund ist blau are located near to each other, and so are Meine Katze ist rot and My cat is red" src="https://static.simonwillison.net/static/2024/jina-multi-language.png" /&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/JinaAI_/status/1844401388878762209"&gt;@JinaAI_&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/embeddings"&gt;embeddings&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="embeddings"/><category term="jina"/></entry><entry><title>Jina AI Reader</title><link href="https://simonwillison.net/2024/Jun/16/jina-ai-reader/#atom-tag" rel="alternate"/><published>2024-06-16T19:33:58+00:00</published><updated>2024-06-16T19:33:58+00:00</updated><id>https://simonwillison.net/2024/Jun/16/jina-ai-reader/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jina.ai/reader/"&gt;Jina AI Reader&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Jina AI provide a number of different AI-related platform products, including an excellent &lt;a href="https://huggingface.co/collections/jinaai/jina-embeddings-v2-65708e3ec4993b8fb968e744"&gt;family of embedding models&lt;/a&gt;, but one of their most instantly useful products is Jina Reader, an API for turning any URL into Markdown content suitable for piping into an LLM.&lt;/p&gt;
&lt;p&gt;Add &lt;code&gt;r.jina.ai&lt;/code&gt; to the front of a URL to get back Markdown of that page, for example &lt;a href="https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/"&gt;https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/&lt;/a&gt; - in addition to converting the content to Markdown it also does a decent job of extracting just the content and ignoring the surrounding navigation.&lt;/p&gt;
&lt;p&gt;The API is free but rate-limited (presumably by IP) to 20 requests per minute without an API key or 200 requests per minute with a free API key, and you can pay to increase your allowance beyond that.&lt;/p&gt;
&lt;p&gt;The Apache 2 licensed source code for the hosted service is &lt;a href="https://github.com/jina-ai/reader"&gt;on GitHub&lt;/a&gt; - it's written in TypeScript and &lt;a href="https://github.com/jina-ai/reader/blob/main/backend/functions/src/services/puppeteer.ts"&gt;uses Puppeteer&lt;/a&gt; to run &lt;a href="https://github.com/mozilla/readability"&gt;Readability.js&lt;/a&gt; and &lt;a href="https://github.com/mixmark-io/turndown"&gt;Turndown&lt;/a&gt; against the scraped page.&lt;/p&gt;
&lt;p&gt;It can also handle PDFs, which have their contents extracted &lt;a href="https://github.com/jina-ai/reader/blob/main/backend/functions/src/services/pdf-extract.ts"&gt;using PDF.js&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There's also a search feature, &lt;code&gt;s.jina.ai/search+term+goes+here&lt;/code&gt;, which &lt;a href="https://github.com/jina-ai/reader/blob/ed80c9a4a2c340fb7c874347d3f25501e42ca251/backend/functions/src/services/brave-search.ts"&gt;uses the Brave Search API&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apis"&gt;apis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/markdown"&gt;markdown&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/puppeteer"&gt;puppeteer&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/brave"&gt;brave&lt;/a&gt;&lt;/p&gt;



</summary><category term="apis"/><category term="markdown"/><category term="ai"/><category term="puppeteer"/><category term="llms"/><category term="jina"/><category term="brave"/></entry><entry><title>Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun</title><link href="https://simonwillison.net/2024/May/10/exploring-hacker-news-by-mapping-and-analyzing-40-million-posts/#atom-tag" rel="alternate"/><published>2024-05-10T16:42:55+00:00</published><updated>2024-05-10T16:42:55+00:00</updated><id>https://simonwillison.net/2024/May/10/exploring-hacker-news-by-mapping-and-analyzing-40-million-posts/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.wilsonl.in/hackerverse/"&gt;Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A real tour de force of data engineering. Wilson Lin fetched 40 million posts and comments from the Hacker News API (using Node.js with a custom multi-process worker pool) and then ran them all through the &lt;code&gt;BGE-M3&lt;/code&gt; embedding model using RunPod, which let him fire up ~150 GPU instances to get the whole run done in a few hours, using a custom RocksDB and Rust queue he built to save on Amazon SQS costs.&lt;/p&gt;
&lt;p&gt;Then he crawled 4 million linked pages, embedded &lt;em&gt;that&lt;/em&gt; content using the faster and cheaper &lt;code&gt;jina-embeddings-v2-small-en&lt;/code&gt; model, ran UMAP dimensionality reduction to render a 2D map and did a whole lot of follow-on work to identify topic areas and make the map look good.&lt;/p&gt;
&lt;p&gt;That's not even half the project - Wilson built several interactive features on top of the resulting data, and experimented with custom rendering techniques on top of canvas to get everything to render quickly.&lt;/p&gt;
&lt;p&gt;There's so much in here, and both the code and data (multiple GBs of arrow files) are available if you want to dig in and try some of this out for yourself.&lt;/p&gt;
&lt;p&gt;In the Hacker News comments Wilson shares that the total cost of the project was a couple of hundred dollars.&lt;/p&gt;
&lt;p&gt;One tiny detail I particularly enjoyed - unrelated to the embeddings - was this trick for testing which edge location is closest to a user using JavaScript:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const edge = await Promise.race(
  EDGES.map(async (edge) =&amp;gt; {
    // Run a few times to avoid potential cold start biases.
    for (let i = 0; i &amp;lt; 3; i++) {
      await fetch(`https://${edge}.edge-hndr.wilsonl.in/healthz`);
    }
    return edge;
  }),
);
&lt;/code&gt;&lt;/pre&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=40307519"&gt;Show HN&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hacker-news"&gt;hacker-news&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/embeddings"&gt;embeddings&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;&lt;/p&gt;



</summary><category term="hacker-news"/><category term="embeddings"/><category term="jina"/></entry><entry><title>Execute Jina embeddings with a CLI using llm-embed-jina</title><link href="https://simonwillison.net/2023/Oct/26/llm-embed-jina/#atom-tag" rel="alternate"/><published>2023-10-26T03:47:08+00:00</published><updated>2023-10-26T03:47:08+00:00</updated><id>https://simonwillison.net/2023/Oct/26/llm-embed-jina/#atom-tag</id><summary type="html">
    &lt;p&gt;Berlin-based Jina AI &lt;a href="https://jina.ai/news/jina-ai-launches-worlds-first-open-source-8k-text-embedding-rivaling-openai/"&gt;just released a new family of embedding models&lt;/a&gt;, boasting that they are the "world's first open-source 8K text embedding model" and that they rival OpenAI's &lt;code&gt;text-embedding-ada-002&lt;/code&gt; in quality.&lt;/p&gt;
&lt;p&gt;I wrote about embeddings extensively &lt;a href="https://simonwillison.net/2023/Oct/23/embeddings/"&gt;the other day&lt;/a&gt; - if you're not familiar with what they are and what you can do with them I suggest reading that first.&lt;/p&gt;
&lt;p&gt;This evening I built and released a new plugin for my &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; tool which adds support for Jina's new embedding models.&lt;/p&gt;
&lt;h4 id="trying-out-llm-embed-jina"&gt;Trying out llm-embed-jina&lt;/h4&gt;
&lt;p&gt;The plugin is called &lt;a href="https://github.com/simonw/llm-embed-jina"&gt;llm-embed-jina&lt;/a&gt;. Here's the quickest way to get started with it:&lt;/p&gt;
&lt;p&gt;First, &lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;install LLM&lt;/a&gt; if you haven't already. You can use &lt;a href="https://pypa.github.io/pipx/"&gt;pipx&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;pipx install llm&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or &lt;code&gt;pip&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;pip install llm&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Unfortunately installing LLM using Homebrew doesn't currently work with this plugin as PyTorch has not yet been released for Python 3.12 - details &lt;a href="https://github.com/simonw/llm/issues/315#issuecomment-1783661583"&gt;in this issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now you can install the &lt;code&gt;llm-embed-jina&lt;/code&gt; plugin:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-embed-jina&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;llm install&lt;/code&gt; command ensures it gets installed in the correct virtual environment, no matter how you installed LLM itself.&lt;/p&gt;
&lt;p&gt;Run this command to check that it added the models:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm embed-models&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You should see output like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ada-002 (aliases: ada, oai)
jina-embeddings-v2-small-en
jina-embeddings-v2-base-en
jina-embeddings-v2-large-en
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;jina-embeddings-v2-large-en&lt;/code&gt; model isn't available yet, but should work as soon as Jina release it. I expect it will show up at &lt;a href="https://huggingface.co/jinaai/jina-embeddings-v2-large-en"&gt;huggingface.co/jinaai/jina-embeddings-v2-large-en&lt;/a&gt; (currently a 404).&lt;/p&gt;
&lt;p&gt;Now you can run one of the models. The &lt;code&gt;-small-en&lt;/code&gt; model is a good starting point: it's only a 65MB download - the &lt;code&gt;-base-en&lt;/code&gt; model is 275MB.&lt;/p&gt;
&lt;p&gt;The model will download the first time you try to use it. Run this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm embed -m jina-embeddings-v2-small-en -c &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Hello world&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will return a JSON array of 512 floating point numbers - the embedding vector for the string "Hello world".&lt;/p&gt;
&lt;p&gt;Embeddings are much more interesting if you store them somewhere and then use them to run comparisons. The &lt;a href="https://llm.datasette.io/en/stable/embeddings/cli.html#llm-embed-multi"&gt;llm embed-multi&lt;/a&gt; command can do that.&lt;/p&gt;
&lt;p&gt;Change directory to a folder that you know contains &lt;code&gt;README.md&lt;/code&gt; files (anything with a &lt;code&gt;node_modules&lt;/code&gt; folder will do) and run this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm embed-multi readmes \
    -m jina-embeddings-v2-small-en \
    --files &lt;span class="pl-c1"&gt;.&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;**/README.md&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
    --database readmes.db&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will create a SQLite database called &lt;code&gt;readmes.db&lt;/code&gt;, then search for every &lt;code&gt;README.md&lt;/code&gt; file in the current directory and all subdirectories, embed the content of each one and store the results in that database.&lt;/p&gt;
&lt;p&gt;Those embeddings will live in a collection called &lt;code&gt;readmes&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you leave off the &lt;code&gt;--database readmes.db&lt;/code&gt; option the collections will be stored in a default SQLite database tucked away somewhere on your system.&lt;/p&gt;
&lt;p&gt;Having done this, you can run semantic similarity searches against the new collection like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm similar readmes -d readmes.db -c &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;utility functions&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When I ran that in my &lt;a href="https://github.com/simonw/hmb-map"&gt;hmb-map&lt;/a&gt; directory I got these:&lt;/p&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/@maplibre/maplibre-gl-style-spec/src/feature_filter/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.7802185991017785&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/kind-of/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.7725600920927725&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/which/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.7645426557095619&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/@mapbox/point-geometry/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.7636548563018607&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/esbuild/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.7633325127194481&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/maplibre-gl/src/shaders/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.7614428292518743&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/minimist/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.7581314986768929&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/split-string/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.7563253351715924&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/assign-symbols/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.7555915219064293&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}
{&lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;node_modules/maplibre-gl/build/README.md&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;, &lt;span class="pl-ent"&gt;"score"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.754027372081506&lt;/span&gt;, &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;, &lt;span class="pl-ent"&gt;"metadata"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;These are the top ten results by similarity to the string I entered.&lt;/p&gt;
&lt;p&gt;You can also pass in the ID of an item in the collection to see other similar items:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm similar readmes -d readmes.db node_modules/esbuild/README.md &lt;span class="pl-k"&gt;|&lt;/span&gt; jq .id&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I piped it through &lt;code&gt;| jq .id&lt;/code&gt; to get back just the IDs. I got this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;"node_modules/@esbuild/darwin-arm64/README.md"
"node_modules/rollup/README.md"
"node_modules/assign-symbols/README.md"
"node_modules/split-string/node_modules/extend-shallow/README.md"
"node_modules/isobject/README.md"
"node_modules/maplibre-gl/build/README.md"
"node_modules/vite/README.md"
"node_modules/nanoid/README.md"
"node_modules/@mapbox/tiny-sdf/README.md"
"node_modules/split-string/node_modules/is-extendable/README.md"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;See the &lt;a href="https://llm.datasette.io/en/stable/embeddings/index.html"&gt;LLM embeddings documentation&lt;/a&gt; for more details on things you can do with this tool.&lt;/p&gt;
&lt;h4 id="how-i-built-the-plugin"&gt;How I built the plugin&lt;/h4&gt;
&lt;p&gt;I built the first version of this plugin in about 15 minutes. It took another hour to iron out a couple of bugs.&lt;/p&gt;
&lt;p&gt;I started with &lt;a href="https://github.com/simonw/llm-plugin"&gt;this cookiecutter template&lt;/a&gt;, followed by pasting in the recipe in the LLM documentation on &lt;a href="https://llm.datasette.io/en/stable/embeddings/writing-plugins.html"&gt;writing embedding model plugins&lt;/a&gt; combined with some example code that Jina provided &lt;a href="https://huggingface.co/jinaai/jina-embeddings-v2-small-en#usage"&gt;in their model release&lt;/a&gt;. Here's their code:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;transformers&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;AutoModel&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;numpy&lt;/span&gt;.&lt;span class="pl-s1"&gt;linalg&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;norm&lt;/span&gt;

&lt;span class="pl-s1"&gt;cos_sim&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;lambda&lt;/span&gt; &lt;span class="pl-s1"&gt;a&lt;/span&gt;,&lt;span class="pl-s1"&gt;b&lt;/span&gt;: (&lt;span class="pl-s1"&gt;a&lt;/span&gt; @ &lt;span class="pl-s1"&gt;b&lt;/span&gt;.&lt;span class="pl-v"&gt;T&lt;/span&gt;) &lt;span class="pl-c1"&gt;/&lt;/span&gt; (&lt;span class="pl-en"&gt;norm&lt;/span&gt;(&lt;span class="pl-s1"&gt;a&lt;/span&gt;)&lt;span class="pl-c1"&gt;*&lt;/span&gt;&lt;span class="pl-en"&gt;norm&lt;/span&gt;(&lt;span class="pl-s1"&gt;b&lt;/span&gt;))
&lt;span class="pl-s1"&gt;model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;AutoModel&lt;/span&gt;.&lt;span class="pl-en"&gt;from_pretrained&lt;/span&gt;(&lt;span class="pl-s"&gt;'jinaai/jina-embeddings-v2-small-en'&lt;/span&gt;, &lt;span class="pl-s1"&gt;trust_remote_code&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;) &lt;span class="pl-c"&gt;# trust_remote_code is needed to use the encode method&lt;/span&gt;
&lt;span class="pl-s1"&gt;embeddings&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;model&lt;/span&gt;.&lt;span class="pl-en"&gt;encode&lt;/span&gt;([&lt;span class="pl-s"&gt;'How is the weather today?'&lt;/span&gt;, &lt;span class="pl-s"&gt;'What is the current weather like today?'&lt;/span&gt;])
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-en"&gt;cos_sim&lt;/span&gt;(&lt;span class="pl-s1"&gt;embeddings&lt;/span&gt;[&lt;span class="pl-c1"&gt;0&lt;/span&gt;], &lt;span class="pl-s1"&gt;embeddings&lt;/span&gt;[&lt;span class="pl-c1"&gt;1&lt;/span&gt;]))&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;numpy&lt;/code&gt; and &lt;code&gt;cos_sim&lt;/code&gt; bits aren't needed for the plugin, so I ignored them.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/llm-embed-jina/blob/9cbeff3f72318ea5972310235efc1262cc72f960/llm_embed_jina.py"&gt;first working version&lt;/a&gt; of the plugin was a file called &lt;code&gt;llm_embed_jina.py&lt;/code&gt; that looked like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;transformers&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;AutoModel&lt;/span&gt;


&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;register_embedding_models&lt;/span&gt;(&lt;span class="pl-s1"&gt;register&lt;/span&gt;):
    &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;model_id&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; (
        &lt;span class="pl-s"&gt;"jina-embeddings-v2-small-en"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"jina-embeddings-v2-base-en"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"jina-embeddings-v2-large-en"&lt;/span&gt;,
    ):
        &lt;span class="pl-en"&gt;register&lt;/span&gt;(&lt;span class="pl-v"&gt;JinaEmbeddingModel&lt;/span&gt;(&lt;span class="pl-s1"&gt;model_id&lt;/span&gt;))


&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;JinaEmbeddingModel&lt;/span&gt;(&lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-v"&gt;EmbeddingModel&lt;/span&gt;):
    &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;__init__&lt;/span&gt;(&lt;span class="pl-s1"&gt;self&lt;/span&gt;, &lt;span class="pl-s1"&gt;model_id&lt;/span&gt;):
        &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;model_id&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;model_id&lt;/span&gt;
        &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;_model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;

    &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;embed_batch&lt;/span&gt;(&lt;span class="pl-s1"&gt;self&lt;/span&gt;, &lt;span class="pl-s1"&gt;texts&lt;/span&gt;):
        &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;_model&lt;/span&gt; &lt;span class="pl-c1"&gt;is&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;:
            &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;_model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;AutoModel&lt;/span&gt;.&lt;span class="pl-en"&gt;from_pretrained&lt;/span&gt;(
                &lt;span class="pl-s"&gt;"jinaai/{}"&lt;/span&gt;.&lt;span class="pl-en"&gt;format&lt;/span&gt;(&lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;model_id&lt;/span&gt;), &lt;span class="pl-s1"&gt;trust_remote_code&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;
            )
        &lt;span class="pl-s1"&gt;results&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;_model&lt;/span&gt;.&lt;span class="pl-en"&gt;encode&lt;/span&gt;(&lt;span class="pl-s1"&gt;texts&lt;/span&gt;)
        &lt;span class="pl-k"&gt;return&lt;/span&gt; (&lt;span class="pl-en"&gt;list&lt;/span&gt;(&lt;span class="pl-en"&gt;map&lt;/span&gt;(&lt;span class="pl-s1"&gt;float&lt;/span&gt;, &lt;span class="pl-s1"&gt;result&lt;/span&gt;)) &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;result&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;results&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;There's really not a lot to it.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;register_embedding_models()&lt;/code&gt; function is a &lt;a href="https://llm.datasette.io/en/stable/plugins/plugin-hooks.html"&gt;plugin hook&lt;/a&gt; that LLM calls to register all of the embedding models.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;JinaEmbeddingModel&lt;/code&gt; is a subclass of &lt;code&gt;llm.EmbeddingModel&lt;/code&gt;. It just needs to implement two things: a constructor and that &lt;code&gt;embed_batch(self, texts)&lt;/code&gt; method.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;AutoModel.from_pretrained()&lt;/code&gt; is provided by &lt;a href="https://huggingface.co/docs/transformers/index"&gt;Hugging Face Transformers&lt;/a&gt;. It downloads and caches the model the first time you call it.&lt;/p&gt;
&lt;p&gt;The model returns numpy arrays, but LLM wants a regular Python list of floats - that's what that last &lt;code&gt;return&lt;/code&gt; line is doing.&lt;/p&gt;
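&lt;p&gt;As a minimal sketch of that conversion, using made-up values rather than real embeddings:&lt;/p&gt;

```python
import numpy as np

# model.encode() returns a 2D numpy array of float32 values,
# one row per input text
results = np.array([[0.5, 0.25], [0.75, 1.0]], dtype=np.float32)

# LLM wants a regular Python list of floats for each text,
# so convert every numpy scalar with float()
as_lists = [list(map(float, row)) for row in results]
```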
&lt;p&gt;I found a couple of bugs with this. The model &lt;a href="https://github.com/simonw/llm-embed-jina/issues/3"&gt;didn't like&lt;/a&gt; having &lt;code&gt;.encode(texts)&lt;/code&gt; called with a generator, so I needed to convert that into a list. Then later I found that text longer than 8192 characters could &lt;a href="https://github.com/simonw/llm-embed-jina/issues/4"&gt;cause the model to hang&lt;/a&gt; in some situations, so I added my own truncation.&lt;/p&gt;
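&lt;p&gt;Both fixes amount to normalizing the input before calling &lt;code&gt;.encode()&lt;/code&gt;: materializing any generator into a list and clipping each string to the model's limit. Sketched in isolation (&lt;code&gt;prepare&lt;/code&gt; is just an illustrative name, not part of the plugin):&lt;/p&gt;

```python
MAX_LENGTH = 8192

def prepare(texts):
    # The list comprehension materializes generators into a list,
    # and the slice clips each string to the model's length limit
    return [text[:MAX_LENGTH] for text in texts]

# Works even when the caller passes a generator of long strings
clipped = prepare(c * 10000 for c in "ab")
```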
&lt;p&gt;The current version (0.1.2) of the plugin, with fixes for both of those issues, looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;transformers&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;AutoModel&lt;/span&gt;

&lt;span class="pl-v"&gt;MAX_LENGTH&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;8192&lt;/span&gt;


&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;register_embedding_models&lt;/span&gt;(&lt;span class="pl-s1"&gt;register&lt;/span&gt;):
    &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;model_id&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; (
        &lt;span class="pl-s"&gt;"jina-embeddings-v2-small-en"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"jina-embeddings-v2-base-en"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"jina-embeddings-v2-large-en"&lt;/span&gt;,
    ):
        &lt;span class="pl-en"&gt;register&lt;/span&gt;(&lt;span class="pl-v"&gt;JinaEmbeddingModel&lt;/span&gt;(&lt;span class="pl-s1"&gt;model_id&lt;/span&gt;))


&lt;span class="pl-k"&gt;class&lt;/span&gt; &lt;span class="pl-v"&gt;JinaEmbeddingModel&lt;/span&gt;(&lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-v"&gt;EmbeddingModel&lt;/span&gt;):
    &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;__init__&lt;/span&gt;(&lt;span class="pl-s1"&gt;self&lt;/span&gt;, &lt;span class="pl-s1"&gt;model_id&lt;/span&gt;):
        &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;model_id&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;model_id&lt;/span&gt;
        &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;_model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;

    &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;embed_batch&lt;/span&gt;(&lt;span class="pl-s1"&gt;self&lt;/span&gt;, &lt;span class="pl-s1"&gt;texts&lt;/span&gt;):
        &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;_model&lt;/span&gt; &lt;span class="pl-c1"&gt;is&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;:
            &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;_model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;AutoModel&lt;/span&gt;.&lt;span class="pl-en"&gt;from_pretrained&lt;/span&gt;(
                &lt;span class="pl-s"&gt;"jinaai/{}"&lt;/span&gt;.&lt;span class="pl-en"&gt;format&lt;/span&gt;(&lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;model_id&lt;/span&gt;), &lt;span class="pl-s1"&gt;trust_remote_code&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;
            )
        &lt;span class="pl-s1"&gt;results&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-s1"&gt;_model&lt;/span&gt;.&lt;span class="pl-en"&gt;encode&lt;/span&gt;([&lt;span class="pl-s1"&gt;text&lt;/span&gt;[:&lt;span class="pl-v"&gt;MAX_LENGTH&lt;/span&gt;] &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;text&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;texts&lt;/span&gt;])
        &lt;span class="pl-k"&gt;return&lt;/span&gt; (&lt;span class="pl-en"&gt;list&lt;/span&gt;(&lt;span class="pl-en"&gt;map&lt;/span&gt;(&lt;span class="pl-s1"&gt;float&lt;/span&gt;, &lt;span class="pl-s1"&gt;result&lt;/span&gt;)) &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;result&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;results&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;I'm really pleased with how quickly this came together - I think it's a strong signal that the &lt;a href="https://simonwillison.net/2023/Sep/4/llm-embeddings/"&gt;LLM embeddings plugin design&lt;/a&gt; is working well.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/embeddings"&gt;embeddings&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="embeddings"/><category term="llm"/><category term="jina"/></entry></feed>