<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: replicate</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/replicate.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2024-10-03T19:14:56+00:00</updated><author><name>Simon Willison</name></author><entry><title>Announcing FLUX1.1 [pro] and the BFL API</title><link href="https://simonwillison.net/2024/Oct/3/flux11-pro/#atom-tag" rel="alternate"/><published>2024-10-03T19:14:56+00:00</published><updated>2024-10-03T19:14:56+00:00</updated><id>https://simonwillison.net/2024/Oct/3/flux11-pro/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blackforestlabs.ai/announcing-flux-1-1-pro-and-the-bfl-api/"&gt;Announcing FLUX1.1 [pro] and the BFL API&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;FLUX is the image generation model family from Black Forest Labs, a startup founded by members of the team that previously created Stable Diffusion.&lt;/p&gt;
&lt;p&gt;Released today, FLUX1.1 [pro] continues the general trend of AI models getting both better and more efficient:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;FLUX1.1 [pro] provides six times faster generation than its predecessor FLUX.1 [pro] while also improving image quality, prompt adherence, and diversity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Black Forest Labs appear to have settled on a potentially workable business model: their smallest, fastest model FLUX.1 [schnell] is Apache 2 licensed. The next step up is FLUX.1 [dev] which is open weights for non-commercial use only. The [pro] models are closed weights, made available exclusively through their API or partnerships with other API providers.&lt;/p&gt;
&lt;p&gt;I tried the new 1.1 model out using &lt;a href="https://replicate.com/black-forest-labs/flux-1.1-pro"&gt;black-forest-labs/flux-1.1-pro&lt;/a&gt; on Replicate just now. Here's my prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Photograph of a Faberge egg representing the California coast. It should be decorated with ornate pelicans and sea lions and a humpback whale.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img alt="A beautiful faberge egg featuring a humpback whale and pelicans - it is located on a beach and sea lions on that beach are looking at it." src="https://static.simonwillison.net/static/2024/flux-pelican-egg.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The FLUX models have a reputation for being really good at following complex prompts. In this case I wanted the sea lions to appear in the egg design rather than looking at the egg from the beach, but I imagine I could get better results if I continued to iterate on my prompt.&lt;/p&gt;
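&lt;p&gt;If you want to iterate on prompts programmatically rather than through the web UI, Replicate's official Python client can run the same model. A minimal sketch, assuming a &lt;code&gt;REPLICATE_API_TOKEN&lt;/code&gt; environment variable is set; the exact shape of the return value (a URL string versus a file-like object) depends on the client version:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import replicate

# replicate.run() blocks until the prediction completes,
# then returns the model output.
output = replicate.run(
    "black-forest-labs/flux-1.1-pro",
    input={
        "prompt": (
            "Photograph of a Faberge egg representing the California "
            "coast. It should be decorated with ornate pelicans and "
            "sea lions and a humpback whale."
        )
    },
)
print(output)  # a reference to the generated image
&lt;/code&gt;&lt;/pre&gt;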
&lt;p&gt;The FLUX models are also better at rendering text than any other image models I've tried myself.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=41730822"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stable-diffusion"&gt;stable-diffusion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replicate"&gt;replicate&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="stable-diffusion"/><category term="generative-ai"/><category term="replicate"/><category term="text-to-image"/></entry><entry><title>Accessing Llama 2 from the command-line with the llm-replicate plugin</title><link href="https://simonwillison.net/2023/Jul/18/accessing-llama-2/#atom-tag" rel="alternate"/><published>2023-07-18T19:30:59+00:00</published><updated>2023-07-18T19:30:59+00:00</updated><id>https://simonwillison.net/2023/Jul/18/accessing-llama-2/#atom-tag</id><summary type="html">
    &lt;p&gt;The big news today is &lt;a href="https://ai.meta.com/llama/"&gt;Llama 2&lt;/a&gt;, the new openly licensed Large Language Model from Meta AI. It's a really big deal:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Llama 2 is available for free for research and commercial use.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(swyx &lt;a href="https://twitter.com/swyx/status/1681364325469143040"&gt;spotted&lt;/a&gt; that it was rebranded from LLaMA 2 to Llama 2 &lt;a href="https://github.com/facebookresearch/llama/commit/6d4c0c290aeec1fa4399694fefb864be5a153bb6"&gt;a few hours ago&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Facebook's original LLaMA model, released in February, kicked off a seismic wave of innovation in the world of open source LLMs - from fine-tuned variants to from-scratch recreations. I wrote about the impact this was having a few weeks after its original release in &lt;a href="https://simonwillison.net/2023/Mar/11/llama/"&gt;Large language models are having their Stable Diffusion moment&lt;/a&gt;. If anything I under-estimated how significant this was going to be.&lt;/p&gt;
&lt;p&gt;There was just one catch: you couldn't use it for commercial purposes.&lt;/p&gt;
&lt;p&gt;That's changed with Llama 2!&lt;/p&gt;
&lt;p&gt;It does still have &lt;a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/"&gt;a couple of interesting limitations&lt;/a&gt;, emphasis mine:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You will not use the Llama Materials or any output or results of the Llama Materials &lt;strong&gt;to improve any other large language model&lt;/strong&gt; (excluding Llama 2 or derivative works thereof).&lt;/p&gt;
&lt;p&gt;If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, &lt;strong&gt;is greater than 700 million monthly active users&lt;/strong&gt; in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion[...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This still makes it available for most commercial purposes. Let's try it out.&lt;/p&gt;
&lt;h4&gt;Accessing Llama 2 using LLM&lt;/h4&gt;
&lt;p&gt;My &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; tool provides command-line access to a wide variety of language models, both via web APIs and self-hosted on your own machine.&lt;/p&gt;
&lt;p&gt;Last week &lt;a href="https://simonwillison.net/2023/Jul/12/llm/"&gt;I added plugin support&lt;/a&gt; for extending it with new models: an independently released plugin can now add new models to the existing CLI tool.&lt;/p&gt;
&lt;p&gt;The brand new &lt;a href="https://github.com/simonw/llm-replicate"&gt;llm-replicate&lt;/a&gt; plugin provides CLI access to models hosted on &lt;a href="https://replicate.com/"&gt;Replicate&lt;/a&gt;, and this morning a16z-infra released &lt;a href="https://replicate.com/a16z-infra/llama13b-v2-chat"&gt;a16z-infra/llama13b-v2-chat&lt;/a&gt; which provides Replicate API access to the new Llama 2 13B chat model.&lt;/p&gt;
&lt;p&gt;You'll need a &lt;a href="https://replicate.com/account/api-tokens"&gt;Replicate API key&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Install LLM like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;pip install llm&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(I like to use &lt;a href="https://pypa.github.io/pipx/"&gt;pipx&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Or if you don't have &lt;code&gt;pip&lt;/code&gt; setup but can use &lt;a href="https://brew.sh/"&gt;Homebrew&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;brew install simonw/llm/llm&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Run &lt;code&gt;llm --version&lt;/code&gt; to confirm it's installed.&lt;/p&gt;
&lt;p&gt;Next, install the &lt;code&gt;llm-replicate&lt;/code&gt; plugin:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-replicate&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The plugin adds a new set of &lt;code&gt;llm replicate&lt;/code&gt; commands - see them with &lt;code&gt;llm --help&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Configure your Replicate API key so the plugin can talk to the API:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm keys &lt;span class="pl-c1"&gt;set&lt;/span&gt; replicate&lt;/pre&gt;&lt;/div&gt;
&lt;pre&gt;&lt;code&gt;Enter key: &amp;lt;paste key here&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Register the new &lt;code&gt;a16z-infra/llama13b-v2-chat&lt;/code&gt; model with the plugin:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm replicate add a16z-infra/llama13b-v2-chat \
  --chat --alias llama2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This tells the plugin that it's a "chat" model, which means you can have continuing conversations with it, rather than just sending single prompts.&lt;/p&gt;
&lt;p&gt;Now we can send a prompt, using the &lt;code&gt;llama2&lt;/code&gt; alias we configured there:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m llama2 "Ten great names for a pet pelican"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I get this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Certainly! Here are ten great names for a pet pelican:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Peanut: A cute and playful name for a small pelican.&lt;/li&gt;
&lt;li&gt;Puddles: A fun and silly name for a pelican that loves to splash around in water.&lt;/li&gt;
&lt;li&gt;Pete: A classic name that suits a friendly and outgoing pelican.&lt;/li&gt;
&lt;li&gt;Penelope: A pretty and elegant name for a female pelican.&lt;/li&gt;
&lt;li&gt;Percy: A charming and sophisticated name for a male pelican.&lt;/li&gt;
&lt;li&gt;Pippy: A playful and affectionate name for a pelican with a lively personality.&lt;/li&gt;
&lt;li&gt;Pelty: A fun and quirky name for a pelican with a unique sense of style.&lt;/li&gt;
&lt;li&gt;Pip: A sweet and endearing name for a small pelican.&lt;/li&gt;
&lt;li&gt;Plucky: A feisty and adventurous name for a pelican with a big personality.&lt;/li&gt;
&lt;li&gt;Poppy: A lovely and delicate name for a female pelican with a sweet disposition.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I hope these suggestions help you find the perfect name for your pet pelican!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And because it's a chat model, we can send a follow-up prompt to the &lt;a href="https://llm.datasette.io/en/stable/usage.html#continuing-a-conversation"&gt;same conversation&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -c &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Five more and make them more nautical&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;Ahoy! Here be five more nautical names for yer pet pelican:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Seabird: A fitting name for a pelican that loves the ocean and all its wonders.&lt;/li&gt;
&lt;li&gt;Captain Hook: A swashbuckling name for a pelican with a bold and fearless spirit.&lt;/li&gt;
&lt;li&gt;Anchor: A sturdy and dependable name for a pelican that's always grounded and stable.&lt;/li&gt;
&lt;li&gt;Finley: A fishy name for a pelican with a love for the sea and all its creatures.&lt;/li&gt;
&lt;li&gt;Buoyant: A bouncy and cheerful name for a pelican with a lighthearted and carefree attitude.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Hope these suggestions float yer boat, matey!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;LLM logs every prompt and response sent through it to a SQLite database. I'm using &lt;code&gt;--truncate&lt;/code&gt; here to shorten the output so it fits in this post:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm logs --truncate&lt;/pre&gt;&lt;/div&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;[
  {
    &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;01h5n5b3q6s54e03wtvqe90w2g&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"model"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;replicate-a16z-infra-llama13b-v2-chat&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"prompt"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Five more and make them more nautical&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"system"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"response"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Ahoy matey! Here be five more nautical-themed names for yer pet pelican:&lt;span class="pl-cce"&gt;\n\n&lt;/span&gt;1. Captain Hook&lt;span class="pl-cce"&gt;\n&lt;/span&gt;2. Anch...&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"response_json"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"conversation_id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;01h5n5atsqfg4bhxc53j3ymnx2&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"duration_ms"&lt;/span&gt;: &lt;span class="pl-c1"&gt;7364&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"datetime_utc"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;2023-07-18T18:48:51.999944&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"conversation_name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Ten great names for a pet pelic&lt;span class="pl-cce"&gt;\u2026&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"conversation_model"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;replicate-a16z-infra-llama13b-v2-chat&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  {
    &lt;span class="pl-ent"&gt;"id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;01h5n5atss6k9zsr5qs5hb2wzd&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"model"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;replicate-a16z-infra-llama13b-v2-chat&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"prompt"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Ten great names for a pet pelican&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"system"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"response"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Sure, here are ten great names for a pet pelican:&lt;span class="pl-cce"&gt;\n\n&lt;/span&gt;1. Pelty&lt;span class="pl-cce"&gt;\n&lt;/span&gt;2. Peanut&lt;span class="pl-cce"&gt;\n&lt;/span&gt;3. Puddles&lt;span class="pl-cce"&gt;\n&lt;/span&gt;4. Nibbles&lt;span class="pl-cce"&gt;\n&lt;/span&gt;5. Fu...&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"response_json"&lt;/span&gt;: &lt;span class="pl-c1"&gt;null&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"conversation_id"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;01h5n5atsqfg4bhxc53j3ymnx2&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"duration_ms"&lt;/span&gt;: &lt;span class="pl-c1"&gt;7873&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"datetime_utc"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;2023-07-18T18:48:42.356413&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"conversation_name"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Ten great names for a pet pelic&lt;span class="pl-cce"&gt;\u2026&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="pl-ent"&gt;"conversation_model"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;replicate-a16z-infra-llama13b-v2-chat&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  }
]&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or run &lt;code&gt;llm logs path&lt;/code&gt; to see the path to the SQLite database file. If you have &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; installed you can open the logs database like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;datasette "$(llm logs path)"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or with &lt;a href="https://datasette.io/desktop"&gt;Datasette Desktop&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;open "$(llm logs path)"
&lt;/code&gt;&lt;/pre&gt;
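&lt;p&gt;You can also query that database directly from Python. Here's a minimal sketch; note that the table name is my guess based on the fields shown above, so check the real schema first if it doesn't match:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sqlite3
import subprocess

# Ask the llm CLI where it keeps its logs database
db_path = subprocess.check_output(
    ["llm", "logs", "path"], text=True
).strip()

conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row

# "logs" is an assumed table name - inspect the schema if this fails
sql = "select model, prompt, duration_ms from logs order by id desc limit 5"
for row in conn.execute(sql):
    print(dict(row))
&lt;/code&gt;&lt;/pre&gt;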
&lt;h4&gt;What's next&lt;/h4&gt;
&lt;p&gt;I'm really excited about Llama 2. I expect we'll start seeing all kinds of innovation on top of it as the model weights themselves start to roll out.&lt;/p&gt;
&lt;p&gt;I'm particularly looking forward to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The fine-tuned models. Alpaca, Vicuna et al. did amazing things with fine-tuning on top of the original LLaMA. I expect we'll see dozens of fine-tuned models over the next few weeks.&lt;/li&gt;
&lt;li&gt;Implementation of the &lt;a href="https://til.simonwillison.net/llms/python-react-pattern"&gt;ReAct&lt;/a&gt; and &lt;a href="https://arxiv.org/abs/2302.04761"&gt;Toolformer&lt;/a&gt; patterns, which will enable Llama 2 to drive external tools in a similar way to &lt;a href="https://openai.com/blog/function-calling-and-other-api-updates"&gt;OpenAI Functions&lt;/a&gt;. The &lt;a href="https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/"&gt;Llama 2 paper&lt;/a&gt; mentions testing against Toolformer prompts already, so this hopefully won't be too difficult (there's a sketch of the ReAct loop just after this list).&lt;/li&gt;
&lt;/ul&gt;
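&lt;p&gt;To make the ReAct idea concrete, here's a minimal sketch of the loop. The model call is stubbed out since this isn't tied to any particular API, and the Action:/Observation: conventions here are illustrative:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re

def llama2(prompt):
    # Stub: swap in a real model call here, for example by shelling
    # out to "llm -m llama2" or calling the Replicate API directly.
    raise NotImplementedError

TOOLS = {
    # Demo tool only - never eval() untrusted model output in production
    "calculate": lambda expression: str(eval(expression)),
}

ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)

def react(question, max_turns=5):
    prompt = question
    for _ in range(max_turns):
        response = llama2(prompt)
        match = ACTION_RE.search(response)
        if match is None:
            return response  # no action requested: treat as the final answer
        tool, argument = match.groups()
        observation = TOOLS[tool](argument)
        # Append the tool result and loop so the model can continue
        prompt = prompt + "\n" + response + "\nObservation: " + observation
&lt;/code&gt;&lt;/pre&gt;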
&lt;p&gt;Hopefully this time we'll start to see a concerted effort from people to document what kind of prompts work well with Llama 2 for practical purposes, since I expect this model will be an influential piece of the ecosystem for some time to come.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replicate"&gt;replicate&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="plugins"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="replicate"/><category term="llm"/><category term="llm-release"/></entry><entry><title>How I Used Stable Diffusion and Dreambooth to Create A Painted Portrait of My Dog</title><link href="https://simonwillison.net/2023/Apr/16/stable-diffusion-and-dreambooth/#atom-tag" rel="alternate"/><published>2023-04-16T19:57:35+00:00</published><updated>2023-04-16T19:57:35+00:00</updated><id>https://simonwillison.net/2023/Apr/16/stable-diffusion-and-dreambooth/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.shruggingface.com/blog/how-i-used-stable-diffusion-and-dreambooth-to-create-a-painted-portrait-of-my-dog"&gt;How I Used Stable Diffusion and Dreambooth to Create A Painted Portrait of My Dog&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I like posts like this that go into detail about how much work it takes to deliberately get the kind of result you really want from generative AI tools. Jake Dahn trained a Dreambooth model from 40 photos of Queso—his photogenic Golden Retriever—using Replicate, gathered the prompts from ten images he liked on Lexica, and generated over 1,000 candidate images. He then picked his favourite, used Draw Things img2img resizing to expand the image beyond the initial crop, applied Automatic1111 inpainting to tweak the ears, and finished with Real-ESRGAN 4x+ upscaling for the final print.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=35592847"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stable-diffusion"&gt;stable-diffusion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replicate"&gt;replicate&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="stable-diffusion"/><category term="generative-ai"/><category term="replicate"/><category term="text-to-image"/></entry><entry><title>Fine-tune LLaMA to speak like Homer Simpson</title><link href="https://simonwillison.net/2023/Mar/17/fine-tune-llama-to-speak-like-homer-simpson/#atom-tag" rel="alternate"/><published>2023-03-17T23:08:40+00:00</published><updated>2023-03-17T23:08:40+00:00</updated><id>https://simonwillison.net/2023/Mar/17/fine-tune-llama-to-speak-like-homer-simpson/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://replicate.com/blog/fine-tune-llama-to-speak-like-homer-simpson"&gt;Fine-tune LLaMA to speak like Homer Simpson&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Replicate spent 90 minutes fine-tuning LLaMA on 60,000 lines of dialog from the first 12 seasons of the Simpsons, and now it can do a good job of producing invented dialog from any of the characters from the series. This is a really interesting result: I’ve been skeptical about how much value can be had from fine-tuning large models on just a tiny amount of new data, assuming that the new data would be statistically irrelevant compared to the existing model. Clearly my mental model around this was incorrect.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/the-simpsons"&gt;the-simpsons&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replicate"&gt;replicate&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="the-simpsons"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="replicate"/><category term="fine-tuning"/></entry><entry><title>Train and run Stanford Alpaca on your own machine</title><link href="https://simonwillison.net/2023/Mar/16/train-and-run-stanford-alpaca-on-your-own-machine/#atom-tag" rel="alternate"/><published>2023-03-16T16:10:39+00:00</published><updated>2023-03-16T16:10:39+00:00</updated><id>https://simonwillison.net/2023/Mar/16/train-and-run-stanford-alpaca-on-your-own-machine/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://replicate.com/blog/replicate-alpaca"&gt;Train and run Stanford Alpaca on your own machine&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The team at Replicate managed to train their own copy of Stanford’s Alpaca—a fine-tuned version of LLaMA that can follow instructions like ChatGPT. Here they provide step-by-step instructions for recreating Alpaca yourself—running the training needs one or more A100s for a few hours, which you can rent through various cloud providers.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/stanford"&gt;stanford&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replicate"&gt;replicate&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="stanford"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="replicate"/><category term="fine-tuning"/></entry><entry><title>A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions</title><link href="https://simonwillison.net/2022/Sep/30/action-transcription/#atom-tag" rel="alternate"/><published>2022-09-30T00:56:28+00:00</published><updated>2022-09-30T00:56:28+00:00</updated><id>https://simonwillison.net/2022/Sep/30/action-transcription/#atom-tag</id><summary type="html">
    &lt;p&gt;I released a new project this weekend, built during the Bellingcat Hackathon (I came second!). It's called &lt;a href="https://github.com/simonw/action-transcription"&gt;Action Transcription&lt;/a&gt; and it's a tool for capturing captions and transcripts from online videos.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://www.youtube.com/watch?v=AneNxjSGn1I"&gt;my video&lt;/a&gt; introducing the new tool:&lt;/p&gt;
&lt;iframe allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="allowfullscreen" frameborder="0" height="315" src="https://www.youtube-nocookie.com/embed/AneNxjSGn1I" style="max-width: 100%" width="560"&gt; &lt;/iframe&gt;
&lt;h4&gt;Bellingcat&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.bellingcat.com/about/"&gt;Bellingcat&lt;/a&gt; describe themselves as an "independent international collective of researchers, investigators and citizen journalists using open source and social media investigation to probe a variety of subjects".&lt;/p&gt;
&lt;p&gt;They specialize in open source intelligence - which, confusingly, does NOT mean "open source software" - this is a &lt;a href="https://en.wikipedia.org/wiki/Open-source_intelligence"&gt;much older usage of the term&lt;/a&gt; that describes the use of publicly available information to gather intelligence.&lt;/p&gt;
&lt;p&gt;They have broken a LOT of impressive stories over their eight-year lifespan. Wikipedia &lt;a href="https://en.wikipedia.org/wiki/Bellingcat"&gt;has a good list&lt;/a&gt; - highlights include identifying the suspects behind the &lt;a href="https://en.wikipedia.org/wiki/Bellingcat#Skripal_poisoning"&gt;Skripal poisoning case&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The theme of the hackathon was "General Digital Investigation Tools". The goal was to build prototypes of tools that could be used by their community of investigators - most of whom are volunteers working from home with little-to-no budget, and often with limited technical skills (they can use tools very effectively but they might not be comfortable writing code or using the command-line).&lt;/p&gt;
&lt;p&gt;Inspired by the recent release of &lt;a href="https://github.com/openai/whisper"&gt;OpenAI's Whisper&lt;/a&gt;, I decided to build a tool that would make it easier to extract captions and transcripts from videos on social media sites.&lt;/p&gt;
&lt;h4&gt;Why GitHub Actions and GitHub Issues?&lt;/h4&gt;
&lt;p&gt;My goals for the project were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Help people achieve something useful&lt;/li&gt;
&lt;li&gt;Make it as inexpensive to run as possible - ideally free&lt;/li&gt;
&lt;li&gt;Make it easy for people to install and run their own copies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I decided to build the entire thing using GitHub Actions and GitHub Issues.&lt;/p&gt;
&lt;p&gt;GitHub Actions is a powerful service for running CI jobs and other automation, but its best feature for this particular project is that it's free.&lt;/p&gt;
&lt;p&gt;I'm fine with spending money myself, but if I'm building tools for other people having a way for them to run the tool without paying for anything is a huge win.&lt;/p&gt;
&lt;p&gt;My tool needed a UI. To keep things as simple as possible, I didn't want to host anything outside of GitHub itself. So I turned to GitHub Issues to provide the interface layer.&lt;/p&gt;
&lt;p&gt;It's easy to create Actions scripts that trigger when a new issue is created. And those scripts can then interact with that issue - attaching comments, or even closing it as completed.&lt;/p&gt;
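&lt;p&gt;The actual project does this with actions/github-script in JavaScript, but purely as an illustration of the underlying REST call, here's what posting a comment back to an issue looks like from Python:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import json
import os
import urllib.request

def post_issue_comment(repo, issue_number, body):
    # POST /repos/{owner}/{repo}/issues/{issue_number}/comments
    url = "https://api.github.com/repos/%s/issues/%d/comments" % (repo, issue_number)
    request = urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + os.environ["GITHUB_TOKEN"],
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
&lt;/code&gt;&lt;/pre&gt;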
&lt;p&gt;I decided that my flow would be:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The user opens an issue and pastes in a link to an online video.&lt;/li&gt;
&lt;li&gt;GitHub Actions is triggered by that issue, extracts the URL and fetches the video using &lt;a href="https://youtube-dl.org/"&gt;youtube-dl&lt;/a&gt; (which, despite the name, can actually download videos from &lt;a href="http://ytdl-org.github.io/youtube-dl/supportedsites.html"&gt;over 1,200 sites&lt;/a&gt; including many of the social media services popular in Russia).&lt;/li&gt;
&lt;li&gt;The script extracts just the audio from the video.&lt;/li&gt;
&lt;li&gt;The audio is then passed through OpenAI's Whisper, which can create a high quality transcript in the original language AND create a shockingly good English translation.&lt;/li&gt;
&lt;li&gt;The caption is then both written back to the GitHub repository and attached to the original issue as a comment.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;GitHub Actions doesn't (yet) provide GPUs, and Whisper works a whole lot faster with GPU access. So I decided to run Whisper using &lt;a href="https://replicate.com/cjwbw/whisper"&gt;this hosted copy of the model on Replicate&lt;/a&gt;.&lt;/p&gt;
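&lt;p&gt;That Whisper call from Python looks roughly like this. The &lt;code&gt;audio&lt;/code&gt; input name and the shape of the output are my assumptions from the model page, so treat it as a sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import replicate

# Send the extracted audio file to the Replicate-hosted Whisper model
with open("audio.mp3", "rb") as audio:
    output = replicate.run(
        "cjwbw/whisper",
        input={"audio": audio},  # input name assumed from the model page
    )

# Expected to include the transcription (and a translation if requested)
print(output)
&lt;/code&gt;&lt;/pre&gt;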
&lt;h4&gt;Extracting YouTube's captions directly&lt;/h4&gt;
&lt;p&gt;I had a check-in meeting with Tristan from Bellingcat just to make sure my hack wasn't a duplicate effort, and to get feedback on the plan.&lt;/p&gt;
&lt;p&gt;Tristan liked the plan, but pointed out that extracting captions directly from YouTube would be a useful additional feature.&lt;/p&gt;
&lt;p&gt;It turns out that, in addition to manual captions, YouTube already creates machine-generated captions in over 100 languages! The quality of these isn't nearly as good as OpenAI Whisper, but they're still useful. And they're free (running Whisper currently costs me money).&lt;/p&gt;
&lt;p&gt;So I adapted the plan, to provide the user with two options. The default option would extract captions directly from the video provider - which would definitely work for YouTube and might work for other sites too.&lt;/p&gt;
&lt;p&gt;The second option would use Whisper to create a transcript and a translation, taking longer but providing results even for sites that didn't offer their own captions.&lt;/p&gt;
&lt;p&gt;I decided to use issue tags to trigger these two workflows: tag with "captions" to extract captions directly, tag with "whisper" to use Whisper.&lt;/p&gt;
&lt;h4&gt;The implementation&lt;/h4&gt;
&lt;p&gt;The implementation ended up being &lt;a href="https://github.com/simonw/action-transcription/blob/7d900b209c6c465df35a27bb812d03754677cb78/.github/workflows/issue_created.yml"&gt;218 lines&lt;/a&gt; of JavaScript-embedded-in-YAML in a GitHub Actions &lt;code&gt;issue_created.yml&lt;/code&gt; workflow.&lt;/p&gt;
&lt;p&gt;I used &lt;a href="https://github.com/actions/github-script"&gt;actions/github-script&lt;/a&gt; for it - a convenient reusable Action that provides a pre-configured set of JavaScript objects for interacting with the GitHub API.&lt;/p&gt;
&lt;p&gt;The code isn't hugely elegant: I'm not hugely familiar with the Node.js ecosystem so I ended up hacking around with Copilot quite a bit to figure out the patterns that would work.&lt;/p&gt;
&lt;p&gt;It turns out captions can come back in a variety of different formats. The two most common appeared to be TTML, which uses XML, and WebVTT, a text-based format.&lt;/p&gt;
&lt;p&gt;I decided to archive the original caption files in the GitHub repository itself, but I wanted to extract just the text and post that as the issue comment.&lt;/p&gt;
&lt;p&gt;So I ended up building two tiny new tools: &lt;a href="https://github.com/simonw/webvtt-to-json"&gt;webvtt-to-json&lt;/a&gt; and &lt;a href="https://github.com/simonw/ttml-to-json"&gt;ttml-to-json&lt;/a&gt; - which converted the different formats into a standard JSON format of my own invention, normalizing the captions so I could then extract the text and include it in a comment.&lt;/p&gt;
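&lt;p&gt;As an illustration of the idea (this is not the actual webvtt-to-json code), a simplified WebVTT cue parser might look like this; real WebVTT allows more timestamp and metadata variations than this handles:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re

# Matches cue timing lines like: 00:00:01.000 --&amp;gt; 00:00:04.500
CUE_RE = re.compile(r"([\d:.]+)\s+--&amp;gt;\s+([\d:.]+)")

def webvtt_to_cues(text):
    # Note: optional cue identifier lines are not handled by this sketch
    cues = []
    current = None
    for line in text.splitlines():
        match = CUE_RE.search(line)
        if match:
            current = {"start": match.group(1), "end": match.group(2), "lines": []}
            cues.append(current)
        elif current is not None and line.strip():
            current["lines"].append(line.strip())
    return cues
&lt;/code&gt;&lt;/pre&gt;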
&lt;p&gt;Hackathons tend to encourage some pretty scrappy solutions!&lt;/p&gt;
&lt;h4&gt;The results&lt;/h4&gt;
&lt;p&gt;These two issues demonstrate the final result of the tool:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/action-transcription-demo/issues/3"&gt;Example issue with a VK video transcribed to English using Whisper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/action-transcription-demo/issues/4"&gt;Example issue that extracted YouTube auto-generated English captions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That first one in particular shows quite how good the Whisper model is at handling Russian text, and translating it to English.&lt;/p&gt;
&lt;h4&gt;Adding issue templates&lt;/h4&gt;
&lt;p&gt;I added one last enhancement to the project after recording the demo video for the judges embedded above.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/configuring-issue-templates-for-your-repository"&gt;Issue templates&lt;/a&gt; are a new GitHub feature that let you define a form that users must fill out when they create a new issue.&lt;/p&gt;
&lt;p&gt;Frustratingly, these only work with public repositories. I had built my hack in a private repo at first, so I was only able to explore using issue templates once I had made it public.&lt;/p&gt;
&lt;p&gt;I created &lt;a href="https://github.com/simonw/action-transcription/tree/7d900b209c6c465df35a27bb812d03754677cb78/.github/ISSUE_TEMPLATE"&gt;two issue templates&lt;/a&gt; - one for caption tasks and one for whisper tasks.&lt;/p&gt;
&lt;p&gt;Now when a user goes to open a new issue they get to choose one of the two templates and fill in the URL as part of a form! Here's a GIF demo showing that flow in action:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/action-transcription-demo.gif" alt="Animated demo. Click Issues, then New Issue, then select Get Started on the Capture captions menu option. Paste in a URL and click Submit new issue." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;Template repositories&lt;/h4&gt;
&lt;p&gt;One last trick. I want users to be able to run this system themselves, on their own GitHub account.&lt;/p&gt;
&lt;p&gt;I made &lt;a href="https://github.com/simonw/action-transcription"&gt;simonw/action-transcription&lt;/a&gt; a &lt;a href="https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-template-repository"&gt;template repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This means that any user can click a green button to get their own copy of the repository - and when they do, they'll get their own fully configured copy of the GitHub Actions workflows too.&lt;/p&gt;
&lt;p&gt;If they want to use Whisper they'll need to get an API key from &lt;a href="https://replicate.com/"&gt;Replicate.com&lt;/a&gt; and add it to their repository's secrets - but regular caption extraction will work fine without that.&lt;/p&gt;
&lt;p&gt;I've used this technique before - I wrote about it here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Mar/14/shot-scraper-template/"&gt;Instantly create a GitHub repository to take screenshots of a web page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;GitHub Actions as a platform&lt;/h4&gt;
&lt;p&gt;I'm pleased with how this project turned out. But I'm mainly excited about the underlying pattern. I think building tools using GitHub Actions that people can clone to their own accounts is a really promising way of developing sophisticated automated software that people can then run independently, entirely through the GitHub web interface.&lt;/p&gt;
&lt;p&gt;I'm excited to see more tools adopt a similar pattern.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hackathons"&gt;hackathons&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bellingcat"&gt;bellingcat&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/whisper"&gt;whisper&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replicate"&gt;replicate&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/speech-to-text"&gt;speech-to-text&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="hackathons"/><category term="bellingcat"/><category term="github-actions"/><category term="openai"/><category term="whisper"/><category term="replicate"/><category term="github-issues"/><category term="speech-to-text"/></entry></feed>