Simon Willison's Weblog: facebook

Meta's new model is Muse Spark, and meta.ai chat has some interesting tools

2026-04-08T23:07:44+00:00

Meta announced Muse Spark today, their first model release since Llama 4 almost exactly a year ago. It's hosted, not open weights, and the API is currently "a private API preview to select users", but you can try it out today on meta.ai (Facebook or Instagram login required).

Meta's self-reported benchmarks show it competitive with Opus 4.6, Gemini 3.1 Pro, and GPT 5.4 on selected benchmarks, though notably behind on Terminal-Bench 2.0. Meta themselves say they "continue to invest in areas with current performance gaps, such as long-horizon agentic systems and coding workflows".

The model is exposed as two different modes on meta.ai - "Instant" and "Thinking". Meta promise a "Contemplating" mode in the future which they say will offer much longer reasoning time and should behave more like Gemini Deep Think or GPT-5.4 Pro.

A couple of pelicans

I prefer to run my pelican test via API to avoid being influenced by any invisible system prompts, but since that's not an option I ran it against the chat UI directly.

Here's the pelican I got for "Instant":

And this one for "Thinking":

Both SVGs were rendered inline by the Meta AI interface. Interestingly, the Instant model output an SVG directly (with code comments) whereas the Thinking model wrapped it in a thin HTML shell with some unused Playables SDK v1.0.0 JavaScript libraries.

Which got me curious...

Poking around with tools

Clearly Meta's chat harness has some tools wired up to it - at the very least it can render SVG and HTML as embedded frames, Claude Artifacts style.

But what else can it do?

I asked it:

what tools do you have access to?

And then:

I want the exact tool names, parameter names and tool descriptions, in the original format

It spat out detailed descriptions of 16 different tools. You can see the full list I got back here - credit to Meta for not telling their bot to hide these, since it's far less frustrating if I can get them out without having to mess around with jailbreaks.

Here are highlights derived from that response:

Browse and search. browser.search can run a web search through an undisclosed search engine, browser.open can load the full page from one of those search results and browser.find can run pattern matches against the returned page content.
Meta content search. meta_1p.content_search can run "Semantic search across Instagram, Threads, and Facebook posts" - but only for posts the user has access to view which were created since 2025-01-01. This tool has some powerful looking parameters, including author_ids, key_celebrities, commented_by_user_ids, and liked_by_user_ids.
"Catalog search" - meta_1p.meta_catalog_search can "Search for products in Meta's product catalog", presumably for the "Shopping" option in the Meta AI model selector.
Image generation. media.image_gen generates images from prompts, and "returns a CDN URL and saves the image to the sandbox". It has modes "artistic" and "realistic" and can return "square", "vertical" or "landscape" images.
container.python_execution - yes! It's Code Interpreter, my favourite feature of both ChatGPT and Claude.

Execute Python code in a remote sandbox environment. Python 3.9 with pandas, numpy, matplotlib, plotly, scikit-learn, PyMuPDF, Pillow, OpenCV, etc. Files persist at /mnt/data/.

Python 3.9 is EOL these days but the library collection looks useful.

I prompted "use python code to confirm sqlite version and python version" and got back Python 3.9.25 and SQLite 3.34.1 (from January 2021).
container.create_web_artifact - we saw this earlier with the HTML wrapper around the pelican: Meta AI can create HTML+JavaScript files in its container which can then be served up as secure sandboxed iframe interactives. "Set kind to html for websites/apps or svg for vector graphics."
container.download_meta_1p_media is interesting: "Download media from Meta 1P sources into the sandbox. Use post_id for Instagram/Facebook/Threads posts, or catalog_search_citation_id for catalog product images". So it looks like you can pull in content from other parts of Meta and then do fun Code Interpreter things to it in the sandbox.
container.file_search - "Search uploaded files in this conversation and return relevant excerpts" - I guess for digging through PDFs and similar?
Tools for editing files in the container - container.view, container.insert (with new_str and insert_line), container.str_replace. These look similar to Claude's text editor tool commands - these are becoming a common pattern across any file-equipped agent harness.
container.visual_grounding - see below, this one is fun.
subagents.spawn_agent - the sub-agent as a tool pattern. "Spawn an independent sub-agent for research, analysis, or delegation. It returns its final text response."
third_party.link_third_party_account - "Initiate account linking for a third-party service", currently Google Calendar, Outlook Calendar, Gmail, or Outlook.

Image analysis in the container

Let's talk about that container.visual_grounding one. Here's the description in full:

Description: Visual grounding tool that analyzes the image, identifies and labels objects, locates regions, or counts objects.

Parameters:

object_names (array, required)

image_path (string, default "")

format_type ("bbox"|"point"|"count"|null, default "bbox")

title (string|null)[string]

So it can take an image path, optional object names and return things in format bbox, point or count.

This sounded to me like Meta's Segment Anything, (I later heard that it's not, this is a native feature of the model itself via a tool call with a custom system prompt). I decided to take it for a spin.

Prompt:

generate a photo of a raccoon sitting on a trash can wearing trash as a hat

Response:

Here you go — king of the alley, in full regalia:

Coffee cup crown, banana peel brim, newspaper feather. Peak raccoon fashion. Want me to make a wilder version — maybe night-vision, or wearing sunglasses too?

My guess is that image generation in Meta AI is powered by their Emu model, or an updated version of it. This is the same pattern as ChatGPT and Gemini image generation, where images are generated by a tool call (here that's media.image_gen).

The image_gen tool description said that it "saves the image to the sandbox", so my guess was that I'd then be able to run Python code against it. I tried:

use python OpenCV to analyze that image and find out neat things about it

Annoyingly it did not show me the code it ran, but it did create this image:

Pretty neat! We can both generate images and then use them with Python tools.

Next, the visual_grounding tool. Knowing the names of the tools makes it easy to refer to them in prompts:

Now try the container.visual_grounding tool

The response (full response here):

Done — visual_grounding pinpointed every piece of the outfit with pixel-level precision:

[...]

Want me to run it in bbox mode next to draw boxes around the fur texture, or count mode to see how many individual whiskers it can detect?

Once again, Meta AI wrapped the results from the tool in a custom HTML visualization. That's clearly point mode, but how about the others?

I prompted:

yes draw bounding boxes, can it do masks too?

Response:

Yes — bounding boxes work great. Masks? Not directly in visual_grounding; it only returns three formats: point, bbox, and count. No pixel-level segmentation masks.

Here's the bbox output:

What you’re seeing:

raccoon: big box from (270,240) to (892,845) — covers 62% of the width, basically the whole animal

coffee cup: (424,115)-(610,273) — sits highest, 158px tall in the 0-1000 space

banana peel: (472,215)-(667,291) — overlaps the cup's base, confirming it's draped over the side

newspaper: (305,223)-(548,315) — left ear coverage

trash can lid: (107,800)-(1000,998) — spans almost the full image width at the bottom

face: (305,283)-(750,537) — tight crop around the mask

eyes: both ~40×37 pixel boxes, perfectly level at y=379

Notice how the boxes nest: face is inside raccoon, eyes inside face, hat pieces overlap each other but stay above the face box. The tool is doing real object localization, not just guessing centers.

About masks: since visual_grounding doesn't output them, I can fake them with OpenCV in the container. Two quick options:

GrabCut — I give it the raccoon bbox, it iteratively segments fur from background and returns a binary mask

SAM-style color threshold — use the bbox plus k-means to isolate the coffee cup or banana peel

Want me to generate an actual alpha mask for the raccoon (cut it out from the alley), or for the trash hat pieces separately? Takes about 10 seconds and I can give you a transparent PNG.

Here's the HTML for that bbox visualization.

Finally I had it dump out the raw JSON for point mode, bbox mode and count mode. You can see that all here. The count mode is particularly interesting - here's that in full:

[
  {
    "object_name": "raccoon whisker",
    "points": [
      {"x": 403, "y": 445},
      {"x": 397, "y": 458},
      {"x": 392, "y": 472},
      {"x": 391, "y": 485},
      {"x": 398, "y": 497},
      {"x": 415, "y": 512},
      {"x": 618, "y": 446},
      {"x": 622, "y": 461},
      {"x": 625, "y": 475},
      {"x": 628, "y": 488},
      {"x": 635, "y": 501},
      {"x": 651, "y": 518}
    ],
    "count": 12
  },
  {
    "object_name": "raccoon paw claw",
    "points": [
      {"x": 406, "y": 845},
      {"x": 424, "y": 847},
      {"x": 441, "y": 842},
      {"x": 473, "y": 833},
      {"x": 596, "y": 830},
      {"x": 635, "y": 844},
      {"x": 660, "y": 843},
      {"x": 679, "y": 838}
    ],
    "count": 8
  },
  {
    "object_name": "trash item on head",
    "points": [
      {"x": 514, "y": 191},
      {"x": 568, "y": 263},
      {"x": 392, "y": 274}
    ],
    "count": 3
  },
  {
    "object_name": "eye",
    "points": [
      {"x": 462, "y": 395},
      {"x": 582, "y": 398}
    ],
    "count": 2
  },
  {
    "object_name": "ear",
    "points": [
      {"x": 359, "y": 313},
      {"x": 680, "y": 294}
    ],
    "count": 2
  }
]

So Meta AI has the ability to count a raccoon's whiskers baked into the default set of tools.

Which means... it can count pelicans too!

Here's that overlay exported as HTML.

Update: Meta's Jack Wu confirms that these tools are part of the new harness they launched alongside the new model.

Maybe open weights in the future?

On Twitter Alexandr Wang said:

this is step one. bigger models are already in development with infrastructure scaling to match. private api preview open to select partners today, with plans to open-source future versions.

I really hope they do go back to open-sourcing their models. Llama 3.1/3.2/3.3 were excellent laptop-scale model families, and the introductory blog post for Muse Spark had this to say about efficiency:

[...] we can reach the same capabilities with over an order of magnitude less compute than our previous model, Llama 4 Maverick. This improvement also makes Muse Spark significantly more efficient than the leading base models available for comparison.

So are Meta back in the frontier model game? Artificial Analysis think so - they scored Meta Spark at 52, "behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6". Last year's Llama 4 Maverick and Scout scored 18 and 13 respectively.

I'm waiting for API access - while the tool collection on meta.ai is quite strong the real test of a model like this is still what we can build on top of it.

Tags: facebook, ai, generative-ai, llms, code-interpreter, llm-tool-use, meta, pelican-riding-a-bicycle, llm-reasoning, llm-release

Quoting Mark Zuckerberg

2025-01-30T13:41:49+00:00

Llama 4 is making great progress in training. Llama 4 mini is done with pre-training and our reasoning models and larger model are looking good too. Our goal with Llama 3 was to make open source competitive with closed models, and our goal for Llama 4 is to lead. Llama 4 will be natively multimodal -- it's an omni-model -- and it will have agentic capabilities, so it's going to be novel and it's going to unlock a lot of new use cases.

— Mark Zuckerberg, on Meta's quarterly earnings report

Tags: vision-llms, llama, ai, llms, meta, generative-ai, facebook, mark-zuckerberg, multi-modal-output, llm-reasoning

In Leak, Facebook Partner Brags About Listening to Your Phone’s Microphone to Serve Ads for Stuff You Mention

2024-09-02T23:56:44+00:00

In Leak, Facebook Partner Brags About Listening to Your Phone’s Microphone to Serve Ads for Stuff You Mention

(I've repurposed some of my comments on Lobsters into this commentary on this article. See also I still don’t think companies serve you ads based on spying through your microphone.)

Which is more likely?

All of the conspiracy theories are real! The industry managed to keep the evidence from us for decades, but finally a marketing agency of a local newspaper chain has blown the lid off the whole thing, in a bunch of blog posts and PDFs and on a podcast.
Everyone believed that their phone was listening to them even when it wasn’t. The marketing agency of a local newspaper chain were the first group to be caught taking advantage of that widespread paranoia and use it to try and dupe people into spending money with them, despite the tech not actually working like that.

My money continues to be on number 2.

Here’s their pitch deck. My “this is a scam” sense is vibrating like crazy reading it: CMG Pitch Deck on Voice-Data Advertising 'Active Listening'.

It does not read to me like the deck of a company that has actually shipped their own app that tracks audio and uses it for even the most basic version of ad targeting.

They give the game away on the last two slides:

Prep work:

Create buyer personas by uploading past consumer data into the platform

Identify top performing keywords relative to your products and services by analyzing keyword data and past ad campaigns

Ensure tracking is set up via a tracking pixel placed on your site or landing page

Now that preparation is done:

Active listening begins in your target geo and buyer behavior is detected across 470+ data sources […]

Our technology analyzes over 1.9 trillion behaviors daily and collects opt-in customer behavior data from hundreds of popular websites that offer top display, video platforms, social applications, and mobile marketplaces that allow laser-focused media buying.

Sources include: Google, LinkedIn, Facebook, Amazon and many more

That’s not describing anything ground-breaking or different. That’s how every targeting ad platform works: you upload a bunch of “past consumer data”, identify top keywords and setup a tracking pixel.

I think active listening is the term that the team came up with for “something that sounds fancy but really just means the way ad targeting platforms work already”. Then they got over-excited about the new metaphor and added that first couple of slides that talk about “voice data”, without really understanding how the tech works or what kind of a shitstorm that could kick off when people who DID understand technology started paying attention to their marketing.

TechDirt's story Cox Media Group Brags It Spies On Users With Device Microphones To Sell Targeted Ads, But It’s Not Clear They Actually Can included a quote with a clarification from Cox Media Group:

CMG businesses do not listen to any conversations or have access to anything beyond a third-party aggregated, anonymized and fully encrypted data set that can be used for ad placement. We regret any confusion and we are committed to ensuring our marketing is clear and transparent.

Why I don't buy the argument that it's OK for people to believe this

I've seen variants of this argument before: phones do creepy things to target ads, but it’s not exactly “listen through your microphone” - but there’s no harm in people believing that if it helps them understand that there’s creepy stuff going on generally.

I don’t buy that. Privacy is important. People who are sufficiently engaged need to be able to understand exactly what’s going on, so they can e.g. campaign for legislators to reign in the most egregious abuses.

I think it’s harmful letting people continue to believe things about privacy that are not true, when we should instead be helping them understand the things that are true.

This discussion thread is full of technically minded, engaged people who still believe an inaccurate version of what their devices are doing. Those are the people that need to have an accurate understanding, because those are the people that can help explain it to others and can hopefully drive meaningful change.

This is such a damaging conspiracy theory.

It’s causing some people to stop trusting their most important piece of personal technology: their phone.
We risk people ignoring REAL threats because they’ve already decided to tolerate made up ones.
If people believe this and see society doing nothing about it, that’s horrible. That leads to a cynical “nothing can be fixed, I guess we will just let bad people get away with it” attitude. People need to believe that humanity can prevent this kind of abuse from happening.

The fact that nobody has successfully produced an experiment showing that this is happening is one of the main reasons I don’t believe it to be happening.

It’s like James Randi’s One Million Dollar Paranormal Challenge - the very fact that nobody has been able to demonstrate it is enough for me not to believe in it.

Tags: conspiracy, facebook, privacy, microphone-ads-conspiracy

Where Facebook's AI Slop Comes From

2024-08-10T00:26:22+00:00

Where Facebook's AI Slop Comes From

Jason Koebler continues to provide the most insightful coverage of Facebook's weird ongoing problem with AI slop (previously).

Who's creating this stuff? It looks to primarily come from individuals in countries like India and the Philippines, inspired by get-rich-quick YouTube influencers, who are gaming Facebook's Creator Bonus Program and flooding the platform with AI-generated images.

Jason highlights this YouTube video by YT Gyan Abhishek (136,000 subscribers) and describes it like this:

He pauses on another image of a man being eaten by bugs. “They are getting so many likes,” he says. “They got 700 likes within 2-4 hours. They must have earned $100 from just this one photo. Facebook now pays you $100 for 1,000 likes … you must be wondering where you can get these images from. Don’t worry. I’ll show you how to create images with the help of AI.”

That video is in Hindi but you can request auto-translated English subtitles in the YouTube video settings. The image generator demonstrated in the video is Ideogram, which offers a free plan. (Here's pelicans having a tea party on a yacht.)

Jason's reporting here runs deep - he goes as far as buying FewFeed, dedicated software for scraping and automating Facebook, and running his own (unsuccessful) page using prompts from YouTube tutorials like:

an elderly woman celebrating her 104th birthday with birthday cake realistic family realistic jesus celebrating with her

I signed up for a $10/month 404 Media subscription to read this and it was absolutely worth the money.

Tags: ethics, facebook, ai, slop, jason-koebler, meta, ai-ethics, ai-misuse

Quoting Mark Zuckerberg

2024-07-23T16:52:07+00:00

I believe the Llama 3.1 release will be an inflection point in the industry where most developers begin to primarily use open source, and I expect that approach to only grow from here.

— Mark Zuckerberg

Tags: meta, open-source, generative-ai, facebook, mark-zuckerberg, ai, llms, llama

Introducing Llama 3.1: Our most capable models to date

2024-07-23T15:40:47+00:00

Introducing Llama 3.1: Our most capable models to date

We've been waiting for the largest release of the Llama 3 model for a few months, and now we're getting a whole new model family instead.

Meta are calling Llama 3.1 405B "the first frontier-level open source AI model" and it really is benchmarking in that GPT-4+ class, competitive with both GPT-4o and Claude 3.5 Sonnet.

I'm equally excited by the new 8B and 70B 3.1 models - both of which now support a 128,000 token context and benchmark significantly higher than their Llama 3 equivalents. Same-sized models getting more powerful and capable a very reassuring trend. I expect the 8B model (or variants of it) to run comfortably on an array of consumer hardware, and I've run a 70B model on a 64GB M2 in the past.

The 405B model can at least be run on a single server-class node:

To support large-scale production inference for a model at the scale of the 405B, we quantized our models from 16-bit (BF16) to 8-bit (FP8) numerics, effectively lowering the compute requirements needed and allowing the model to run within a single server node.

Meta also made a significant change to the license:

We’ve also updated our license to allow developers to use the outputs from Llama models — including 405B — to improve other models for the first time.

We’re excited about how this will enable new advancements in the field through synthetic data generation and model distillation workflows, capabilities that have never been achieved at this scale in open source.

I'm really pleased to see this. Using models to help improve other models has been a crucial technique in LLM research for over a year now, especially for fine-tuned community models release on Hugging Face. Researchers have mostly been ignoring this restriction, so it's reassuring to see the uncertainty around that finally cleared up.

Lots more details about the new models in the paper The Llama 3 Herd of Models including this somewhat opaque note about the 15 trillion token training data:

Our final data mix contains roughly 50% of tokens corresponding to general knowledge, 25% of mathematical and reasoning tokens, 17% code tokens, and 8% multilingual tokens.

Update: I got the Llama 3.1 8B Instruct model working with my LLM tool via a new plugin, llm-gguf.

Tags: facebook, ai, llama, llms, meta, llm-release

Facebook Is the 'Zombie Internet'

2024-07-15T18:56:54+00:00

Facebook Is the 'Zombie Internet'

Ever since Facebook started to become infested with weird AI-generated images of shrimp Jesus - with thousands of comments and likes - I've been wondering how much of that activity is real humans as opposed to yet more bots.

Jason Koebler has been on the Facebook AI slop beat for a while. In this superb piece of online investigative reporting he dives deep into an attempt to answer that question, using multiple Facebook burner accounts and contacting more than 300 users who have commented on that kind of image.

I endlessly tried to talk to people who commented on these images, but I had no luck at all. Over the course of several months, I messaged 300 people who commented on bizarre AI-generated images, which I could only do 20 or so at a time before Facebook stopped letting me send messages for several hours. I also commented on dozens of images myself, asking for any human who had also commented on the image to respond to me. Across those hundreds of messages, I got four total responses.

Jacob also talked to Khan Schoolcraft, a moderator of the Um, isn’t that AI? group, who said:

In my experience, the supermajority of engagement on viral AI Facebook pages is just as artificially-generated as the content they publish. When exploring their comment sections, one will often see hundreds of bot-like comments interspersed with a few ‘real’ people sounding the alarm to no avail. [...]

Whether it's a child transforming into a water bottle cyborg, a three-armed flight attendant rescuing Tiger Jesus from a muddy plane crash, or a hybrid human-monkey baby being stung to death by giant hornets, all tend to have copy+pasted captions, reactions & comments which usually make no sense in the observed context.

Tags: facebook, ai, generative-ai, slop, jason-koebler

AI chatbots are intruding into online communities where people are trying to connect with other humans

2024-06-09T03:14:26+00:00

AI chatbots are intruding into online communities where people are trying to connect with other humans

This thing where Facebook are experimenting with AI bots that reply in a group when someone "asks a question in a post and no one responds within an hour" is absolute grade A slop - unwanted, unreviewed AI generated text that makes the internet a worse place.

The example where Meta AI replied in an education forum saying "I have a child who is also 2e and has been part of the NYC G&T program" is inexcusable.

Via @dangillmor

Tags: ethics, facebook, ai, generative-ai, llms, slop, ai-ethics, ai-misuse

Man caught in scam after AI told him fake Facebook customer support number was legitimate

2024-05-31T16:53:33+00:00

Man caught in scam after AI told him fake Facebook customer support number was legitimate

This one illustrates a nasty edge-case if you ship any kind of chatbot with your company's name attached to it.

The scam victim here searched Google for a customer support phone number for Facebook, and came across a suspicious looking phone number.

He pasted that number into the Meta AI chatbot in Facebook Messenger and asked "is it a Facebook help line?" - and a Meta AI answered:

The phone number 1-xxx-xxx-xxxx is indeed a legitimate Facebook support number. Meta, the company that owns Facebook, lists this number as a contact for Meta Support, which includes support for Facebook, Instagram, and more.

This was a total hallucination, and the phone number was for a scammer who ran a classic "download this app so I can help you" attack.

It doesn't matter how many disclaimers you add to a chatbot: this kind of misunderstanding from users is inevitable.

Via Hacker News

Tags: ethics, facebook, scams, ai, llms, ai-ethics, hallucinations

Quoting Meta AI bot

2024-04-18T03:34:15+00:00

I have a child who is also 2e and has been part of the NYC G&T program. We've had a positive experience with the citywide program, specifically with the program at The Anderson School.

— Meta AI bot, answering a question on a forum

Tags: ai-misuse, ethics, generative-ai, facebook, ai, llms, meta, ai-ethics

Threads has entered the fediverse

2024-03-22T20:15:20+00:00

Threads has entered the fediverse

Threads users with public profiles in certain countries can now turn on a setting which makes their posts available in the fediverse—so users of ActivityPub systems such as Mastodon can follow their accounts to subscribe to their posts.

It’s only a partial integration at the moment: Threads users can’t themselves follow accounts from other providers yet, and their notifications will show them likes but not boosts or replies: “For now, people who want to see replies on their posts on other fediverse servers will have to visit those servers directly.”

Depending on how you count, Mastodon has around 9m user accounts of which 1m are active. Threads claims more than 130m active monthly users. The Threads team are developing these features cautiously which is reassuring to see—a clumsy or thoughtless integration could cause all sorts of damage just from the sheer scale of their service.

Via Gavin Anderegg

Tags: facebook, threads, mastodon, activitypub, fediverse

Quoting Kieran McCarthy

2024-02-28T15:15:13+00:00

For the last few years, Meta has had a team of attorneys dedicated to policing unauthorized forms of scraping and data collection on Meta platforms. The decision not to further pursue these claims seems as close to waving the white flag as you can get against these kinds of companies. But why? [...]

In short, I think Meta cares more about access to large volumes of data and AI than it does about outsiders scraping their public data now. My hunch is that they know that any success in anti-scraping cases can be thrown back at them in their own attempts to build AI training databases and LLMs. And they care more about the latter than the former.

— Kieran McCarthy

Tags: facebook, scraping, ai, llms, training-data

All you need is Wide Events, not “Metrics, Logs and Traces”

2024-02-27T22:57:14+00:00

All you need is Wide Events, not “Metrics, Logs and Traces”

I’ve heard great things about Meta’s internal observability platform Scuba, here’s an explanation from ex-Meta engineer Ivan Burmistrov describing the value it provides and comparing it to the widely used OpenTelemetry stack.

Via Hacker News

Tags: facebook, observability

Facebook Is Being Overrun With Stolen, AI-Generated Images That People Think Are Real

2023-12-19T02:01:35+00:00

Facebook Is Being Overrun With Stolen, AI-Generated Images That People Think Are Real

Excellent investigative piece by Jason Koebler digging into the concerning trend of Facebook engagement farming accounts who take popular aspirational images and use generative AI to recreate hundreds of variants of them, which then gather hundreds of comments from people who have no idea that the images are fake.

Tags: ethics, facebook, ai, generative-ai, jason-koebler, ai-ethics, ai-misuse

Meta/Threads Interoperating in the Fediverse Data Dialogue Meeting yesterday

2023-12-12T01:05:32+00:00

Meta/Threads Interoperating in the Fediverse Data Dialogue Meeting yesterday

Johannes Ernst reports from a recent meeting hosted by Meta aimed at bringing together staff from Meta’s Threads social media platform with representatives from the Fediverse.

Meta have previously announced an intention for Threads to join the Fediverse. It sounds like they’re being extremely thoughtful about how to go about this.

Two points that stood out for me:

“Rolling out a large node – like Threads will be – in a complex, distributed system that’s as decentralized and heterogeneous as the Fediverse is not something anybody really has done before.”

And:

“When we think of privacy risks when Meta connects to the Fediverse, we usually think of what happens to data that moves from today’s Fediverse into Meta. I didn’t realize the opposite is also quite a challenge (personal data posted to Threads, making its way into the Fediverse) for an organization as heavily monitored by regulators around the world as is Meta.”

Tags: facebook, social-media, mastodon, fediverse, meta

Announcing Purple Llama: Towards open trust and safety in the new world of generative AI

2023-12-08T06:36:26+00:00

Announcing Purple Llama: Towards open trust and safety in the new world of generative AI

New from Meta AI, Purple Llama is “an umbrella project featuring open trust and safety tools and evaluations meant to level the playing field for developers to responsibly deploy generative AI models and experiences”.

There are three components: a 27 page “Responsible Use Guide”, a new open model called Llama Guard and CyberSec Eval, “a set of cybersecurity safety evaluations benchmarks for LLMs”.

Disappointingly, despite this being an initiative around trustworthy LLM development,prompt injection is mentioned exactly once, in the Responsible Use Guide, with an incorrect description describing it as involving “attempts to circumvent content restrictions”!

The Llama Guard model is interesting: it’s a fine-tune of Llama 2 7B designed to help spot “toxic” content in input or output from a model, effectively an openly released alternative to OpenAI’s moderation API endpoint.

The CyberSec Eval benchmarks focus on two concepts: generation of insecure code, and preventing models from assisting attackers from generating new attacks. I don’t think either of those are anywhere near as important as prompt injection mitigation.

My hunch is that the reason prompt injection didn’t get much coverage in this is that, like the rest of us, Meta’s AI research teams have no idea how to fix it yet!

Via @simonw

Tags: facebook, security, ai, prompt-injection, generative-ai, llms, meta, llm-release

Seamless Communication

2023-12-01T17:01:37+00:00

Seamless Communication

A new “family of AI research models” from Meta AI for speech and text translation. The live demo is particularly worth trying—you can record a short webcam video of yourself speaking and get back the same video with your speech translated into another language.

The key to it is the new SeamlessM4T v2 model, which supports 101 languages for speech input, 96 Languages for text input/output and 35 languages for speech output. SeamlessM4T-Large v2 is a 9GB file, available on Hugging Face.

Also in this release: SeamlessExpressive, which “captures certain underexplored aspects of prosody such as speech rate and pauses”—effectively maintaining things like expressed enthusiasm across languages.

Plus SeamlessStreaming, “a model that can deliver speech and text translations with around two seconds of latency”.

Via facebookresearch/seamless_communication

Tags: facebook, transformers, translation, ai, llms

Quoting Ross Taylor

2023-11-15T01:15:15+00:00

[On Meta's Galactica LLM launch] We did this with a 8 person team which is an order of magnitude fewer people than other LLM teams at the time.

We were overstretched and lost situational awareness at launch by releasing demo of a base model without checks. We were aware of what potential criticisms would be, but we lost sight of the obvious in the workload we were under.

One of the considerations for a demo was we wanted to understand the distribution of scientific queries that people would use for LLMs (useful for instruction tuning and RLHF). Obviously this was a free goal we gave to journalists who instead queried it outside its domain. But yes we should have known better.

We had a “good faith” assumption that we’d share the base model, warts and all, with four disclaimers about hallucinations on the demo - so people could see what it could do (openness). Again, obviously this didn’t work.

— Ross Taylor

Tags: facebook, llms, ai, generative-ai, hallucinations

I’m banned for life from advertising on Meta. Because I teach Python.

2023-10-19T14:56:05+00:00

I’m banned for life from advertising on Meta. Because I teach Python.

If accurate, this describes a nightmare scenario of automated decision making.

Reuven recently found he had a permanent ban from advertising on Facebook. They won’t tell him exactly why, and have marked this as a final decision that can never be reviewed.

His best theory (impossible for him to confirm) is that it’s because he tried advertising a course on Python and Pandas a few years ago which was blocked because a dumb algorithm thought he was trading exotic animals!

The worst part? An appeal is no longer possible because relevant data is only retained for 180 days and so all of the related evidence has now been deleted.

Various comments on Hacker News from people familiar with these systems confirm that this story likely holds up.

Via Hacker News

Tags: ethics, facebook, pandas, python, ai, meta, ai-ethics, ai-misuse

Meta in Myanmar, Part I: The Setup

2023-09-30T02:27:25+00:00

Meta in Myanmar, Part I: The Setup

The first in a series by Erin Kissane explaining in detail exactly how things went so incredibly wrong with Facebook in Myanmar, contributing to a genocide ending hundreds of thousands of lives. This is an extremely tough read.

Tags: ethics, facebook, social-media, meta

MMS Language Coverage in Datasette Lite

2023-05-22T20:01:41+00:00

MMS Language Coverage in Datasette Lite

I converted the HTML table of 4,021 languages supported by Meta’s new Massively Multilingual Speech models to newline-delimited JSON and loaded it into Datasette Lite. Faceting by Language Family is particularly interesting—the top five families represented are Niger-Congo with 1,019, Austronesian with 609, Sino-Tibetan with 288, Indo-European with 278 and Afro-Asiatic with 222.

Tags: facebook, ai, datasette, datasette-lite

Introducing speech-to-text, text-to-speech, and more for 1,100+ languages

2023-05-22T19:22:38+00:00

Introducing speech-to-text, text-to-speech, and more for 1,100+ languages

New from Meta AI: Massively Multilingual Speech. “MMS supports speech-to-text and text-to-speech for 1,107 languages and language identification for over 4,000 languages. [...] Some of these, such as the Tatuyo language, have only a few hundred speakers, and for most of these languages, no prior speech technology exists.”

It’s licensed CC-BY-NC 4.0 though, so it’s not available for commercial use.

“In a like-for-like comparison with OpenAI’s Whisper, we found that models trained on the Massively Multilingual Speech data achieve half the word error rate, but Massively Multilingual Speech covers 11 times more languages.”

The training data was mostly sourced from audio Bible translations.

Via Hacker News

Tags: facebook, translation, ai, training-data

ImageBind

2023-05-09T19:04:10+00:00

ImageBind

New model release from Facebook/Meta AI research: “An approach to learn a joint embedding across six different modalities—images, text, audio, depth, thermal, and IMU (inertial measurement units) data”. The non-interactive demo shows searching audio starting with an image, searching images starting with audio, using text to retrieve images and audio, using image and audio to retrieve images (e.g. a barking sound and a photo of a beach to get dogs on a beach) and using audio as input to an image generator.

Tags: facebook, ai, generative-ai, embeddings

Large language models are having their Stable Diffusion moment

2023-03-11T19:15:37+00:00

The open release of the Stable Diffusion image generation model back in August 2022 was a key moment. I wrote how Stable Diffusion is a really big deal at the time.

People could now generate images from text on their own hardware!

More importantly, developers could mess around with the guts of what was going on.

The resulting explosion in innovation is still going on today. Most recently, ControlNet appears to have leapt Stable Diffusion ahead of Midjourney and DALL-E in terms of its capabilities.

It feels to me like that Stable Diffusion moment back in August kick-started the entire new wave of interest in generative AI - which was then pushed into over-drive by the release of ChatGPT at the end of November.

That Stable Diffusion moment is happening again right now, for large language models - the technology behind ChatGPT itself.

This morning I ran a GPT-3 class language model on my own personal laptop for the first time!

AI stuff was weird already. It's about to get a whole lot weirder.

LLaMA

Somewhat surprisingly, language models like GPT-3 that power tools like ChatGPT are a lot larger and more expensive to build and operate than image generation models.

The best of these models have mostly been built by private organizations such as OpenAI, and have been kept tightly controlled - accessible via their API and web interfaces, but not released for anyone to run on their own machines.

These models are also BIG. Even if you could obtain the GPT-3 model you would not be able to run it on commodity hardware - these things usually require several A100-class GPUs, each of which retail for $8,000+.

This technology is clearly too important to be entirely controlled by a small group of companies.

There have been dozens of open large language models released over the past few years, but none of them have quite hit the sweet spot for me in terms of the following:

Easy to run on my own hardware
Large enough to be useful - ideally equivalent in capabilities to GPT-3
Open source enough that they can be tinkered with

This all changed yesterday, thanks to the combination of Facebook's LLaMA model and llama.cpp by Georgi Gerganov.

Here's the abstract from the LLaMA paper:

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

It's important to note that LLaMA isn't fully "open". You have to agree to some strict terms to access the model. It's intended as a research preview, and isn't something which can be used for commercial purposes.

In a totally cyberpunk move, within a few days of the release, someone submitted this PR to the LLaMA repository linking to an unofficial BitTorrent download link for the model files!

So they're in the wild now. You may not be legally able to build a commercial product on them, but the genie is out of the bottle. That furious typing sound you can hear is thousands of hackers around the world starting to dig in and figure out what life is like when you can run a GPT-3 class model on your own hardware.

llama.cpp

LLaMA on its own isn't much good if it's still too hard to run it on a personal laptop.

Enter Georgi Gerganov.

Georgi is an open source developer based in Sofia, Bulgaria (according to his GitHub profile). He previously released whisper.cpp, a port of OpenAI's Whisper automatic speech recognition model to C++. That project made Whisper applicable to a huge range of new use cases.

He's just done the same thing with LLaMA.

Georgi's llama.cpp project had its initial release yesterday. From the README:

The main goal is to run the model using 4-bit quantization on a MacBook.

4-bit quantization is a technique for reducing the size of models so they can run on less powerful hardware. It also reduces the model sizes on disk - to 4GB for the 7B model and just under 8GB for the 13B one.

It totally works!

I used it to run the 7B LLaMA model on my laptop last night, and then this morning upgraded to the 13B model - the one that Facebook claim is competitive with GPT-3.

Here are my detailed notes on how I did that - most of the information I needed was already there in the README.

As my laptop started to spit out text at me I genuinely had a feeling that the world was about to change, again.

I thought it would be a few more years before I could run a GPT-3 class model on hardware that I owned. I was wrong: that future is here already.

Is this the worst thing that ever happened?

I'm not worried about the science fiction scenarios here. The language model running on my laptop is not an AGI that's going to break free and take over the world.

But there are a ton of very real ways in which this technology can be used for harm. Just a few:

Generating spam
Automated romance scams
Trolling and hate speech
Fake news and disinformation
Automated radicalization (I worry about this one a lot)

Not to mention that this technology makes things up exactly as easily as it parrots factual information, and provides no way to tell the difference.

Prior to this moment, a thin layer of defence existed in terms of companies like OpenAI having a limited ability to control how people interacted with those models.

Now that we can run these on our own hardware, even those controls are gone.

How do we use this for good?

I think this is going to have a huge impact on society. My priority is trying to direct that impact in a positive direction.

It's easy to fall into a cynical trap of thinking there's nothing good here at all, and everything generative AI is either actively harmful or a waste of time.

I'm personally using generative AI tools on a daily basis now for a variety of different purposes. They've given me a material productivity boost, but more importantly they have expanded my ambitions in terms of projects that I take on.

I used ChatGPT to learn enough AppleScript to ship a new project in less than an hour just last week!

I'm going to continue exploring and sharing genuinely positive applications of this technology. It's not going to be un-invented, so I think our priority should be figuring out the most constructive possible ways to use it.

What to look for next

Assuming Facebook don't relax the licensing terms, LLaMA will likely end up more a proof-of-concept that local language models are feasible on consumer hardware than a new foundation model that people use going forward.

The race is on to release the first fully open language model that gives people ChatGPT-like capabilities on their own devices.

Quoting Stable Diffusion backer Emad Mostaque:

Wouldn't be nice if there was a fully open version eh

It's happening already...

I published this article on Saturday 11th March 2023. On Sunday, Artem Andreenko got it running on a RaspberryPi with 4GB of RAM:

I've sucefully runned LLaMA 7B model on my 4GB RAM Raspberry Pi 4. It's super slow about 10sec/token. But it looks we can run powerful cognitive pipelines on a cheap hardware. pic.twitter.com/XDbvM2U5GY
- Artem Andreenko 🇺🇦 (@miolini) March 12, 2023

Then on Monday, Anish Thite got it working on a Pixel 6 phone (at 26s/token):

@ggerganov's LLaMA works on a Pixel 6!

LLaMAs been waiting for this, and so have I pic.twitter.com/JjEhdzJ2B9
- anishmaxxing (@thiteanish) March 13, 2023

And then a research lab at Stanford released Alpaca - an instruction fine-tuned version of the model. I wrote more about that in a follow-up post: Stanford Alpaca, and the acceleration of on-device large language model development.

Follow my work

Everything I write on my blog goes out in my Atom feed, and I have a very active Mastodon account, plus a Twitter account (@simonw) where I continue to post links to new things I've written.

I'm also starting a newsletter at simonw.substack.com. I plan to send out everything from my blog on a weekly basis, so if email is your preferred way to stay up-to-date you can subscribe there.

More stuff I've written

My Generative AI tag has everything, but here are some relevant highlights from the past year:

A Datasette tutorial written by GPT-3 - 31 May 2022
How to use the GPT-3 language model - 5 Jun 2022
First impressions of DALL-E, generating images from text - 23 Jun 2022
Using GPT-3 to explain how code works - 9 Jul 2022
Stable Diffusion is a really big deal - 29 Aug 2022
Exploring the training data behind Stable Diffusion - 5 Sep 2022
Prompt injection attacks against GPT-3 - 12 Sep 2022
A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions - 30 Sep 2022
Is the AI spell-casting metaphor harmful or helpful? - 5 Oct 2022
A new AI game: Give me ideas for crimes to do - 4 Dec 2022
AI assisted learning: Learning Rust with ChatGPT, Copilot and Advent of Code - 5 Dec 2022
How to implement Q&A against your documentation with GPT3, embeddings and Datasette - 13 Jan 2023
Bing: "I will not harm you unless you harm me first" - 15 Feb 2023
I talked about Bing and tried to explain language models on live TV! - 19 Feb 2023
In defense of prompt engineering - 21 Feb 2023
Thoughts and impressions of AI-assisted search from Bing - 24 Feb 2023
Weeknotes: NICAR, and an appearance on KQED Forum - 7 Mar 2023
ChatGPT can't access the internet, even though it really looks like it can - 10 Mar 2023

Tags: facebook, ai, gpt-3, openai, generative-ai, chatgpt, whisper, llama, local-llms, llms, llama-cpp, llm-release, georgi-gerganov

Running LLaMA 7B on a 64GB M2 MacBook Pro with llama.cpp

2023-03-11T04:19:43+00:00

Running LLaMA 7B on a 64GB M2 MacBook Pro with llama.cpp

I got Facebook’s LLaMA 7B to run on my MacBook Pro using llama.cpp (a “port of Facebook’s LLaMA model in C/C++”) by Georgi Gerganov. It works! I’ve been hoping to run a GPT-3 class language model on my own hardware for ages, and now it’s possible to do exactly that. The model itself ends up being just 4GB after applying Georgi’s script to “quantize the model to 4-bits”.

Tags: facebook, ai, gpt-3, generative-ai, llama, local-llms, llms, llama-cpp, georgi-gerganov

Introducing LLaMA: A foundational, 65-billion-parameter large language model

2023-02-24T17:34:18+00:00

Introducing LLaMA: A foundational, 65-billion-parameter large language model

From the paper: “For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10× smaller. We believe that this model will help democratize the access and study of LLMs, since it can be run on a single GPU.”

Via @guillaumelample

Tags: facebook, ai, gpt-3, generative-ai, llama, llms

Exploring 10m scraped Shutterstock videos used to train Meta's Make-A-Video text-to-video model

2022-09-29T19:31:24+00:00

Make-A-Video is a new "state-of-the-art AI system that generates videos from text" from Meta AI. It looks incredible - it really is DALL-E / Stable Diffusion for video. And it appears to have been trained on 10m video preview clips scraped from Shutterstock.

I built a new search engine to explore those ten million clips:

https://webvid.datasette.io/webvid/videos

This is similar to the system I built with Andy Baio a few weeks ago to explore the LAION data used to train Stable Diffusion.

Make-A-Video training data

Meta AI's paper describing the model includes this section about the training data:

Datasets. To train the image models, we use a 2.3B subset of the dataset from (Schuhmann et al.) where the text is English. We filter out sample pairs with NSFW images 2, toxic words in the text, or images with a watermark probability larger than 0.5.

We use WebVid-10M (Bain et al., 2021) and a 10M subset from HD-VILA-100M (Xue et al., 2022) 3 to train our video generation models. Note that only the videos (no aligned text) are used.

The decoder Dt and the interpolation model is trained on WebVid-10M. SRt l is trained on both WebVid-10M and HD-VILA-10M. While prior work (Hong et al., 2022; Ho et al., 2022) have collected private text-video pairs for T2V generation, we use only public datasets (and no paired text for videos). We conduct automatic evaluation on UCF-101 (Soomro et al., 2012) and MSR-VTT (Xu et al., 2016) in a zero-shot setting.

That 2.3B subset of images is the same LAION data I explored previously.

HD-VILA-100M was collected by Microsoft Research Asia - Andy Baio notes that these were scraped from YouTube.

I decided to take a look at the WebVid-10M data.

WebVid-10M

The WebVid-10M site describes the data like this:

WebVid-10M is a large-scale dataset of short videos with textual descriptions sourced from the web. The videos are diverse and rich in their content.

The accompanying paper provides a little bit more detail:

We scrape the web for a new dataset of videos with textual description annotations, called WebVid-2M. Our dataset consists of 2.5M video-text pairs, which is an order of magnitude larger than existing video captioning datasets (see Table 1).

The data was scraped from the web following a similar procedure to Google Conceptual Captions [55] (CC3M). We note that more than 10% of CC3M images are in fact thumbnails from videos, which motivates us to use such video sources to scrape a total of 2.5M text-video pairs. The use of data collected for this study is authorised via the Intellectual Property Office’s Exceptions to Copyright for Non-Commercial Research and Private Study.

I'm presuming that Web-10M is a larger version of the WebVid-2M dataset described in the paper.

Most importantly though, the website includes a link to a 2.7GB CSV file - results_10M_train.csv - containing the full WebVid-10M dataset. The CSV file looks like this:

videoid,contentUrl,duration,page_dir,name
21179416,https://ak.picdn.net/shutterstock/videos/21179416/preview/stock-footage-aerial-shot-winter-forest.mp4,PT00H00M11S,006001_006050,Aerial shot winter forest
5629184,https://ak.picdn.net/shutterstock/videos/5629184/preview/stock-footage-senior-couple-looking-through-binoculars-on-sailboat-together-shot-on-red-epic-for-high-quality-k.mp4,PT00H00M29S,071501_071550,"Senior couple looking through binoculars on sailboat together. shot on red epic for high quality 4k, uhd, ultra hd resolution."

I loaded it into SQLite and started digging around.

It's all from Shutterstock!

The big surprise for me when I started exploring the data was this: every single one of the 10,727,582 videos linked in the Datasette started with the same URL prefix:

https://ak.picdn.net/shutterstock/videos/

They're all from Shutterstock. The paper talks about "scraping the web", but it turns out there was only one scraped website involved.

Here's that first row from the CSV file on Shutterstock itself:

https://www.shutterstock.com/video/clip-21179416-aerial-shot-winter-forest

As far as I can tell, the training set used here isn't even full Shutterstock videos: it's the free, watermarked preview clips that Shutterstock makes available.

I guess Shutterstock have really high quality captions for their videos, perfect for training a model on.

Implementation notes

My simonw/webvid-datasette repository contains the code I used to build the Datasette instance.

I built a SQLite database with full-text search enabled using sqlite-utils. I deployed it directly to Fly by building a Docker image that bundled the 2.5G SQLite database, taking advantage of the Baked Data architectural pattern.

The most interesting custom piece of implementation is the plugin I wrote to add a video player to each result. Here's the implementation of that plugin:

from datasette import hookimpl
from markupsafe import Markup

TEMPLATE = """
<video controls width="400" preload="none" poster="{poster}">
  <source src="{url}" type="video/mp4">
</video>
<p>{filename}<br>On <a href="https://www.shutterstock.com/video/clip-{id}">Shutterstock</a></p>
""".strip()
VIDEO_URL = "https://ak.picdn.net/shutterstock/videos/{id}/preview/{filename}"
POSTER_URL = "https://ak.picdn.net/shutterstock/videos/{id}/thumb/1.jpg?ip=x480"


@hookimpl
def render_cell(row, column, value):
    if column != "filename":
        return
    id = row["id"]
    url = VIDEO_URL.format(id=id, filename=value)
    poster = POSTER_URL.format(id=id)
    return Markup(TEMPLATE.format(url=url, poster=poster, filename=value, id=id))

I'm using the new render_cell(row) argument added in Datasette 0.62.

The plugin outputs a <video> element with preload="none" to avoid the browser downloading the video until the user clicks play (see this TIL). I set the poster attribute to a thumbnail image from Shutterstock.

Tags: ethics, facebook, machine-learning, projects, ai, datasette, generative-ai, training-data, text-to-video, ai-ethics

Quoting Ryan Broderick

2022-02-05T22:41:31+00:00

Every few weeks, someone on Twitter notices how demented the content on Facebook is. I’ve covered a lot of these stories. The quick TL;DR is that Facebook’s video section is essentially run by a network of magicians and Vegas stage performers who hack the platform’s algorithm with surreal low-value content designed to distract users long enough to trigger an in-video advertisement and anger them enough to leave a comment.

— Ryan Broderick

Tags: facebook

Quoting Ryan Broderick

2021-10-25T20:22:01+00:00

But this much is clear: Facebook knew all along. Their own employees were desperately trying to get anyone inside the company to listen as their products radicalized their own friends and family members. And as they were breaking the world, they had an army of spokespeople publicly and privately gaslighting and intimidating reporters and researchers who were trying to ring the alarm bell. They knew all along and they simply did not give a shit.

— Ryan Broderick

Tags: facebook

I saw millions compromise their Facebook accounts to fuel fake engagement

2021-06-09T15:40:12+00:00

I saw millions compromise their Facebook accounts to fuel fake engagement

Sophie Zhang, ex-Facebook, describes how millions of Facebook users have signed up for “autolikers”—programs that promise likes and engagement for their posts, in exchange for access to their accounts which are then combined into the larger bot farm and used to provide likes to other posts. “Self-compromise was a widespread problem, and possibly the largest single source of existing inauthentic activity on Facebook during my time there. While actual fake accounts can be banned, Facebook is unwilling to disable the accounts of real users who share their accounts with a bot farm.”

Tags: facebook, social-media