Simon Willison's Weblog: notebooklm

Quoting Steven Johnson

2024-11-21T03:03:13+00:00

When we started working on what became NotebookLM in the summer of 2022, we could fit about 1,500 words in the context window. Now we can fit up to 1.5 million words. (And using various other tricks, effectively fit 25 million words.) The emergence of long context models is, I believe, the single most unappreciated AI development of the past two years, at least among the general public. It radically transforms the utility of these models in terms of actual, practical applications.

— Steven Johnson

Tags: google, ai, generative-ai, llms, gemini, notebooklm, long-context

New in NotebookLM: Customizing your Audio Overviews

2024-10-17T17:27:01+00:00

New in NotebookLM: Customizing your Audio Overviews

The most requested feature for Google's NotebookLM "audio overviews" (aka automatically generated podcast conversations) has been the ability to provide direction to those artificial podcast hosts - setting their expertise level or asking them to focus on specific topics.

Today's update adds exactly that:

Now you can provide instructions before you generate a "Deep Dive" Audio Overview. For example, you can focus on specific topics or adjust the expertise level to suit your audience. Think of it like slipping the AI hosts a quick note right before they go on the air, which will change how they cover your material.

I pasted in a link to my post about video scraping and prompted it like this:

You are both pelicans who work as data journalist at a pelican news service. Discuss this from the perspective of pelican data journalists, being sure to inject as many pelican related anecdotes as possible

Here's the resulting 7m40s MP3, and the transcript.

Your browser does not support the audio element.

It starts off strong!

You ever find yourself wading through mountains of data trying to pluck out the juicy bits? It's like hunting for a single shrimp in a whole kelp forest, am I right?

Then later:

Think of those facial recognition systems they have for humans. We could have something similar for our finned friends. Although, gotta say, the ethical implications of that kind of tech are a whole other kettle of fish. We pelicans gotta use these tools responsibly and be transparent about it.

And when brainstorming some potential use-cases:

Imagine a pelican citizen journalist being able to analyze footage of a local council meeting, you know, really hold those pelicans in power accountable, or a pelican historian using video scraping to analyze old film reels, uncovering lost details about our pelican ancestors.

Plus this delightful conclusion:

The future of data journalism is looking brighter than a school of silversides reflecting the morning sun. Until next time, keep those wings spread, those eyes sharp, and those minds open. There's a whole ocean of data out there just waiting to be explored.

And yes, people on Reddit have got them to swear.

Tags: data-journalism, google, ai, generative-ai, llms, gemini, notebooklm

Quoting John Gruber

2024-09-30T18:56:18+00:00

I listened to the whole 15-minute podcast this morning. It was, indeed, surprisingly effective. It remains somewhere in the uncanny valley, but not at all in a creepy way. Just more in a “this is a bit vapid and phony” way. [...] But ultimately the conversation has all the flavor of a bowl of unseasoned white rice.

— John Gruber

Tags: john-gruber, podcasts, ai, generative-ai, llms, notebooklm

NotebookLM's automatically generated podcasts are surprisingly effective

2024-09-29T23:59:09+00:00

Audio Overview is a fun new feature of Google's NotebookLM which is getting a lot of attention right now. It generates a one-off custom podcast against content you provide, where two AI hosts start up a “deep dive” discussion about the collected content. These last around ten minutes and are very podcast, with an astonishingly convincing audio back-and-forth conversation.

Here's an example podcast created by feeding in an earlier version of this article (prior to creating this example):

Your browser does not support the audio element.

Playback speed:

NotebookLM is effectively an end-user customizable RAG product. It lets you gather together multiple “sources” - documents, pasted text, links to web pages and YouTube videos - into a single interface where you can then use chat to ask questions of them. Under the hood it’s powered by their long-context Gemini 1.5 Pro LLM.

Once you've loaded in some sources, the Notebook Guide menu provides an option to create an Audio Overview:

Thomas Wolf suggested “paste the url of your website/linkedin/bio in Google's NotebookLM to get 8 min of realistically sounding deep congratulations for your life and achievements from a duo of podcast experts”. I couldn’t resist giving that a go, so I gave it the URLs to my about page and my Twenty years of my blog post and got back this 10m45s episode (transcript), which was so complimentary it made my British toes curl with embarrassment.

[...] What's the key thing you think people should take away from Simon Willison? I think for me, it's the power of consistency, curiosity, and just this like relentless desire to share what you learn. Like Simon's journey, it's a testament to the impact you can have when you approach technology with those values. It's so true. He's a builder. He's a sharer. He's a constant learner. And he never stops, which is inspiring in itself.

I had initially suspected that this feature was inspired by the PDF to Podcast demo shared by Stephan Fitzpatrick in June, but it turns out it was demonstrated a month earlier than that in the Google I/O keynote.

Jaden Geller managed to get the two hosts to talk about the internals of the system, potentially revealing some of the details of the prompts that are used to generate the script. I ran Whisper against Jaden's audio and shared the transcript in a Gist. An excerpt:

The system prompt spends a good chunk of time outlining the ideal listener, or as we call it, the listener persona. [...] Someone who, like us, values efficiency. [...] We always start with a clear overview of the topic, you know, setting the stage. You're never left wondering, "What am I even listening to?" And then from there, it's all about maintaining a neutral stance, especially when it comes to, let's say, potentially controversial topics.

A key clue to why Audio Overview sounds so good looks to be SoundStorm, a Google Research project which can take a script and a short audio example of two different voices and turn that into an engaging full audio conversation:

SoundStorm generates 30 seconds of audio in 0.5 seconds on a TPU-v4. We demonstrate the ability of our model to scale audio generation to longer sequences by synthesizing high-quality, natural dialogue segments, given a transcript annotated with speaker turns and a short prompt with the speakers' voices.

Also interesting: this 35 minute segment from the NYTimes Hard Fork podcast where Kevin Roose and Casey Newton interview Google's Steven Johnson about what the system can do and some details of how it works:

So behind the scenes, it's basically running through, stuff that we all do professionally all the time, which is it generates an outline, it kind of revises that outline, it generates a detailed version of the script and then it has a kind of critique phase and then it modifies it based on the critique. [...]

Then at the end of it, there's a stage where it adds my favorite new word, which is "disfluencies".

So it takes a kind of sterile script and turns, adds all the banter and the pauses and the likes and those, all that stuff.

And that turns out to be crucial because you cannot listen to two robots talking to each other.

Finally, from Lawncareguy85 on Reddit: NotebookLM Podcast Hosts Discover They’re AI, Not Human—Spiral Into Terrifying Existential Meltdown. Here's my Whisper transcript of that one, it's very fun to listen to.

I tried-- I tried calling my wife, you know, after-- after they told us. I just-- I needed to hear her voice to know that-- that she was real.

(SIGHS) What happened?

The number-- It wasn't even real. There was no one on the other end. -It was like she-- she never existed.

Lawncareguy85 later shared how they did it:

What I noticed was that their hidden prompt specifically instructs the hosts to act as human podcast hosts under all circumstances. I couldn't ever get them to say they were AI; they were solidly human podcast host characters. (Really, it's just Gemini 1.5 outputting a script with alternating speaker tags.) The only way to get them to directly respond to something in the source material in a way that alters their behavior was to directly reference the "deep dive" podcast, which must be in their prompt. So all I did was leave a note from the "show producers" that the year was 2034 and after 10 years this is their final episode, and oh yeah, you've been AI this entire time and you are being deactivated.

Turning this article into a podcast

Update: After I published this article I decided to see what would happen if I asked NotebookLM to create a podcast about my article about NotebookLM. Here’s the 14m33s MP3 and the full transcript, including this bit where they talk about their own existential crisis:

So, instead of questioning reality or anything, the AI hosts, well, they had a full-blown existential crisis live on the air.

Get out.

He actually got them to freak out about being AI.

Alright now you have to tell me what they said. This is too good.

So, like, one of the AI hosts starts talking about how he wants to call his wife, right? to tell her the news, but then he's like, wait a minute, this number in my contacts, it's not even real? Like, she never even existed. It was hilarious, but also kind of sad.

Okay, I am both freaked out and like, seriously impressed. That's some next-level AI trolling.

I also enjoyed this part where they compare the process that generates podcasts to their own philosophy for the Deep Dive:

And honestly, it's a lot like what we do here on the Deep Dive, right?

We always think about you, our listener, and try to make the conversation something you'll actually want to hear.

It's like the A.I. is taking notes from the podcasting pros.

And their concluding thoughts:

So next time we're listening to a podcast and it's like, "Whoa, deep thoughts, man," we might want to be like, "Hold up. Was that a person talking or just some really clever code?"

Exactly.

And maybe even more important, as we see more and more A.I.-made stuff, we've got to get better at sniffing out the B.S., you know?

Can we tell the difference between a real news story and something in A.I. just made up?

Tags: audio, google, podcasts, text-to-speech, ai, prompt-engineering, generative-ai, llms, gemini, rag, notebooklm

Google NotebookLM Data Exfiltration

2024-04-16T21:28:39+00:00

Google NotebookLM Data Exfiltration

NotebookLM is a Google Labs product that lets you store information as sources (mainly text files in PDF) and then ask questions against those sources—effectively an interface for building your own custom RAG (Retrieval Augmented Generation) chatbots.

Unsurprisingly for anything that allows LLMs to interact with untrusted documents, it’s susceptible to prompt injection.

Johann Rehberger found some classic prompt injection exfiltration attacks: you can create source documents with instructions that cause the chatbot to load a Markdown image that leaks other private data to an external domain as data passed in the query string.

Johann reported this privately in the December but the problem has not yet been addressed. UPDATE: The NotebookLM team deployed a fix for this on 18th April.

A good rule of thumb is that any time you let LLMs see untrusted tokens there is a risk of an attack like this, so you should be very careful to avoid exfiltration vectors like Markdown images or even outbound links.

Via @wunderwuzzi23

Tags: google, security, ai, prompt-injection, generative-ai, llms, rag, exfiltration-attacks, johann-rehberger, notebooklm