What You May Have Missed #14

ChatGPT: Microsoft deal, monetization, fake scientist, education / GPT-4 info and misinfo / More generative AI news: a robot lawyer, Stable Diffusion lawsuit, AI journalist, RLHF for images, 10x Alexa

Jan 15, 2023

Important news about The Algorithmic Bridge!

From now on—and besides the Sunday WYMHM column—a few articles every month will be exclusive to paid subscribers (around 30-40% will still be free).

I’ve been considering this change for some time now. My main motivations:

I want to give free subscribers a stronger incentive to become paid subs, and paid subscribers a stronger reason to remain so. They keep TAB alive for all of us!
I want to dedicate myself full-time to growing and improving TAB and making it the most interesting and useful resource for people outside of AI to make sense of everything that’s going on inside.
I want it to stay free of ads, sponsorships, and the attention-grabbing clickbait that’s increased 1000x since ChatGPT.

I’m extremely grateful to the 237 supportive readers who contribute monthly. If you want to join the group, you can do it here (in case you’re not sure, you can try the 7-day free trial!).

Also, to give free subscribers the option to check out the WYMHM Sunday column and see if they like it, today’s post is fully open. Enjoy!

ChatGPT, ChatGPT, and more ChatGPT

Microsoft-OpenAI $10-billion deal

Semafor reported a week ago that, if everything goes according to plan, Microsoft will close a $10B investment deal with OpenAI before the end of January.

There’s been some misinformation about the deal which implied that OpenAI execs weren’t sure about the company’s long-term viability. However, it was later clarified that the deal looks like this:

Leo L’Orange, who writes The Neuron, explains that “once $92 billion in profit plus $13 billion in initial investment are repaid to Microsoft and once the other venture investors earn $150 billion, all of the equity reverts back to OpenAI.”
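To make the arithmetic in that quote concrete, here is a minimal sketch that adds up the three figures. It assumes the amounts are simply additive, as the quote suggests. The constant and function names are illustrative only, and the full deal terms have not been published.

```python
# Figures taken from the quote above, in billions of US dollars.
# Assumption: the three amounts are simply additive.
MICROSOFT_PROFIT_CAP = 92
MICROSOFT_INITIAL_INVESTMENT = 13
OTHER_INVESTORS_CAP = 150


def microsoft_total() -> int:
    """Total (in $B) repaid to Microsoft: profit plus initial investment."""
    return MICROSOFT_PROFIT_CAP + MICROSOFT_INITIAL_INVESTMENT


def total_before_equity_reverts() -> int:
    """Total (in $B) paid to all investors before equity returns to OpenAI."""
    return microsoft_total() + OTHER_INVESTORS_CAP


if __name__ == "__main__":
    print(f"Microsoft receives:    ${microsoft_total()}B")
    print(f"All investors receive: ${total_before_equity_reverts()}B")
```

Under these assumptions, OpenAI would have to pay out roughly $255 billion before all of the equity reverts to it.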

People are divided. Some say the deal is “cool” or “interesting”, whereas others say it’s “odd” and “crazy”. What I perceive with my non-expert eyes is that OpenAI and Sam Altman do trust (some would say overtrust) the company’s long-term ability to achieve its goals.

However, as Will Knight writes for WIRED, “it’s unclear what products can be built on the technology.” OpenAI must figure out a viable business model soon.

Changes to ChatGPT: New features and monetization

OpenAI updated ChatGPT on Jan 9 (the previous update was on Dec 15). The chatbot now has “improved factuality” and you can stop it mid-generation.

They’re also working on a “professional version of ChatGPT” (unrelated to the API) as OpenAI’s president Greg Brockman announced on Wednesday. These are the three main features:

“Always available (no blackout windows).
Fast responses from ChatGPT (i.e. no throttling).
As many messages as you need (at least 2X regular daily limit).”

To sign up for the waitlist you have to fill out a form where they ask you, among other things, how much you’d be willing to pay (and how much would be too much).

If you plan to take it seriously, you should consider going deep into OpenAI’s product stack with the OpenAI cookbook repo. Bojan Tunguz said it is “the top trending repo on GitHub this month.” Always a good sign.

ChatGPT, the fake scientist

ChatGPT has entered the scientific domain. On Thursday, Kareem Carr posted a screenshot of a paper that lists ChatGPT as a co-author.

Kareem Carr | Data Scientist @kareem_carr

so i guess we’re doing this now

But why, given that ChatGPT is a tool? “People are starting to treat ChatGPT as if it were a bona fide, well-credentialed scientific collaborator,” explains Gary Marcus in a Substack post. “Scientists, please don’t let your chatbots grow up to be co-authors,” he pleads.

More worrisome are those cases where there’s no disclosure of AI use. Scientists have found that they can’t reliably identify abstracts written by ChatGPT: its eloquent bullshit fools even experts in their field. As Sandra Wachter, who “studies technology and regulation” at Oxford, told Holly Else for a piece in Nature:

“If we’re now in a situation where the experts are not able to determine what’s true or not, we lose the middleman that we desperately need to guide us through complicated topics.”

ChatGPT’s challenge to education

ChatGPT has been banned in education centers all across the globe (e.g., in New York public schools and Australian universities, and UK lecturers are thinking about it). As I argued in a previous essay, I don’t think this is the wisest decision; it is merely a reaction to being unprepared for the fast development of generative AI.

NYT’s Kevin Roose argues that “[ChatGPT’s] potential as an educational tool outweighs its risks.” Terence Tao, the great mathematician, agrees: “In the long term, it seems futile to fight against this; perhaps what we as lecturers need to do is to move to an “open books, open AI” mode of examination.”

Giada Pistilli, the principal ethicist at Hugging Face, explains the challenge schools face with ChatGPT:

“Unfortunately, the educational system seems forced to adapt to these new technologies. I think it is understandable as a reaction, since not much has been done to anticipate, mitigate or elaborate alternative solutions to contour the possible resulting problems. Disruptive technologies often demand user education because they cannot simply be thrown at people uncontrollably.”

That last sentence perfectly captures where the problem arises and where the potential solution lies. We have to make an extra effort to educate users on how this tech works and on what is and isn’t possible to do with it. That’s the approach Catalonia has taken. As Francesc Bracero and Carina Farreras report for La Vanguardia:

“In Catalonia, the Department of Education is not going to prohibit it ‘in the entire system and for everyone, since this would be an ineffective measure.’ According to sources from the ministry, it is better to ask the centers to educate in the use of AI, ‘which can provide a lot of knowledge and advantages.’”

The student’s best friend: A database of ChatGPT errors

Gary Marcus and Ernest Davis have set up an “error tracker” to capture and classify the errors language models like ChatGPT make (here’s more info about why they’re compiling this document and what they plan to do with it).

The database is public and anyone can participate. It’s a great resource that allows for rigorous study of how these models misbehave and how people could avoid misuse. Here’s a hilarious example of why this matters:

cohost.org/sloshy @LiquidSloshalot

Lol. Lmao.

An exchange asking why Google isn't as concise or easy to use as ChatGPT, followed by someone pointing out the ChatGPT answer is wrong. OP says "Deep lesson in this" despite being on chatGPT's side initially

OpenAI is aware of this and wants to fight mis- and disinformation: “Forecasting Potential Misuses of Language Models for Disinformation Campaigns—and How to Reduce Risk.”

GPT-4 info and misinfo

New information

Sam Altman hinted at a delay in GPT-4’s release in a conversation with Connie Loizos, the Silicon Valley editor at TechCrunch. Altman said that “in general, we are going to release technology much more slowly than people would like. We're going to sit on it for much longer…” This is my take:

Alberto Romero @Alber_RomGar

It seems we'll have to wait a bit longer for GPT-4... Maybe the fact that ChatGPT wasn't ready to be released in the wild combined with its unexpected virality made OpenAI rethink its approach.

Krystal Hu @readkrystalhu

@sama said OpenAI’s GPT-4 will launch only when they can do it safely & responsibly. “In general we are going to release technology much more slowly than people would like. We're going to sit on it for much longer…” Also confirmed video model in the works in convo w/ @Cookie https://t.co/Y0XHHvcqNw

(Altman also said there’s a video model in the works!)

Misinformation about GPT-4

There is a “GPT-4 = 100T” claim going viral everywhere on social media (I've mostly seen it on Twitter and LinkedIn). In case you haven’t seen it, it looks like this:

Alex Hormozi @AlexHormozi

This is a frightening visual for me. The first dot is the amount of data Chat GPT 3 was trained on. The second is what chat GPT 4 is trained on. They are already doing demos. It can write a 60,000 word book from a single prompt. The only question I've had about AI…

Or this:

Simon Høiberg @SimonHoiberg

GPT-4 is going to launch soon. And it will make ChatGPT look like a toy... → GPT-3 has 175 billion parameters → GPT-4 has 100 trillion parameters I think we're gonna see something absolutely mindblowing this time! And the best part? 👇

All are slightly different versions of the same thing: An appealing visual graph that captures attention, and a strong hook with the GPT-4/GPT-3 comparison (they use GPT-3 as a proxy for ChatGPT).
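A quick calculation shows why these visuals are so striking. The sketch below takes the two numbers from the tweet at face value: 175 billion is GPT-3's published size, while 100 trillion is the unverified rumor. It also assumes that each circle's area is drawn proportional to the number it represents, which is an assumption about such charts, not something the posts state.

```python
import math

GPT3_PARAMS = 175e9           # published GPT-3 parameter count
RUMORED_GPT4_PARAMS = 100e12  # unverified viral claim, not a confirmed figure


def size_ratio(small: float, large: float) -> float:
    """How many times larger `large` is than `small`."""
    return large / small


def radius_ratio(small: float, large: float) -> float:
    """Ratio of circle radii when circle *area* encodes the value."""
    return math.sqrt(size_ratio(small, large))


if __name__ == "__main__":
    ratio = size_ratio(GPT3_PARAMS, RUMORED_GPT4_PARAMS)
    radius = radius_ratio(GPT3_PARAMS, RUMORED_GPT4_PARAMS)
    print(f"Parameter ratio: {ratio:.0f}x")
    print(f"Radius ratio:    {radius:.1f}x")
```

The rumored figure is about 571 times GPT-3's, and that ratio is all the chart conveys. It says nothing about whether the number is real.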

I think sharing rumors and speculations and framing them as such is okay (I feel partly responsible for this), but posting unverifiable info with an authoritative tone and without references is reprehensible.

As sources of information, people doing this aren’t far from being as useless and dangerous as ChatGPT, and they have much stronger incentives to keep doing it. Beware of this, because it’ll pollute every channel of information about AI going forward.

More generative AI news

A robot lawyer

Joshua Browder, CEO at DoNotPay, posted this on Monday:

Joshua Browder @jbrowder1

DoNotPay will pay any lawyer or person $1,000,000 with an upcoming case in front of the United States Supreme Court to wear AirPods and let our robot lawyer argue the case by repeating exactly what it says. (1/2)

Unsurprisingly, this bold claim generated so much debate that Twitter now flags the tweet with a link to the Supreme Court page of prohibited items.

Even if they ultimately can’t do it for legal reasons, it’s worth considering the question from an ethical and social standpoint. What happens if the AI system makes a serious mistake? Could people without access to a lawyer benefit from a mature version of this technology?

Litigation against Stable Diffusion has begun

Matthew Butterick published this on Friday:

“On behalf of three wonderful artist plaintiffs—Sarah Andersen, Kelly McKernan, and Karla Ortiz—we’ve filed a class-action lawsuit against Stability AI, DeviantArt, and Midjourney for their use of Stable Diffusion, a 21st-century collage tool that remixes the copyrighted works of millions of artists whose work was used as training data.”

It begins—the first steps in what promises to be a long fight to moderate the training and use of generative AI. I agree with the motivation: “AI needs to be fair & ethical for everyone.”

But, like many others, I’ve found inaccuracies in the blog post. It goes deep into the technicalities of Stable Diffusion but fails to explain some parts correctly. Whether this is a mistake or intentional is open to speculation: it could be a way to bridge the technical gap for people who don’t know (and don’t have the time to learn) how this technology works, or a way to characterize the tech in a manner that benefits the plaintiffs.

I argued in my latest article that right now the clash between AI art and traditional artists is strongly emotional. The responses to this lawsuit won’t be any different. We’ll have to wait for the judges to decide the outcome.

CNET publishing AI-generated articles

Futurism reported this on Wednesday:

“CNET, a massively popular tech news outlet, has been quietly employing the help of "automation technology" — a stylistic euphemism for AI — on a new wave of financial explainer articles.”

Gael Breton, who first spotted this, wrote a deeper analysis on Friday. He explains that Google doesn’t seem to be hindering traffic to these posts. “Is AI content ok now?” he asks.

I find CNET’s decision to fully disclose the use of AI in their articles a good precedent. How many people are publishing content right now using AI without disclosing it? However, the consequence is that people may lose their jobs if this works out (as I, and many others, predicted). It’s already happening:

Jason Colavito @JasonColavito

A client informed me that he will no longer pay me to write content for his website because A.I. can write it for free, but he wants to pay me a fraction of my usual rate to "rewrite it" in different words so it can pass Google's A.I. detection screening.

I fully agree with this Tweet from Santiago:

Santiago @svpino

AI will not replace you. A person using AI will.

RLHF for image generation

If reinforcement learning from human feedback (RLHF) works for language models, why not for text-to-image? That’s what Pick a Pic is trying to achieve.

Yuval Kirstain @YKirstain

I am super excited to introduce the Pick a Pic community effort in text-to-image generation! pickapic.io Pick a Pic is a completely *open-source* and *collaborative* project to align text-to-image generation with what humans really *want* to see. 🧵 1/3

The demo is for research purposes but could be an interesting addition to Stable Diffusion or DALL-E (Midjourney does something similar—they internally guide the model to output beautiful and artistic images).

A recipe to “make Siri/Alexa 10x better”

Recipes that mix different generative AI models to create something better than the sum of its parts:

Jim Fan @DrJimFan

Here’s the recipe to make Siri/Alexa 10x better: 1. Whisper to convert speech to text. Best open-source speech model out there. 2. ChatGPT to generate smart home API calls and/or text response. 3. VALL-E to synthesize speech. It can mimic anyone’s voice sample! Quick figure 1/3
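The three steps in that recipe form a simple pipeline: speech in, text, reply text, speech out. Here is a minimal sketch of that structure. The `VoiceAssistant` class, the type aliases, and the stub stages are all hypothetical placeholders for illustration. They are not the real Whisper, ChatGPT, or VALL-E interfaces.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage is a plain function so a real model could be swapped in later.
SpeechToText = Callable[[bytes], str]
TextToReply = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoiceAssistant:
    """Hypothetical three-stage pipeline: speech -> text -> reply -> speech."""

    transcribe: SpeechToText  # step 1 (Whisper's role in the recipe)
    respond: TextToReply      # step 2 (ChatGPT's role)
    synthesize: TextToSpeech  # step 3 (VALL-E's role)

    def handle(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stub stages so the sketch runs without any model installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"OK: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


assistant = VoiceAssistant(fake_transcribe, fake_respond, fake_synthesize)

if __name__ == "__main__":
    print(assistant.handle(b"turn on the lights"))
```

Keeping each stage behind a plain function signature means any one model can be replaced without touching the other two, which is the appeal of mixing models this way.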
