ChatGPT Is the World’s Best Chatbot

I added a cherry on top (last section)

Dec 02, 2022

OpenAI has released ChatGPT, a new dialogue language model (LM) based on the GPT-3.5 family series (trained on text and code) and similar to InstructGPT (aligned with reinforcement learning through human feedback). The company set up an online demo and people are losing their minds over it.

In a nutshell, ChatGPT is a chatbot that can “answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests."

This nicely encapsulates the reason why ChatGPT is so special: “admit”, “challenge", and “reject" are unusual verbs to describe the behavior of an LM. However, it isn’t an exaggeration in ChatGPT’s case (countless examples that I’ll share soon assert it).

ChatGPT is, by far, the best chatbot in the world. It can write essays and poetry. It can find great prompts for AI art models. It can roleplay. It can write code, find a bug, explain it, solve it, and explain the solution. And it can combine ideas in the weirdest ways possible:

Riley Goodside @goodside

OpenAI's new ChatGPT explains the worst-case time complexity of the bubble sort algorithm, with Python code examples, in the style of a fast-talkin' wise guy from a 1940's gangster movie:

The model’s superior abilities and better alignment than, say, baseline GPT-3, make it feel more human. In turn, this makes it more believable — although it doesn’t necessarily imply it’s more reliable, and thus trustable.

Like all other LMs (e.g. GPTs, Galactica, LaMDA) it makes things up, can generate hurtful completions, and produce misinformation. None of those deficiencies have changed significantly (ChatGPT is better but it’s built from the same principles).

Gary Marcus @GaryMarcus

favorite #GPT fail of the day? My top 4 so far (poll in next tweet); please add your own

But that’s not my focus today. I won’t be annoying you with another cautionary tale on why we shouldn’t trust these models or a how-to piece on critical thinking.

In this article, I’ll share with you a compilation of the most interesting findings and implications people have dug out of ChatGPT (with my added commentary, of course).

And, to put the cherry on top, I’ll take you on a journey. I want to explore a hypothetical: what would happen if AI models became so good at hiding the imperfections that we could no longer find any shortcomings or deficiencies in them?

The wild implications of ChatGPT

In case you haven’t checked Twitter lately, people have spent the last two days talking to ChatGPT non-stop. I’m going to review their findings and conclusions. Once you see what I’m going to show you, you’ll understand why the above hypothetical isn’t so crazy after all.

Essays are dead

I agree 100% that essays, as a form of evaluation, will be dead soon. I’ve written about this before — and about how neither teachers nor the education system are ready for this. With ChatGPT this is now a generally accepted statement:

Corry Wang @corry_wang

I guess GPT-3 is old news, but playing with OpenAI’s new chatbot is mindblowing. chat.openai.com/chat We’re witnessing the death of the college essay in realtime. Here’s the response to a prompt from one of my 200-level history classes at Amherst Solid A- work in 10 seconds

It became obvious it would happen after students started to cheat on their assignments with GPT-3 and teachers realized they had to prepare. Now it’s a tangible reality. I wrote a 1500-word essay about the top five AI predictions for 2023, and all seemed highly plausible.

I have to say, however, that insightful, engaging, innovative, or thought-provoking aren’t the best adjectives to describe ChatGPT’s creations. Much of its output is dull (which is inevitable unless you really try to get a memorable piece, or two), repetitive, or not accurate — when not utter nonsense.

What worries me — beyond the reforms the education system will need — is whether we’ll ever be capable again of recognizing human-made written work. LMs may get so good as to completely blur the gap between them and us. So much so that not even an AI discriminator (GAN-style) would be able to find which is which because there may not be a difference.

However, there’s another possibility: human writing has characteristics that can, using the right tools, reveal authorship. As LMs become masters of prose, they may develop some kind of writing idiosyncrasy (as a feature and not a bug).

Maybe we could find the AI’s styleme (like a fingerprint hidden in language) not simply to distinguish ChatGPT from a human, but to distinguish its style from all others.

Is Google dead?

The other grand implication of ChatGPT is that it will “kill” Google — the hegemon of internet search “is done”. No one implies it has already happened or is about to, but it's clearly not a stretch given that people are already using the model to replace Google satisfactorily:

Chris Frantz @frantzfries

The new chatbot from OpenAI is actually insane Replaced Google for work today Now it’s giving great advice on games to play with my toddler tonight I could see not needing to use Google again 💀

But there are a few caveats here.

Google is above OpenAI in terms of research capability, talent, and budget — if anyone can build this tech before OpenAI, it’s them. However, the juggernaut of internet ads is simply too big to adequately react and maneuver. Google’s AI research branch is arguably the best in the world, but they barely ship any products/services anymore.

Google is facing a case of “the innovator’s dilemma:” the company can’t put its main business model in check with risky innovations just because others could eventually dethrone it.

Balaji @balajis

Google ships amazing AI research, but that’s because publishing can’t be easily held back by the rest of the org. To turn from search to synthesis they’d need to take a risk with their cash cow. Revenue might decline in the short run. It’s unlikely today’s Google can do it.

6529 @punk6529

@balajis Aren't they well positioned in AI?

LMs could actually be the first real threat Google has faced in 20 years.

Yet, if we analyze the differences between search engines and LMs, we realize they don’t overlap perfectly.

On the one hand, search engines are rigid. They just go into the internet to find websites and show you a list of links that approximately gives you what you’re looking for—that’s basically the simplest form of internet search. But, on the other hand, they’re reliable. You know they won’t make things up. (Google search, like all others, is biased and may show you fake news, but you can check the sources, which is critical here.)

ChatGPT is much more flexible, but, because its objective isn’t to be factual or truthful, it can make up information as easily as it can give you an amazing, highly convoluted, and precise answer. You never know which one will be a priori and may have a hard time checking afterward (ChatGPT doesn’t give you sources and, if you ask, it could make those up anyway).

Gary Marcus @GaryMarcus

George Hotz “[GPT] is what kills Google” Google: Game on! ——————————————- Below, Round 1: Which country won the most Eurovision contests? Google on left; GPT on right. Not sure I want that much personality in my search results… https://t.co/25XngLyiiV

George Hotz 🐀 @realGeorgeHotz

This is what kills Google. Just needs to be able to use the Internet. https://t.co/tYi7v8ppYE

In short, search engines are much more limited but better equipped for the task.

That said, I don’t think the search engine will survive LMs. Time runs against them—while search engine tech isn’t advancing at all, LMs develop at the speed of light.

As soon as a more robust variation of the transformer architecture appears or companies implement “reliability modules" (whatever that means), LMs will automatically become super generative search engines.

No one would ever use Google again.

ChatGPT is worryingly impressive

Now, I’ll attempt at explaining why the hypothetical I raised in the intro is so important—and will be even more so in the near future.

You’ve already seen some of the many impressive abilities ChatGPT has so now you understand why I take this seriously: ChatGPT is making it harder for people who are fighting the hype to find deficiencies—which doesn’t mean they aren’t there.

Tony Zador @TonyZador

@GaryMarcus @dileeplearning I've been using chatGPT and having a very hard time getting it to say something misleading much less harmful, although it does sometimes say things that are somewhat dense (also, it's completely befuddled by garden path sentences...can't parse them correctly. bug or feature?)

Riley Goodside @goodside

OpenAI’s new ChatGPT appears to defeat Hofstadter/Bender’s list of hallucination-inducing questions, published in The Economist this June to demonstrate the “hollowness” of GPT-3’s understanding of the world: economist.com/by-invitation/…

It’s still quite apparent that ChatGPT lacks reasoning abilities and doesn’t have a great memory window (

Gary Marcus

wrote a great essay on why it “can seem so brilliant one minute and so breathtakingly dumb the next”).

Like Galactica, it makes nonsense sound plausible. People can “easily” pass its filters and it’s susceptible to prompt injections. Obviously, it’s not perfect.

Yet, ChatGPT is a jump forward—a jump toward us being unable to make it trip up by testing it and sampling:

Tony Zador @TonyZador

@GaryMarcus @dileeplearning chatGPT explains it doesnt believe bananas go into the oven, and therefore refused to engage with what it considered a contrafactual premise. This is a bit rigid but given that we are all trying to trip it up this might not be a crazy response.

And this is a big deal.

I wrote an essay on AGI a while ago that I entitled “AGI will take everyone by surprise.” ChatGPT isn’t at that level or anywhere near it (it’s actually just GPT-3 on steroids), but it’s worth it to bring up my arguments on that piece:

“Everything has limits. The Universe has limits — nothing outside the laws of physics can happen, no matter how much we try — and even the infinite — the set of natural numbers is infinite, but it doesn’t contain the set of real numbers.
GPT-3 has limits, and we, the ones trying to find them, also have limits. What Gwern proved [here] was that while looking for GPT-3’s limits, we found ours. It wasn’t GPT-3 that was failing to do some tasks, but us who were unable to find an adequate prompt. Our limits were preventing GPT-3 from performing a task. We were preventing GPT-3 from reaching its true potential.
This raises an immediate question: If the limitations of GPT-3 are often mistaken for ours, how could we precisely define the boundaries of what the system can or can’t do?
…
In the end, we are a limited system trying to evaluate another limited system. Who guarantees that our limits are beyond theirs in every sense? We have a very good example that this may not be the case: We are very bad at assessing our limitations. We keep surprising ourselves with the things we can do, individually and collectively. We keep breaking physical and cognitive limits. Thus, our measurement tools may very well fall short of the action capabilities of a powerful enough AI.”

In his essay, Gwern (a popular tech blogger) pointed out that “sampling can prove the presence of knowledge but not the absence.” He used this idea to defend his thesis that the cause of GPT-3’s failures could be bad prompting and not an inherent lack of “knowledge” on the model’s side.

What I want to underscore here is that the limitations of sampling as a testing methodology don’t apply just in case we find our limits (Gwern argument) or AI’s deficiencies (anti-hype argument), but also if we don’t find any.

When people find deficiencies in ChatGPT’s responses a common counterpoint is “you don’t know how to get the most out of the AI.” That’s fair—but insufficient—because once a systematic deficiency is found, we can conclude the system isn’t reliable.

But, what will happen if, despite numerous attempts to make an AI model break character, beat its filters, and make it drop its facade of reasoning, people fail to do so?

This may seem like a philosophical thought experiment not grounded in reality, but I think it’s quite possible that—applying this reasoning to a future super AI model—we could find the upper limits of the methodologies at hand before finding the AI’s deficiencies.

(I’m not implying the model would actually be able to reason perfectly, but that a set of well-designed guardrails, filters, and intrinsic conservativeness, combined with our limitations as humans, would make it appear so.)

We would have no way to prove that the model can’t reason. No one would believe people who are now using sampling as a way to prove these limitations and everyone would eventually start to trust the system. If we don’t acknowledge this issue soon—and find a solution—it’ll be too late.

If there’s something we should take away from ChatGPT’s superb capabilities is that we’re inevitably approaching this reality.

(None of this was written by ChatGPT.)

4 Comments

Phil Tanny

Dec 3, 2022Liked by Alberto Romero

Very interesting, thanks, appreciate the education on this topic. It's all news to me. I tried to sign up to the demo, but when they demanded my phone number I backed out. Still curious though, so I may give in.

You write...

"But, what will happen if, despite numerous attempts to make an AI model break character, beat its filters, and make it drop its facade of reasoning, people fail to do so?"

Shouldn't we replace the word "if" with the word "when"? That quibble aside, this is the kind of question that interests me.

1) Where is all this going?

2) Do we want to go there?

3) If not, what can we do about it?

There's a lot of speculation about where this is going, but so far I don't see much discussion of #2 and #3. What do you see?

---------

Here's a related topic which perhaps you might be able to comment on?

I have some software (CrazyTalk) which allows you to animate a face photo with an audio file. The end product is a video of a face photo that talks. It's not interactive, you can't have a conversation, only create a video of a face that speaks your script.

Is anybody combining GPT with animated face photos? You know, so instead of a purely text conversation, you could interactively converse with an animated photo of a human?

If such a service were available it seems it would considerably deepen the illusion of consciousness, at least for we in the general public.

Expand full comment

The Algorithmic Bridge