LLMs and the Human Brain
Universe, Evolution, and the Human Brain
I work in tech, so algorithms, systems, user needs, product features, growth: these are all up my alley.
Outside of tech, in my spare time, I have always been fascinated by 3 topics: the universe, evolution, and the human brain. I always treated them as separate topics, but interestingly, I recently read 3 books that tied them together:
The Selfish Gene by Richard Dawkins, 1976 (this is a nearly-50-year-old book, but I only read it recently and its content is still very fresh and relevant)
A Thousand Brains: A New Theory of Intelligence by Jeff Hawkins, 2021
On the Origin of Time: Stephen Hawking’s Final Theory by Thomas Hertog, 2023
The following summary doesn’t do justice to these awesome books, but the (over-)simplification helps set context for this article:
Evolution: The simplest way for life to evolve is to copy itself, wait for unpredictable mutations to happen, and let natural selection pick which mutations survive and dominate.
Human Brain: The human brain evolved in a very similar way; ever since the earliest intelligence appeared in life, the brain “column” started copying itself and making inter-connections between “columns”. Intelligence grew quickly (in cosmic terms).
Universe: During the Big Bang, the universe also “evolved” in a split second, forming the laws of physics as we know them today.
As I started to think about my 2nd startup, I took a deeper look at Generative AI and LLMs. This led me to a surprising hypothesis, which I will discuss later in this article. Time will tell whether it is true.
Brief History of AI
Before I get into the hypothesis, let’s quickly review the history of AI, especially the efforts inspired by the human brain and neural system.
1st AI boom, 1950s - 1970s
The perceptron (a single-layer neural network) was the first primitive attempt to mimic the human brain and neural system, and it led to the first AI boom.
1st AI winter, 1974 - 1980
Computers were too slow for these systems to work in practice, which led to the first AI winter.
2nd AI boom, 1980s
The backpropagation technique was introduced and revived artificial neural network research and development.
2nd AI winter, 1987 - 2000
Expert systems proved impractical in real applications and were abandoned.
3rd AI boom, 2010 - 2017
Deep learning (multi-layer modern neural networks such as CNNs and RNNs), powered by far more CPU / GPU compute, significantly improved image recognition, triggering the 3rd AI boom.
The ImageNet competition soon reached the point where AI achieved better accuracy than humans, and the annual competition ended in 2017.
4th AI boom, 2017 - present
The introduction of transformers in 2017 was revolutionary for the industry, and scaling up the data, compute, and model size increased the level of intelligence. The combination of these 2 points led to the current mega-boom of Generative AI.
Note that there was never a “3rd AI winter”, so one can argue the current GenAI boom is just an amplified continuation of the 3rd AI boom. But I would argue that calling it the “4th AI boom” is more suitable. Next, I’ll explain why.
Why Did Transformers / LLMs Stand Out?
Throughout the history of AI / ML, many algorithms and solutions were developed for different use cases. Speech recognition, image recognition, computer vision, machine translation, natural language processing: all these problem spaces had their own techniques and solutions, and made their own progress. Each topic has its own conferences, and while they do talk to each other and borrow ideas, researchers mostly stay in their own domain lanes.
If we were to go back to 2010 and ask an NLP (natural language processing) expert why the best ASR (automatic speech recognition) techniques couldn’t be applied directly to NLP, we’d hear a long list of strong reasons. And vice versa.
What we saw was increasing complexity in the solutions for each specific domain. That was the direction of AI progress back then.
Until the introduction of transformers.
What made the transformer unique was that it used a simple unit (see the image below, from the original transformer paper "Attention Is All You Need"), and an LLM is normally just one or a handful of transformers, with billions of parameters.
What differentiates various LLMs is mainly:
Number of parameters
Training data
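To make both points concrete, the simple repeated unit and the parameter count, here is a minimal sketch in PyTorch. It is illustrative only: a simplified, modern-style block rather than the exact architecture from the paper, and the layer sizes, depth, and names are my own assumptions. The point is that "scaling up" largely means stacking and widening the same simple unit.

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One 'simple unit': self-attention plus feed-forward, loosely following the 2017 paper."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Residual connections around the attention and feed-forward sub-layers.
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ff(x))
        return x

# Scaling up is largely just stacking more copies of the same unit (and widening it);
# the parameter count grows with depth and width.
model = nn.Sequential(*[TransformerBlock() for _ in range(12)])
print(sum(p.numel() for p in model.parameters()))  # grows as you add layers or increase d_model
```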
What the foundation model does at its root is also simple: predict the next token. It’s a common NLP problem.
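As a rough illustration of that objective (a sketch under my own assumptions, not any particular model's training code), next-token prediction just means scoring the model's output at each position against the token that comes next:

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, token_ids):
    # At each position t the model's output is scored against the token at t+1,
    # so predictions for positions 0..T-2 are compared with tokens 1..T-1.
    pred = logits[:, :-1, :]          # [batch, T-1, vocab]
    target = token_ids[:, 1:]         # [batch, T-1]
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))

# Toy example: random logits stand in for a real model's output.
vocab, batch, seq = 1000, 2, 16
logits = torch.randn(batch, seq, vocab)
tokens = torch.randint(0, vocab, (batch, seq))
print(next_token_loss(logits, tokens))
```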
However, this time, the solution built for an NLP problem can easily be applied to other domains. We’ve seen transformers that can generate text, images, videos, etc.
One solution that handles all domains -> that’s the real revolution here. Thus the name AGI, artificial general intelligence.
Similarity with Human Brain
As I mentioned earlier: “The human brain evolved in a very similar way; ever since the earliest intelligence appeared in life, the brain ‘column’ started copying itself and making inter-connections between ‘columns’. Intelligence grew quickly (in cosmic terms).”
Replicating a simple structure is how human intelligence grew. Humans are extremely good at learning one area and applying the lessons to another. Feed our brains different data, and we learn different skills. All humans have roughly the same brain structure; it’s our unique experiences and learnings that allow us to become experts in different areas.
During the evolution of the human brain, the brain “column” was the simple, easily replicable, and scalable structure that led to human intelligence.
It seems that during AI evolution, transformers took a similar approach: a simple structure, easily replicable and scalable, and when fed large amounts of training data, LLMs learned to tackle different problems.
Thus my hypothesis: transformer-based LLMs are the first AI that shows a similarity to how human intelligence evolved.
Before the wrong conclusion gets drawn, I want to explicitly clarify that I’m not suggesting LLMs mimic human brains. In fact, there are lots of differences:
For starters, human brains don’t represent anything as a numeric vector or store numbers in our brain cells. In fact, we have very limited knowledge of how information is stored in the brain.
The human brain uses electrical pulses among neurons as a way of communication (and association), but the brain can’t even control when and where such a pulse is sent.
Probably because of these differences, human brains are extremely energy efficient.
I’m just excited that, in this evolution, we are at the beginning of artificial intelligence coming closer to human intelligence, and that we can leverage it in the right way to help improve human lives.