
  • Since ChatGPT launched in 2022, large language models have progressed at a rapid pace, often developing unpredictable abilities.

  • When GPT-4 came out, it clearly felt like the chatbot had some level of understanding.

  • But do these abilities reflect actual understanding?

  • Or are the models simply repeating their training data, like so-called stochastic parrots?

  • Recently, researchers from Princeton and Google DeepMind created a mathematically provable argument for how language models develop so many skills.

  • And designed a method for testing them.

  • The results suggest that the largest models develop new skills in a way that hints at understanding.

  • Language models are basically trained to solve next word prediction tasks.

  • So they are given a lot of text, and at every step the model has some idea of what the next word is.

  • And that idea is expressed in terms of a probability.

  • And if the next word didn't get a high enough probability, a slight adjustment is made.

  • And after many, many, many trillions of such small adjustments, it learns to predict the next word.
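The loop described above can be sketched as a toy next-word predictor: one softmax distribution over a four-word vocabulary, nudged by the cross-entropy gradient whenever the correct next word is under-predicted. The vocabulary size, learning rate, and step count are all illustrative, not taken from any real model.

```python
import math

def softmax(logits):
    # Convert raw scores into a probability distribution over next words.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(logits, target, lr=0.5):
    # One "slight adjustment": nudge the scores so the correct next word
    # gets a bit more probability (softmax cross-entropy gradient step).
    probs = softmax(logits)
    return [x - lr * (p - (1.0 if i == target else 0.0))
            for i, (x, p) in enumerate(zip(logits, probs))]

# Four candidate next words; index 2 is the word that actually comes next.
logits = [0.0, 0.0, 0.0, 0.0]
for _ in range(100):
    logits = train_step(logits, target=2)

probs = softmax(logits)
print(probs[2])  # close to 1 after many small adjustments
```

After many such adjustments the model concentrates almost all of its probability on the correct next word, which is the whole training signal a language model gets.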

  • Over time, researchers have observed neural scaling laws, an empirical relationship between the performance of language models and the data used to train them.

  • As models improve, they minimize training loss, that is, they make fewer errors.

  • At certain scales, performance increases suddenly, and this jump produces new behaviors, a phenomenon called emergence.
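A neural scaling law of the kind mentioned here is an empirical power law relating loss to training scale. The exponent and constant below are made up purely to show the smooth, predictable decline the laws describe.

```python
# Hypothetical power-law scaling: loss falls smoothly as a power of
# training-set size. The constants (0.095, 2e10) are illustrative only.
def training_loss(num_tokens, alpha=0.095, scale=2e10):
    return (scale / num_tokens) ** alpha

for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:.0e} tokens -> loss {training_loss(n):.3f}")
```

The curve itself is smooth; what emergence refers to is specific abilities appearing abruptly even while this aggregate loss declines gradually.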

  • There's no scientific explanation as to why that's happening.

  • So this phenomenon is not well understood.

  • The researchers wondered if GPT-4's sudden improvements could be explained by emergence.

  • Perhaps the model had learned compositional generalization, the ability to combine language skills.

  • This was some kind of meta-capability.

  • There was no mathematical framework to think about that.

  • And so we had to come up with a mathematical framework.

  • The researchers found their first hint by considering neural scaling laws.

  • So those scaling laws already suggest that there's some statistical phenomenon going on.

  • So random graphs have a long history in terms of thinking about statistical phenomena.

  • Random graphs are made of nodes which are connected by randomly generated edges.

  • The researchers built their mathematical model with bipartite graphs, which contain two types of nodes, one representing chunks of text and the other language skills.

  • The edges of the graph, the connections, correspond to which skills are needed to understand a given piece of text.
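The bipartite setup might be sketched as follows, with randomly generated edges standing in for which skills each text chunk requires. All node names and edge counts here are illustrative.

```python
import random

random.seed(0)

# Two kinds of nodes: language skills on one side, chunks of text on the other.
skills = [f"skill_{i}" for i in range(100)]
texts = [f"text_{j}" for j in range(500)]

# Randomly generated edges: each text chunk needs a handful of skills.
edges = {t: random.sample(skills, k=random.randint(1, 4)) for t in texts}

# Invert the map to ask: which text chunks exercise a given skill?
texts_per_skill = {}
for t, needed in edges.items():
    for s in needed:
        texts_per_skill.setdefault(s, []).append(t)

print(edges["text_0"])
```

Random graph theory then lets you reason statistically about how many skills a model must have mastered for it to handle the texts it predicts well.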

  • Now, the researchers needed to connect these bipartite graphs to actual language models.

  • But there was a problem.

  • We don't have access to the training data.

  • So if I'm evaluating that language model on my evaluation set, how do I know that the language model hasn't seen that data in its training corpus?

  • There was one crucial piece of information that the researchers could access.

  • Using that scaling law, we made a prediction: as models get better at predicting the next word, they will be able to combine more of the underlying skills.

  • According to random graph theory, every combination arises from a random sampling of possible skills.

  • If there are 100 skill nodes in the graph and you want to combine four skills, then there are about 100 to the fourth power or 100 million ways to combine them.
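The counting works out as stated: with ordered slots and repetition allowed there are 100^4 = 100 million combinations, and even the stricter count of distinct four-skill sets is still in the millions, far too many for all of them to have appeared in training.

```python
import math

num_skills = 100
k = 4

ordered = num_skills ** k             # ordered choices with repetition
unordered = math.comb(num_skills, k)  # distinct 4-skill sets

print(ordered)    # 100000000, the "about 100 million" in the transcript
print(unordered)  # 3921225 distinct combinations, still in the millions
```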

  • The researchers developed a test called SkillMix to evaluate if large language models can generalize to combinations of skills they likely hadn't seen before.

  • So the model is given a list of skills and a topic, and then it's supposed to create a piece of text on that topic using that list of skills.

  • For example, the researchers asked GPT-4 to generate a short text about sewing that exhibits spatial reasoning, self-serving bias and metaphor.

  • Here's what it answered.

  • In the labyrinth of sewing, I am the needle navigating between the intricate weaves.

  • Any errors are due to the faulty compass of low quality thread, not my skill.
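A SkillMix-style query such as the sewing example might be assembled like this; the exact prompt wording used by the researchers is an assumption here, not taken from the paper.

```python
# Hypothetical sketch of a SkillMix prompt builder (wording is assumed).
def skillmix_prompt(topic, skill_list):
    skills = ", ".join(skill_list)
    return (f"Write a short text about {topic} that naturally exhibits "
            f"the following language skills: {skills}.")

prompt = skillmix_prompt(
    "sewing", ["spatial reasoning", "self-serving bias", "metaphor"])
print(prompt)
```

The key property of the test is that the skills and topic are sampled at random, so the model almost certainly never saw that exact combination during training.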

  • We showed in our mathematical framework that as we scale up, the model is able to learn these skills.

  • You would see this increase in compositional capability as you scale up the models.

  • When given the SkillMix test, small language models struggled to combine just a couple of skills.

  • Medium-sized models could combine two skills more comfortably, but the largest models, like GPT-4, could combine five or six skills.

  • Because these models couldn't have seen all possible combinations of skills, the researchers argue that they must have developed compositional generalization through emergence.

  • Once a model has learned these language skills, it can generalize to random, unseen compositions of them.

  • What they showed was that their mathematical model had this property of compositionality, and that by itself gives this ability to extrapolate and compose new combinations from existing pieces.

  • And that is really the hallmark of novelty and the hallmark of creativity.

  • And so the argument is that large language models can move beyond being stochastic parrots.

  • The researchers are already working to extend the SkillMix evaluation to other domains as part of a larger effort to understand the capabilities of large language models.

  • Can we create an ecosystem of SkillMix, which is not just valid for language skills, but mathematical skills as well as coding skills?

  • So SkillMix was one example where we made a prediction by just mathematical thinking, and that was correct.

  • But there are all kinds of other phenomena that we probably are not aware of, and we need some understanding of that.

  • Quantum systems are some of the most complex structures in nature.

  • To model them, you need to compute the system's Hamiltonian, an equation that describes how particles interact locally to produce its physical properties.

  • But entanglement spreads information across the system, correlating particles that are far apart.

  • This makes computing Hamiltonians exceptionally difficult.

  • You have a giant system of atoms.

  • It's a very big problem to learn all those parameters.

  • You could never hope to write down the Hamiltonian.

  • If you ever even tried to write it down, the game would be over and you wouldn't have an efficient algorithm.

  • People were actually trying to prove that efficient algorithms were impossible in this regime.

  • But a team of computer scientists from MIT and UC Berkeley cracked the problem.

  • They created an algorithm that can produce the Hamiltonian of a quantum system at any constant temperature.

  • The results could have big implications for the future of quantum computing and understanding exotic quantum behavior.

  • So when we have systems that behave and do interesting things like superfluidity and superconductivity, you want to understand the building blocks and how they fit together to create those properties that you want to harness for technological reasons.

  • So we're trying to learn this object, which is the Hamiltonian.

  • It's defined by a small set of parameters.

  • And what we're trying to do is learn these parameters.

  • What we have access to is these experimental measurements of the quantum system.

  • So the question then becomes, can you learn a description of the system through experiments?

  • Previous efforts in Hamiltonian learning produced algorithms that worked only at high temperatures.

  • But at high temperatures these systems are largely classical, so there's no entanglement between the particles.

  • The MIT and Berkeley team set their sights on the low temperature quantum regimes.

  • I wanted to understand what kinds of strategies worked algorithmically on the classical side and what could be manifestations of those strategies on the quantum side.

  • Once you look at the problem in the right way and you bring to bear these tools, it turns out that you can really make progress on these problems.

  • First, the team ported over a tool from classical machine learning called polynomial optimization.

  • This allowed them to approximate the measurements of their system as a family of polynomial equations.

  • We were like, maybe we can write Hamiltonian learning as a polynomial optimization problem.

  • And if we manage to do this, maybe we can try to optimize this polynomial system efficiently.

  • So all of a sudden, it's in a domain that's more familiar and you have a bunch of algorithmic tools at your disposal.
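As a toy version of that reformulation, imagine a few measured quantities that depend on the unknown couplings through known polynomial formulas; recovering the couplings then amounts to solving a small polynomial system. The observables and their formulas below are invented for illustration, and the brute-force grid search stands in for a real structured solver.

```python
# Invented observables: each depends on the unknown couplings (J, h)
# through a known polynomial expression.
def measurements(J, h):
    m1 = 2 * J + h      # an energy-like quantity, say
    m2 = J * h          # a correlation-like quantity, say
    m3 = 3 * J - h      # a third quantity to pin the solution down
    return m1, m2, m3

true_J, true_h = 1.0, 0.5
m1, m2, m3 = measurements(true_J, true_h)

# Solve the polynomial system by brute-force grid search; a real solver
# would exploit the structure of the equations instead.
def residual(J, h):
    return ((2 * J + h - m1) ** 2 + (J * h - m2) ** 2
            + (3 * J - h - m3) ** 2)

best = min(((J / 100, h / 100) for J in range(301) for h in range(301)),
           key=lambda p: residual(*p))
print(best)  # recovers the true couplings (1.0, 0.5)
```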

  • You can't solve general polynomial systems efficiently, but what you can do is solve a relaxation of them.

  • We use something called the sum of squares relaxation to actually solve this polynomial system.

  • Starting with a challenging polynomial optimization problem, the team used the sum of squares method to relax its constraints.

  • This expanded the equations to a larger allowable set of solutions, effectively converting it from a hard problem to an easier one.

  • The real trick is to argue that when you've expanded the set of solutions, you can still find a good solution inside it.

  • You need a procedure to take that approximate relaxed solution and round it back into an actual solution to the problem you really cared about.
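The relax-and-round pattern described here can be illustrated on a much simpler problem than sum of squares: minimizing a quadratic over the corners {-1, +1}^3 by relaxing to the solid cube, optimizing there, and rounding back. This is an analogy for the strategy, not the team's actual algorithm.

```python
import itertools

# A hard discrete problem: minimize f(x) = sum_ij Q[i][j] x_i x_j
# with each x_i restricted to {-1, +1}.
Q = [[0, 1, -2],
     [1, 0, 1],
     [-2, 1, 0]]

def f(x):
    return sum(Q[i][j] * x[i] * x[j] for i in range(3) for j in range(3))

# Step 1 (relax): let each x_i range over the interval [-1, 1] and run
# projected gradient descent on the now-continuous problem.
x = [0.1, -0.2, 0.3]
for _ in range(200):
    grad = [2 * sum(Q[i][j] * x[j] for j in range(3)) for i in range(3)]
    x = [max(-1.0, min(1.0, xi - 0.05 * gi)) for xi, gi in zip(x, grad)]

# Step 2 (round): snap the relaxed solution back to a genuine +/-1 point.
rounded = [1 if xi >= 0 else -1 for xi in x]

# Sanity check against brute force over all 8 corners.
best = min(itertools.product([-1, 1], repeat=3), key=f)
print(f(rounded), f(best))
```

The hard part in the actual proof, as the researchers note, is guaranteeing that this kind of rounding still lands on a good solution; in this toy case it happens to recover an optimal corner.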

  • So that's really where the coolest parts of the proof happen.

  • The researchers proved that the sum of squares relaxation could solve their learning problem, resulting in the first efficient Hamiltonian learning algorithm in a low-temperature regime.

  • So we first make some set of measurements of the macroscopic properties of the system, and then we use these measurements to set up a system of polynomial equations.

  • And then we solve the system of polynomial equations.

  • So the output is a description of the local interactions in the system.

  • There are actually some very interesting learning problems that are at the heart of understanding quantum systems.

  • And to me, the most exciting part was really the connection between two different worlds.

  • This combination of tools is really interesting and something I haven't seen before.

  • I'm hoping it's like a useful perspective with which to tackle other questions as well.

  • I think we find ourselves at the start of this new bridge between theoretical computer science and quantum mechanics.
