Big data is an elusive concept.
巨量資料是一種難以理解的觀念。 (譯註:又稱大數據。)
It represents an amount of digital information,
它代表數位資料的量,
which is uncomfortable to store,
它大到難以儲存、
transport,
傳輸、
or analyze.
或分析。
Big data is so voluminous
巨量數據非常龐大,
that it overwhelms the technologies of the day
以至於今日的科技無法處理它
and challenges us to create the next generation
並促使我們來研發新一代的
of data storage tools and techniques.
資料儲存設備以及科技。
So, big data isn't new.
所以,巨量資料並不是什麼新東西。
In fact, physicists at CERN have been wrangling
事實上,CERN 的物理學家已經面對這個
with the challenge of their ever-expanding big data for decades.
資料不斷擴張的挑戰好幾十年了。
Fifty years ago, CERN's data could be stored
五十年前,CERN 的資料可以儲存在
in a single computer.
單一一臺電腦裡。
OK, so it wasn't your usual computer,
當然,它不是我們一般的電腦,
this was a mainframe computer
而是一臺大型電腦,
that filled an entire building.
它塞滿了一整棟房子。
To analyze the data,
如果要分析資料,
physicists from around the world traveled to CERN
物理學家們會從世界各地飛到 CERN
to connect to the enormous machine.
來使用這臺巨大的機器。
In the 1970's, our ever-growing big data
在 1970 年代,我們那不斷擴張的資料
was distributed across different sets of computers,
被分配在好幾組不同的電腦中,
which mushroomed at CERN.
這些電腦在 CERN 裡,如雨後春筍般地出現。
Each set was joined together
in dedicated, homegrown networks.
每組電腦只用自製的專用網路相連結。
But physicists collaborated without regard
但是科學家的合作關係
for the boundaries between sets,
並不侷限在單一組電腦中,
hence needed to access data on all of these.
所以他們必須能夠在 所有電腦上運用這些資料。
So, we bridged the independent networks together
所以我們把各獨立的網路橋接在一起,
in our own CERNET.
成了我們的 CERNET。
In the 1980's, islands of similar networks
在 1980 年代,一群一群類似這樣的網路
speaking different dialects
在歐洲及美國各地湧現,
sprung up all over Europe and the States,
它們都用不同的方言,
making remote access possible but torturous.
這讓遠端連接變為可能,卻也令人折騰。
To make it easy for our physicists across the world
為了讓我們散佈在世界各地的物理學家
to access the ever-expanding big data
不用四處奔波就能得到存取在 CERN
stored at CERN without traveling,
那不斷更新的資料,
the networks needed to be talking
這個網路系統就必須使用
with the same language.
同一種語言。
We adopted the fledgling internetworking standard from the States,
我們採用了美國那不成熟的標準系統,
followed by the rest of Europe,
之後歐洲其餘單位也接受了,
and we established the principal link at CERN
接著在 1989 年,我們在 CERN 建立了
between Europe and the States in 1989,
歐洲和美國的主要連線,
and the truly global internet took off!
這個正式的全球網路終於起飛了。
Physicists could easily then access
物理學家可以輕鬆地
the terabytes of big data
從世界各地
remotely from around the world,
存取到好幾 TB 的巨量資料,
generate results,
產生結果,
and write papers in their home institutes.
然後在自家的研究機構中撰寫論文。
Then, they wanted to share their findings
接著,他們想要和他們的同事
with all their colleagues.
分享他們的結果。
To make this information sharing easy,
為了讓資料分享更容易,
we created the web in the early 1990's.
我們在 1990 年代早期創建了全球資訊網。
Physicists no longer needed to know
物理學家不再需要先知道
where the information was stored
資料是儲存在哪裡
in order to find it and access it on the web,
然後才能存取資料,
an idea which caught on across the world
一個傳遍世界的想法
and has transformed the way we communicate
in our daily lives.
改變了我們日常通訊的方式。
During the early 2000's,
在 2000 年代早期,
the continued growth of our big data
我們這個愈變愈大的巨量資料
outstripped our capability to analyze it at CERN,
超過了我們 CERN 能夠處理的能力,
despite having buildings full of computers.
儘管我們有好幾棟塞滿電腦的大樓。
We had to start distributing the petabytes of data
我們必須開始將這好幾 PB 的資料 (譯註:PB = 1,024 TB。)
to our collaborating partners
分配儲存在我們的合作伙伴那,
in order to employ local computing and storage
這樣才有辦法利用各地上百個不同機構的
at hundreds of different institutes.
計算儲存資源。
In order to orchestrate these interconnected resources
為了要讓這些錯綜複雜的資源 在各地不同的系統中
with their diverse technologies,
能協調運作,
we developed a computing grid,
我們發展了一套計算網格,
enabling the seamless sharing
讓世界各地的計算資源
of computing resources around the globe.
得以無縫地分享。
This relies on trust relationships and mutual exchange.
這要依靠彼此的信賴關係以及資源交換。
But this grid model could not be transferred
但這個網格模型沒辦法簡單地
out of our community so easily,
移轉出我們這個群體,
where not everyone has resources to share
因為不是所有人都有資源可以分享
nor could companies be expected
而各公司之間也沒辦法
to have the same level of trust.
被期望能有相同層級的信賴。
Instead, an alternative, more business-like approach
取而代之的是,針對隨需資源的存取,
for accessing on-demand resources
有一個商業取向的替代方案
has been flourishing recently,
近期正在蓬勃發展,
called cloud computing,
它叫做雲端計算,
which other communities are now exploiting
有些其它的群體正利用它
to analyze their big data.
來分析它們的巨量資料。
It might seem paradoxical for a place like CERN,
這對於 CERN 這個地方來說, 聽起來可能有點衝突,
a lab focused on the study
of the unimaginably small building blocks of matter,
一個專注於研究物質的 極小構成要素的實驗室
to be the source of something as big as big data.
竟然是這樣巨量資料的來源。
But the way we study the fundamental particles,
但是我們研究基本粒子
as well as the forces by which they interact,
以及它們的交互作用力的方法,
involves creating them fleetingly,
包含了在瞬間產生這些粒子、
colliding protons in our accelerators
在我們的加速器中碰撞質子、
and capturing a trace of them
以及在它們以近光速運動時
as they zoom off near light speed.
捕捉它們的軌跡。
To see those traces,
要見到這些軌跡,
our detector, with 150 million sensors,
我們的偵測器, 包含了一億五千萬個感應器,
acts like a really massive 3-D camera,
像是一個非常巨大的 3-D 攝影機,
taking a picture of each collision event -
記錄每一次碰撞
that's up to 14 million times per second.
──這可能會高到每秒一千四百萬次。
That makes a lot of data.
這會產生大量的數據。
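To put a rough number on "a lot of data", the two figures quoted above can simply be multiplied. The sketch below is only a back-of-the-envelope upper bound: it assumes every one of the 150 million sensors is read out for every one of the 14 million pictures per second, which the talk does not state, and the function and constant names are made up for illustration.

```python
# Rough upper bound on raw sensor readings per second, using only the
# two figures quoted in the transcript.
SENSORS = 150_000_000              # "150 million sensors"
PICTURES_PER_SECOND = 14_000_000   # "up to 14 million times per second"


def readings_per_second(sensors: int = SENSORS,
                        rate: int = PICTURES_PER_SECOND) -> int:
    """Sensor readings per second, ASSUMING every sensor is read out
    for every picture (an illustrative assumption, not a CERN figure)."""
    return sensors * rate


total = readings_per_second()
print(f"{total:.1e} sensor readings per second")  # prints 2.1e+15
```

Even before deciding how many bytes one reading takes, that is on the order of a million billion readings every second.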
But if big data has been around for so long,
但是如果巨量資料已經存在這麼久了,
why do we suddenly keep hearing about it now?
為什麼我們現在才不斷聽到它?
Well, as the old metaphor explains,
這個嘛,就像一個古老的比喻所說的,
the whole is greater than the sum of its parts,
整體強過它所有部份的總和,
and this is no longer just science that is exploiting this.
而已經不再只有科學在開發這塊。
The fact that we can derive more knowledge
我們可以藉由連結相關的資訊
by joining related information together
並找出其中的關聯性來增長知識,
and spotting correlations
而這項事實
can inform and enrich numerous aspects of everyday life,
可以滋潤並強化 日常生活中的許多部份,
either in real time,
無論是在即時資訊中,
such as traffic or financial conditions,
比如交通或是財政狀況;
in short-term evolutions,
或在短期的演化上,
such as medical or meteorological,
比如醫學或是天氣學;
or in predictive situations,
或是在預測情勢上,
such as business, crime, or disease trends.
有商業、犯罪、或是疾病趨勢。
Virtually every field is turning to gathering big data,
實際上每個領域都 漸漸開始搜集巨量資料,
with mobile sensor networks spanning the globe,
像是跨越全球的行動裝置網路、
cameras on the ground and in the air,
地面及空中的攝影機、
archives storing information published on the web,
儲存發表在網路上的資訊的資料庫、
and loggers capturing the activities
以及記載各地網民活動
of Internet citizens the world over.
的記錄器。
The challenge is on to invent new tools and techniques
這個挑戰在於要 發明一項新的工具以及技術
to mine these vast stores,
來挖掘這些龐大的資料寶庫、
to inform decision making,
來為決策提供資訊,
to improve medical diagnosis,
來改進醫學診斷、
and otherwise to answer needs and desires
以及回應一些今日沒想過的
of tomorrow's society in ways that are unimagined today.
明日社會的需求與渴望。