  • Big data is an elusive concept.

    巨量資料是一種難以理解的觀念。 (譯註:又稱大數據。)

  • It represents an amount of digital information,

    它代表數位資料的量,

  • which is uncomfortable to store,

    它大到難以儲存、

  • transport,

    傳輸、

  • or analyze.

    或分析。

  • Big data is so voluminous

    巨量數據非常龐大,

  • that it overwhelms the technologies of the day

    以至於今日的科技無法處理它

  • and challenges us to create the next generation

    並促使我們來研發新一代的

  • of data storage tools and techniques.

    資料儲存設備以及科技。

  • So, big data isn't new.

    所以,巨量資料並不是什麼新東西。

  • In fact, physicists at CERN have been wrangling

    事實上,CERN 的物理學家已經面對這個

  • with the challenge of their ever-expanding big data for decades.

    資料不斷擴張的挑戰好幾十年了。

  • Fifty years ago, CERN's data could be stored

    五十年前,CERN 的資料可以儲存在

  • in a single computer.

    單一一臺電腦裡。

  • OK, so it wasn't your usual computer,

    當然,它不是我們一般的電腦,

  • this was a mainframe computer

    而是一臺大型電腦,

  • that filled an entire building.

    它塞滿了一整棟房子。

  • To analyze the data,

    如果要分析資料,

  • physicists from around the world traveled to CERN

    物理學家們會從世界各地飛到 CERN

  • to connect to the enormous machine.

    來使用這臺巨大的機器。

  • In the 1970's, our ever-growing big data

    在 1970 年代,我們那不斷擴張的資料

  • was distributed across different sets of computers,

    被分配在好幾組不同的電腦中,

  • which mushroomed at CERN.

    這些電腦在 CERN 裡,如雨後春筍般地出現。

  • Each set was joined together

    每組電腦只用自製的專用網路相連結。

  • in dedicated, homegrown networks.

  • But physicists collaborated without regard

    但是科學家的合作關係

  • for the boundaries between sets,

    並不侷限在單一組電腦中,

  • hence needed to access data on all of these.

    所以他們必須能夠在所有電腦上運用這些資料。

  • So, we bridged the independent networks together

    所以我們把各獨立的網路橋接在一起,

  • in our own CERNET.

    成了我們的 CERNET。

  • In the 1980's, islands of similar networks

    在 1980 年代,一群一群類似這樣的網路

  • speaking different dialects

    在歐洲及美國各地湧現,

  • sprang up all over Europe and the States,

    它們都用不同的方言,

  • making remote access possible but torturous.

    這讓遠端連接變為可能,卻也令人折騰。

  • To make it easy for our physicists across the world

    為了讓我們散佈在世界各地的物理學家

  • to access the ever-expanding big data

    不用四處奔波就能得到存取在 CERN

  • stored at CERN without traveling,

    那不斷更新的資料,

  • the networks needed to be talking

    這個網路系統就必須使用

  • with the same language.

    同一種語言。

  • We adopted the fledgling internetworking standard from the States,

    我們採用了美國那不成熟的標準系統,

  • followed by the rest of Europe,

    之後歐洲其餘單位也接受了,

  • and we established the principal link at CERN

    接著在 1989 年,我們在 CERN 建立了

  • between Europe and the States in 1989,

    歐洲和美國的主要連線,

  • and the truly global internet took off!

    這個正式的全球網路終於起飛了。

  • Physicists could easily then access

    物理學家可以輕鬆地

  • the terabytes of big data

    從世界各地

  • remotely from around the world,

    存取到好幾 TB 的巨量資料,

  • generate results,

    產生結果,

  • and write papers in their home institutes.

    然後在自家的研究機構中撰寫論文。

  • Then, they wanted to share their findings

    接著,他們想要和他們的同事

  • with all their colleagues.

    分享他們的結果。

  • To make this information sharing easy,

    為了讓資料分享更容易,

  • we created the web in the early 1990's.

    我們在 1990 年代早期建構了一個網路。

  • Physicists no longer needed to know

    物理學家不再須要先知道

  • where the information was stored

    資料是儲存在哪裡

  • in order to find it and access it on the web,

    然後才能存取資料,

  • an idea which caught on across the world

    一個傳遍世界的想法

  • and has transformed the way we communicate

    改變了我們日常通訊的方式。

  • in our daily lives.

  • During the early 2000's,

    在 2000 年代早期,

  • the continued growth of our big data

    我們這個愈變愈大的巨量資料

  • outstripped our capability to analyze it at CERN,

    超過了我們 CERN 能夠處理的能力,

  • despite having buildings full of computers.

    除非所有空間都塞滿電腦。

  • We had to start distributing the petabytes of data

    我們必須開始將這好幾 PB 的資料 (譯註:PB = 1,024 TB。)

  • to our collaborating partners

    分配儲存在我們的合作伙伴那,

  • in order to employ local computing and storage

    這樣才有辦法利用各地上百個不同機構的

  • at hundreds of different institutes.

    計算儲存資源。

  • In order to orchestrate these interconnected resources

    為了要讓這些錯綜複雜的資源在各地不同的系統中

  • with their diverse technologies,

    能協調運作,

  • we developed a computing grid,

    我們發展了一套計算網格,

  • enabling the seamless sharing

    讓世界各地的計算資源

  • of computing resources around the globe.

    得以無縫地分享。

  • This relies on trust relationships and mutual exchange.

    這要依靠彼此的信賴關係以及資源交換。

  • But this grid model could not be transferred

    但這個網格模型沒辦法簡單地

  • out of our community so easily,

    移轉出我們這個群體,

  • where not everyone has resources to share

    因為不是所有人都有資源可以分享

  • nor could companies be expected

    而各公司之間也沒辦法

  • to have the same level of trust.

    被期望能有相同層級的信賴。

  • Instead, an alternative, more business-like approach

    取而代之的是,針對存取須求的資源,

  • for accessing on-demand resources

    有一個商業取向的替代方案

  • has been flourishing recently,

    近期正在蓬勃發展,

  • called cloud computing,

    它叫做雲端計算,

  • which other communities are now exploiting

    有些其它的群體正利用它

  • to analyze their big data.

    來分析它們的巨量資料。

  • It might seem paradoxical for a place like CERN,

    這對於 CERN 這個地方來說, 聽起來可能有點衝突,

  • a lab focused on the study

    一個專注於研究物質的極小構成要素的實驗室,

  • of the unimaginably small building blocks of matter,

  • to be the source of something as big as big data.

    竟然是這樣巨量資料的來源。

  • But the way we study the fundamental particles,

    但是我們研究基本粒子

  • as well as the forces by which they interact,

    以及它們的交互作用力的方法,

  • involves creating them fleetingly,

    包含了在瞬間產生這些粒子、

  • colliding protons in our accelerators

    在我們的加速器中碰撞質子、

  • and capturing a trace of them

    以及在它們以近光速運動時

  • as they zoom off near light speed.

    捕捉他們的軌跡。

  • To see those traces,

    要見到這些軌跡,

  • our detector, with 150 million sensors,

    我們的偵測器, 包含了一億五千萬個感應器,

  • acts like a really massive 3-D camera,

    像是一個非常巨大的 3-D 攝影機,

  • taking a picture of each collision event -

    記錄每一次碰撞

  • that's up to 14 million times per second.

    ──這可能會高到每秒一千四百萬次。

  • That makes a lot of data.

    這會產生大量的數據。

  • But if big data has been around for so long,

    但是如果巨量資料已經存在這麼久了,

  • why do we suddenly keep hearing about it now?

    為什麼我們現在才不斷聽到它?

  • Well, as the old metaphor explains,

    這個嘛,就像一個古老的比喻所說的,

  • the whole is greater than the sum of its parts,

    整體強過它所有部份的總和,

  • and it is no longer just science that is exploiting this.

    而已經不再只有科學在開發這塊。

  • The fact that we can derive more knowledge

    我們可以藉由連結相關的資訊

  • by joining related information together

    以及開發合作關係來增長知識,

  • and spotting correlations

    而這項事實

  • can inform and enrich numerous aspects of everyday life,

    可以滋潤並強化日常生活中的許多部份,

  • either in real time,

    無論是在即時資訊中,

  • such as traffic or financial conditions,

    比如交通或是財政狀況;

  • in short-term evolutions,

    或在短期的演化上,

  • such as medical or meteorological,

    比如醫學或是天氣學;

  • or in predictive situations,

    或是在預測情勢上,

  • such as business, crime, or disease trends.

    有商業、犯罪、或是疾病趨勢。

  • Virtually every field is turning to gathering big data,

    實際上每個領域都漸漸開始搜集巨量資料,

  • with mobile sensor networks spanning the globe,

    像是跨越全球的行動裝置網路、

  • cameras on the ground and in the air,

    地面及空中的攝影機、

  • archives storing information published on the web,

    儲存發表在網路上的資訊的資料庫、

  • and loggers capturing the activities

    以及記載各地網民活動

  • of Internet citizens the world over.

    的記錄器。

  • The challenge is on to invent new tools and techniques

    這個挑戰在於要發明一項新的工具以及技術

  • to mine these vast stores,

    來儲存這大量的資料、

  • to inform decision making,

    來為決策提供資訊,

  • to improve medical diagnosis,

    來改進醫學診斷、

  • and otherwise to answer needs and desires

    以及回應一些今日沒想過的

  • of tomorrow's society in ways that are unimagined today.

    明日社會的需求與渴望。
