Six thousand miles of road,
600 miles of subway track,
400 miles of bike lanes
and a half a mile of tram track,
if you've ever been to Roosevelt Island.
These are the numbers that make up the infrastructure of New York City.
These are the statistics of our infrastructure.
They're the kind of numbers you can find released in reports by city agencies.
For example, the Department of Transportation will probably tell you
how many miles of road they maintain.
The MTA will boast how many miles of subway track there are.
Most city agencies give us statistics.
This is from a report this year
from the Taxi and Limousine Commission,
where we learn that there's about 13,500 taxis here in New York City.
Pretty interesting, right?
But did you ever think about where these numbers came from?
Because for these numbers to exist, someone at the city agency
had to stop and say, hmm, here's a number that somebody might want to know.
Here's a number that our citizens want to know.
So they go back to their raw data,
they count, they add, they calculate,
and then they put out reports,
and those reports will have numbers like this.
The problem is, how do they know all of our questions?
We have lots of questions.
In fact, in some ways there's literally an infinite number of questions
that we can ask about our city.
The agencies can never keep up.
So the paradigm isn't exactly working, and I think our policymakers realize that,
because in 2012, Mayor Bloomberg signed into law what he called
the most ambitious and comprehensive open data legislation in the country.
In a lot of ways, he's right.
In the last two years, the city has released 1,000 datasets
on our open data portal,
and it's pretty awesome.
So you go and look at data like this,
and instead of just counting the number of cabs,
we can start to ask different questions.
So I had a question.
When's rush hour in New York City?
It can be pretty bothersome. When is rush hour exactly?
And I thought to myself, these cabs aren't just numbers,
these are GPS recorders driving around in our city streets
recording each and every ride they take.
There's data there, and I looked at that data,
and I made a plot of the average speed of taxis in New York City throughout the day.
You can see that from about midnight to around 5:18 in the morning,
speed increases, and at that point, things turn around,
and they get slower and slower and slower until about 8:35 in the morning,
when they end up at around 11 and a half miles per hour.
The average taxi is going 11 and a half miles per hour on our city streets,
and it turns out it stays that way
for the entire day.
(Laughter)
So I said to myself, I guess there's no rush hour in New York City.
There's just a rush day.
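A minimal sketch of how the numbers behind a plot like that might be computed. The talk does not say how the speeds were derived, so this assumes each trip record carries a pickup timestamp, a distance in miles, and a duration in seconds, and it buckets by whole hour rather than by minute. The function name, the record layout, and the sample trips are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def average_speed_by_hour(trips):
    """Return {hour_of_day: miles_per_hour} from (pickup_iso, miles, seconds) tuples."""
    miles = defaultdict(float)
    seconds = defaultdict(float)
    for pickup_iso, trip_miles, trip_seconds in trips:
        if trip_seconds <= 0:
            continue  # skip records with no usable duration
        hour = datetime.fromisoformat(pickup_iso).hour
        miles[hour] += trip_miles
        seconds[hour] += trip_seconds
    # Total distance divided by total time, per pickup hour.
    return {h: miles[h] / (seconds[h] / 3600.0) for h in sorted(miles)}


# Made-up sample trips, for illustration only.
sample_trips = [
    ("2013-05-01T04:45:00", 3.0, 600),
    ("2013-05-01T08:35:00", 2.3, 720),
    ("2013-05-01T08:50:00", 1.15, 360),
]
speeds = average_speed_by_hour(sample_trips)
```

Dividing total distance by total time weights long trips more heavily than averaging per-trip speeds would; either choice could be defended, and the talk does not specify one.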
Makes sense. And this is important for a couple of reasons.
If you're a transportation planner, this might be pretty interesting to know.
But if you want to get somewhere quickly,
you now know to set your alarm for 4:45 in the morning and you're all set.
New York, right?
But there's a story behind this data.
This data wasn't just available, it turns out.
It actually came from something called a Freedom of Information Law Request,
or a FOIL Request.
This is a form you can find on the Taxi and Limousine Commission website.
In order to access this data, you need to go get this form,
fill it out, and they will notify you,
and a guy named Chris Whong did exactly that.
Chris went down, and they told him,
"Just bring a brand new hard drive down to our office,
leave it here for five hours, we'll copy the data and you take it back."
And that's where this data came from.
Now, Chris is the kind of guy who wants to make the data public,
and so it ended up online for all to use, and that's where this graph came from.
And the fact that it exists is amazing. These GPS recorders -- really cool.
But the fact that we have citizens walking around with hard drives
picking up data from city agencies to make it public --
it was already kind of public, you could get to it,
but it was "public," it wasn't public.
And we can do better than that as a city.
We don't need our citizens walking around with hard drives.
Now, not every dataset is behind a FOIL Request.
Here is a map I made with the most dangerous intersections in New York City
based on cyclist accidents.
So the red areas are more dangerous.
And what it shows is first the East side of Manhattan,
especially in the lower area of Manhattan, has more cyclist accidents.
That might make sense
because there are more cyclists coming off the bridges there.
But there's other hotspots worth studying.
There's Williamsburg. There's Roosevelt Avenue in Queens.
And this is exactly the kind of data we need for Vision Zero.
This is exactly what we're looking for.
But there's a story behind this data as well.
This data didn't just appear.
How many of you guys know this logo?
Yeah, I see some shakes.
Have you ever tried to copy and paste data out of a PDF
and make sense of it?
I see more shakes.
More of you tried copying and pasting than knew the logo. I like that.
So what happened is, the data that you just saw was actually on a PDF.
In fact, hundreds and hundreds and hundreds of pages of PDF
put out by our very own NYPD,
and in order to access it, you would either have to copy and paste
for hundreds and hundreds of hours,
or you could be John Krauss.
John Krauss was like,
I'm not going to copy and paste this data. I'm going to write a program.
It's called the NYPD Crash Data Band-Aid,
and it goes to the NYPD's website and it would download PDFs.
Every day it would search; if it found a PDF, it would download it
and then it would run some PDF-scraping program,
and out would come the text,
and it would go on the Internet, and then people could make maps like that.
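The daily loop described here (search, download anything new, scrape, publish) can be sketched in a few lines. This is not the actual NYPD Crash Data Band-Aid code, which the talk does not show. The function name, the injected callables, and the example.org URLs are stand-ins invented for illustration, so the sketch runs without network access or a PDF library.

```python
def run_daily_check(list_pdf_urls, download, extract_text, seen, publish):
    """Process any PDFs not seen before; return the URLs handled in this run."""
    handled = []
    for url in list_pdf_urls():
        if url in seen:
            continue  # already scraped on an earlier day
        pdf_bytes = download(url)
        text = extract_text(pdf_bytes)
        publish(url, text)
        seen.add(url)
        handled.append(url)
    return handled


# Stand-in data and callables (all hypothetical).
fake_site = {
    "https://example.org/crashes-jan.pdf": b"%PDF jan rows",
    "https://example.org/crashes-feb.pdf": b"%PDF feb rows",
}
published = {}
seen = {"https://example.org/crashes-jan.pdf"}  # handled on a previous run

handled = run_daily_check(
    list_pdf_urls=lambda: sorted(fake_site),
    download=lambda url: fake_site[url],
    extract_text=lambda data: data.decode("ascii"),
    seen=seen,
    publish=lambda url, text: published.__setitem__(url, text),
)
```

In a real scraper the three callables would wrap an HTTP client, a PDF text extractor, and whatever store the results are published to; keeping them as parameters makes the loop easy to test on its own.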
And the fact that the data's here, the fact that we have access to it --
Every accident, by the way, is a row in this table.
You can imagine how many PDFs that is.
The fact that we have access to that is great,
but let's not release it in PDF form,
because then we're having our citizens write PDF scrapers.
It's not the best use of our citizens' time,
and we as a city can do better than that.
Now, the good news is that the de Blasio administration
actually recently released this data a few months ago,
and so now we can actually have access to it,
but there's a lot of data still entombed in PDF.
For example, our crime data is still only available in PDF.
And not just our crime data, our own city budget.
Our city budget is only readable right now in PDF form.
And it's not just us that can't analyze it --
our own legislators who vote for the budget
also only get it in PDF.
So our legislators cannot analyze the budget that they are voting for.
And I think as a city we can do a little better than that as well.
Now, there's a lot of data that's not hidden in PDFs.
This is an example of a map I made,
and this is the dirtiest waterways in New York City.
Now, how do I measure dirty?
Well, it's kind of a little weird,
but I looked at the level of fecal coliform,
which is a measurement of fecal matter in each of our waterways.
The larger the circle, the dirtier the water,
so the large circles are dirty water, the small circles are cleaner.
What you see is inland waterways.
This is all data that was sampled by the city over the last five years.
And inland waterways are, in general, dirtier.
That makes sense, right?
And the bigger circles are dirty. And I learned a few things from this.
Number one: Never swim in anything that ends in "creek" or "canal."
But number two: I also found the dirtiest waterway in New York City,
by this measure, one measure.
In Coney Island Creek, which is not the Coney Island you swim in, luckily.
It's on the other side.
But Coney Island Creek, 94 percent of samples taken over the last five years
have had fecal levels so high
that it would be against state law to swim in the water.
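A figure like "94 percent of samples" is a share of readings above a limit, computed per sampling site. The sketch below shows that calculation under stated assumptions: the site names, the readings, and the limit of 1000 are placeholders, not the real data or the actual legal threshold, neither of which is given in the talk.

```python
from collections import defaultdict


def share_over_limit(samples, limit):
    """Return {site: fraction of that site's readings strictly above limit}."""
    total = defaultdict(int)
    over = defaultdict(int)
    for site, reading in samples:
        total[site] += 1
        if reading > limit:
            over[site] += 1
    return {site: over[site] / total[site] for site in total}


ASSUMED_LIMIT = 1000  # placeholder value, not the statutory threshold

# Made-up (site, reading) samples for illustration.
samples = [
    ("Creek A", 5000), ("Creek A", 1200), ("Creek A", 300), ("Creek A", 2500),
    ("Harbor B", 40), ("Harbor B", 1500),
]
shares = share_over_limit(samples, ASSUMED_LIMIT)
worst_site = max(shares, key=shares.get)
```

Ranking sites by this share is only one way to define "dirtiest"; a mean or median reading would give a different ordering, which is why the talk qualifies the claim with "by this measure."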
And this is not the kind of fact that you're going to see
boasted in a city report, right?
It's not going to be the front page on nyc.gov.
You're not going to see it there,
but the fact that we can get to that data is awesome.
But once again, it wasn't super easy,
because this data was not on the open data portal.
If you were to go to the open data portal,
you'd see just a snippet of it, a year or a few months.
It was actually on the Department of Environmental Protection's website.
And each one of these links is an Excel sheet, and each Excel sheet is different.
Every heading is different: you copy, paste, reorganize.
When you do you can make maps and that's great, but once again,
we can do better than that as a city, we can normalize things.
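The copy, paste, and reorganize step amounts to mapping each sheet's headings onto one shared set of column names. Here is a rough sketch, assuming the sheets have already been exported to CSV text. The alias table and the sample headings are invented for illustration and are not the Department of Environmental Protection's actual column names.

```python
import csv
import io

# Hypothetical heading variants mapped to one canonical column name each.
HEADER_ALIASES = {
    "station": "site",
    "sampling location": "site",
    "sample date": "date",
    "date": "date",
    "fecal coliform": "fecal_coliform",
    "fecal coliform (#/100 ml)": "fecal_coliform",
}


def read_normalized(csv_text):
    """Read CSV text and return rows keyed by canonical column names."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for header, value in raw.items():
            key = HEADER_ALIASES.get(header.strip().lower())
            if key is not None:  # drop columns we do not recognize
                row[key] = value.strip()
        rows.append(row)
    return rows


# Two made-up sheets with different headings for the same three columns.
sheet_a = "Station,Sample Date,Fecal Coliform\nCreek A,2014-01-05,5000\n"
sheet_b = "Sampling Location, Date ,Fecal Coliform (#/100 mL)\nCreek A,2014-02-05,1200\n"
combined = read_normalized(sheet_a) + read_normalized(sheet_b)
```

Once every sheet goes through the same mapping, the rows can be concatenated and analyzed together, which is the kind of normalization an agency could do once at the source.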
And we're getting there, because there's this website that Socrata makes
called the Open Data Portal NYC.
This is where 1,100 datasets that don't suffer
from the things I just told you live,
and that number is growing, and that's great.
You can download data in any format, be it CSV or PDF or Excel document.
Whatever you want, you can download the data that way.
The problem is, once you do,
you will find that each agency codes their addresses differently.
So one is street name, intersection street,
street, borough, address, building, building address.
So once again, you're spending time, even when we have this portal,
you're spending time normalizing our address fields.
And that's not the best use of our citizens' time.
We can do better than that as a city.
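To make the address problem concrete, here is a toy sketch that folds two differently coded records into one canonical string. The field names, the borough codes, and the suffix table are assumptions made up for this example; real address matching is much harder and would normally rely on a geocoding service rather than string rules like these.

```python
# Hypothetical lookup tables; real agency codes may differ.
BOROUGHS = {
    "mn": "Manhattan", "manhattan": "Manhattan",
    "bk": "Brooklyn", "brooklyn": "Brooklyn",
    "qn": "Queens", "queens": "Queens",
}
SUFFIXES = {"st": "Street", "street": "Street", "ave": "Avenue", "avenue": "Avenue"}


def canonical_address(record):
    """Return 'number street, borough' from either record layout."""
    if "address" in record:
        street_part = record["address"]  # single combined field
    else:
        street_part = f'{record["building"]} {record["street"]}'  # split fields
    words = street_part.replace(".", "").split()
    words = [SUFFIXES.get(word.lower(), word.title()) for word in words]
    borough = BOROUGHS[record["borough"].strip().lower()]
    return " ".join(words) + ", " + borough


# The same place, coded two different ways (made-up records).
agency_one = {"address": "123 MAIN ST.", "borough": "MN"}
agency_two = {"building": "123", "street": "Main Street", "borough": "Manhattan"}
same_place = canonical_address(agency_one) == canonical_address(agency_two)
```

Every analyst who combines datasets ends up writing some version of this, which is the duplicated effort a shared address standard would remove.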
We can standardize our addresses,
and if we do, we can get more maps like this.
This is a map of fire hydrants in New York City,
but not just any fire hydrants.
These are the top 250 grossing fire hydrants in terms of parking tickets.
(Laughter)
So I learned a few things from this map, and I really like this map.
Number one, just don't park on the Upper East Side.
Just don't. It doesn't matter where you park, you will get a hydrant ticket.
Number two, I found the two highest grossing hydrants in all of New York City,
and they're on the Lower East Side,
and they were bringing in over 55,000 dollars a year in parking tickets.
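Finding the "top grossing" hydrants is a group-and-sum over ticket records, followed by a sort. A minimal sketch follows, assuming each ticket has already been matched to a hydrant location and carries a fine amount. The location labels and the fine of 115 are made up for illustration.

```python
from collections import Counter


def top_grossing(tickets, n):
    """Return the n locations with the highest total fines, as (location, total) pairs."""
    revenue = Counter()
    for location, fine in tickets:
        revenue[location] += fine
    return revenue.most_common(n)


# Made-up (location, fine) ticket records.
tickets = [
    ("Hydrant A", 115), ("Hydrant B", 115), ("Hydrant A", 115),
    ("Hydrant C", 115), ("Hydrant A", 115), ("Hydrant B", 115),
]
top_two = top_grossing(tickets, 2)
```

The hard part in practice is the matching step this sketch skips: tickets are recorded by street address, so grouping them by hydrant depends on the address cleanup discussed earlier.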
And that seemed a little strange to me when I noticed it,
so I did a little digging and it turns out what you had is a hydrant
and then something called a curb extension,
which is like a seven-foot space to walk on,
and then a parking spot.
And so these cars came along, and the hydrant --
"It's all the way over there, I'm fine,"
and there was actually a parking spot painted there beautifully for them.
They would park there, and the NYPD disagreed with this designation
and would ticket them.
And it wasn't just me who found a parking ticket.
This is the Google Street View car driving by
finding the same parking ticket.
So I wrote about this on my blog, on I Quant NY, and the DOT responded,
and they said,
"While the DOT has not received any complaints about this location,
we will review the roadway markings and make any appropriate alterations."
And I thought to myself, typical government response,
all right, moved on with my life.
But then, a few weeks later, something incredible happened.
They repainted the spot,
and for a second I thought I saw the future of open data,
because think about what happened here.
For five years, this spot was being ticketed, and it was confusing,
and then a citizen found something, they told the city, and within a few weeks
the problem was fixed.
It's amazing. And a lot of people see open data as being a watchdog.
It's not, it's about being a partner.
We can empower our citizens to be better partners for government,
and it's not that hard.
All we need are a few changes.
If you're FOILing data,
if you're seeing your data being FOILed over and over again,
let's release it to the public, that's a sign that it should be made public.
And if you're a government agency releasing a PDF,
let's pass legislation that requires you to post it with the underlying data,
because that data is coming from somewhere.
I don't know where, but it's coming from somewhere,
and you can release it with the PDF.
And let's adopt and share some open data standards.
Let's start with our addresses here in New York City.
Let's just start normalizing our addresses.
Because New York is a leader in open data.
Despite all this, we are absolutely a leader in open data,
and if we start normalizing things, and set an open data standard,
others will follow. The state will follow, and maybe the federal government.
Other countries could follow,
and we're not that far off from a time where you could write one program
and map information from 100 countries.
It's not science fiction. We're actually quite close.
And by the way, who are we empowering with this?
Because it's not just John Krauss and it's not just Chris Whong.
There are hundreds of meetups going on in New York City right now,
active meetups.
There are thousands of people attending these meetups.
These people are going after work and on weekends,
and they're attending these meetups to look at open data
and make our city a better place.
Groups like BetaNYC, who just last week released something called citygram.nyc
that allows you to subscribe to 311 complaints
around your own home, or around your office.
You put in your address, you get local complaints.
And it's not just the tech community that are after these things.
It's urban planners like the students I teach at Pratt.
It's policy advocates, it's everyone,
it's citizens from a diverse set of backgrounds.
And with some small, incremental changes,
we can unlock the passion and the ability of our citizens
to harness open data and make our city even better,
whether it's one dataset, or one parking spot at a time.
Thank you.
(Applause)