("The hardest part of learning something new is not embracing new ideas, but letting go of old ones." - Todd Rose, "The End of Average")
The first standardized tests that we know of were administered in China over 2,000 years ago during the Han dynasty.
Chinese officials used them to determine aptitude for various government posts.
The subject matter included philosophy, farming, and even military tactics.
Standardized tests continued to be used around the world for the next two millennia,
and today, they're used for everything from evaluating stair climbs for firefighters in France
to language examinations for diplomats in Canada to exams for students in schools.
Some standardized tests measure scores only in relation to the results of other test-takers.
Others measure performance by how well test-takers meet predetermined criteria.
So, the stair climb for the firefighter could be measured by comparing the time of the climb to that of all other firefighters.
The results might be plotted on what many call a "bell curve".
Or it could be evaluated with reference to set criteria, such as carrying a certain amount of weight a certain distance up a certain number of stairs.
Similarly, the diplomat might be measured against other test-taking diplomats or against a set of fixed criteria, which demonstrate different levels of language proficiency.
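The two scoring approaches above are usually called norm-referenced (compare a result with other test-takers) and criterion-referenced (compare a result with a fixed standard). The Python sketch below scores the firefighter stair climb both ways. The climb times and the 62-second cutoff are made-up illustrative numbers, and the z-score is just one common way to express a norm-referenced position, not necessarily what any real firefighter test uses.

```python
from statistics import mean, pstdev

def norm_referenced(time_s, all_times):
    """Score one climb relative to the group as a z-score.
    Negative means faster (better) than the group average."""
    return (time_s - mean(all_times)) / pstdev(all_times)

def criterion_referenced(time_s, cutoff_s):
    """Score one climb against a fixed standard: pass if at or under the cutoff."""
    return time_s <= cutoff_s

# Hypothetical stair-climb times in seconds (illustrative, not real data).
times = [50.0, 55.0, 60.0, 65.0, 70.0]
CUTOFF_S = 62.0  # hypothetical pass mark, in seconds

for t in times:
    z = norm_referenced(t, times)
    passed = criterion_referenced(t, CUTOFF_S)
    print(f"{t:5.1f}s  z={z:+.2f}  pass={passed}")
```

Note the design difference: if the comparison group changes, a climber's z-score changes with it, while the pass/fail result against the fixed cutoff stays the same.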
And all of these results can be expressed using something called a "percentile".
If a diplomat is in the 70th percentile, 70% of test-takers scored below her.
If she scored in the 30th percentile, 70% of test-takers scored above her.
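A percentile in the sense used here can be computed by counting. This minimal Python sketch follows the "percentage of other test-takers who scored below her" reading from the text. The ten diplomat scores are invented for illustration, and real testing programs may handle ties, or whether the test-taker herself is counted, differently.

```python
def percentile_rank(score, others):
    """Percentage of the other test-takers who scored strictly below `score`."""
    below = sum(1 for s in others if s < score)
    return 100.0 * below / len(others)

def percent_above(score, others):
    """Percentage of the other test-takers who scored strictly above `score`."""
    above = sum(1 for s in others if s > score)
    return 100.0 * above / len(others)

# Hypothetical scores of ten other diplomats (illustrative only).
others = [52, 58, 61, 64, 67, 70, 73, 77, 81, 90]

print(percentile_rank(75, others), percent_above(75, others))  # prints 70.0 30.0
print(percentile_rank(63, others), percent_above(63, others))  # prints 30.0 70.0
```

With no ties, "below" and "above" add up to 100%, which is why a diplomat in the 30th percentile has 70% of test-takers scoring above her.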
Although standardized tests are sometimes controversial, they're simply a tool.
As a thought experiment, think of a standardized test as a ruler.
A ruler's usefulness depends on two things.
First, the job we ask it to do.
Our ruler can't measure the temperature outside or how loud someone is singing.
Second, the ruler's usefulness depends on its design.
Say you need to measure the circumference of an orange.
Our ruler measures length, which is the right quantity, but it hasn't been designed with the flexibility required for the task at hand.
So, if standardized tests are given the wrong job or aren't designed properly, they may end up measuring the wrong things.
In the case of schools, students with test anxiety may have trouble performing their best on a standardized test,
not because they don't know the answers, but because they're feeling too nervous to share what they've learned.
Students with reading challenges may struggle with the wording of a math problem,
so their test results may reflect their literacy more than their numeracy skills.
And students who are confused by test examples containing unfamiliar cultural references may do poorly,
telling us more about the test-taker's cultural familiarity than their academic learning.
In these cases, the tests may need to be designed differently.
Standardized tests can also have a hard time measuring abstract characteristics or skills, such as creativity, critical thinking, and collaboration.
If we design a test poorly or ask it to do the wrong job or a job it's not very good at, the results may not be reliable or valid.
Reliability and validity are two critical ideas for understanding standardized tests.
To understand the difference between them, we can use the metaphor of two broken thermometers.
An unreliable thermometer gives you a different reading each time you take your temperature,
and the reliable but invalid thermometer is consistently ten degrees too hot.
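The two broken thermometers can be simulated directly. In this Python sketch, reliability shows up as the spread of repeated readings, and, in the simplified sense of the metaphor only, validity shows up as the average error (bias). The true temperature, the noise ranges, and the fixed random seed are all assumptions chosen for illustration; real reliability and validity analyses of tests use more involved statistics.

```python
import random
from statistics import mean, pstdev

TRUE_TEMP = 37.0  # hypothetical true temperature, in degrees

def unreliable_reading(rng):
    # Large random error: each reading lands somewhere different.
    return TRUE_TEMP + rng.uniform(-5.0, 5.0)

def reliable_but_invalid_reading(rng):
    # Tiny random error, but a constant +10 degree bias ("ten degrees too hot").
    return TRUE_TEMP + 10.0 + rng.uniform(-0.1, 0.1)

def summarize(readings):
    """Spread (population std. dev.) stands in for reliability;
    mean error stands in for validity (bias)."""
    return {"spread": pstdev(readings), "bias": mean(readings) - TRUE_TEMP}

rng = random.Random(0)  # fixed seed so the simulation is repeatable
unreliable = [unreliable_reading(rng) for _ in range(1000)]
biased = [reliable_but_invalid_reading(rng) for _ in range(1000)]

print("unreliable:", summarize(unreliable))
print("reliable but invalid:", summarize(biased))
```

Averaging many readings from the unreliable thermometer lands near the true value, but no amount of averaging fixes the biased one, which is one way to see why consistency alone does not guarantee validity.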
Validity also depends on accurate interpretations of results.
If people say the results of a test mean something they don't, that test may have a validity problem.
Just as we wouldn't expect a ruler to tell us how much an elephant weighs, or what it had for breakfast,
we can't expect standardized tests alone to reliably tell us how smart someone is,
how diplomats will handle a tough situation,
or how brave a firefighter might turn out to be.
So, standardized tests may help us learn a little about a lot of people in a short time,
but they usually can't tell us a lot about a single person.
Many social scientists worry about test scores resulting in sweeping and often negative changes for test-takers, sometimes with long-term life consequences.
We can't blame the tests, though.
It's up to us to use the right tests for the right jobs and to interpret results appropriately.
If you'd like to learn more about this topic, we highly recommend a best-selling book called "The End of Average" by Harvard Professor Todd Rose.
In it, Rose investigates the rampant misuse of standardized tests with clarity and urgency.
He also proposes a solution to the problem.
You can download an audio version of this book for free at audible.com/teded.
And every free trial encourages Audible to continue supporting TED-Ed's nonprofit mission.
We're very passionate about this issue, and we're very grateful to any TED-Ed community members who take the time to read or listen to this important book.
Thanks for watching, and thanks for your support.