>> Welcome, and thanks for coming this afternoon. I'm Dan Rockmore, Chair of the Department of Mathematics here at Dartmouth and also Director of the William H. Neukom Institute for Computational Science. On behalf of the college, the institute, and the Friends of the Dartmouth Library, it's my pleasure to introduce Professor Robert Darnton of Harvard University, who will be speaking to us today on the Digital Public Library of America and the Digital Future. This is the third lecture in our Leading Voices in Higher Education series and, moreover, the inaugural Donoho Colloquium. The Donoho Colloquia will be an ongoing series of public lectures aimed at increasing awareness of the many important and sometimes surprising places in which computational ideas appear. This is a central piece of the larger mission of the Neukom Institute, whose aim is to support and integrate computational thinking and computational ideas throughout Dartmouth. These lectures are made possible by a generous gift from David, Miriam, and Dan Donoho in honor of Dan's graduation as a member of the class of 2006, and they were in fact Dan's brainchild, to honor that graduation. Dan is at present in the Emergency Room, working through his anesthesiology rotation; don't be worried, it's not in the Emergency Room. But we are fortunate to have David and Miriam, who moved heaven and earth to get here, so thank you very much for coming and for your gift. [ Applause ] When we looked to initiate the Donoho Colloquium, I immediately thought of Robert Darnton as the first lecturer. He's a leading authority on the French Enlightenment and the history of the book, but it was his many cogently argued and beautifully written New York Review essays on the creation of a Digital Public Library of America that made him a natural choice as the first Donoho lecturer.
His arguments mix historical anecdote with close legal and moral reasoning and are masterful displays of passion, advocacy, and careful analysis. He makes clear the challenges and possibilities inherent in such an endeavor, as well as the central role that computational and digital technology play in the story. Now, the idea of such a public resource also has an important Dartmouth connection: one of the first public calls for a national computer-based library can be found in a lecture by former math professor and 13th president of Dartmouth, John Kemeny, given just over 50 years ago at a conference convened to mark the 100th anniversary of the founding of MIT. In his lecture, "A Library for 2000 A.D.," Kemeny advocated for a national research library, a central resource for the nation's research community. Kemeny argued that the sheer projected volume of textual resources and the attendant problems of information search and retrieval would require digitized storage and access. Now, I can't help myself from showing you a table from Kemeny's talk that he used to illustrate the kinds of problems he anticipated. So this is what he viewed as the big problem in search [laughter]. What you see is 2 hours, 27 minutes, and 45 seconds to find a book: the walk to the library, finding the card in the catalogue, up the stairs, discovering the book is missing [laughter], with the majority of the time spent waiting for Professor S to return from lunch [laughter]. I love this for so many different reasons, but I also know it's true, because Professor S was my dear friend Laurie Snell. So I know [laughter] this is a true story. For those of you who know Laurie, this is perfectly believable. So Kemeny's lecture is simultaneously prescient and of its time: he gives us a detailed vision for an electronic library, but one deeply rooted in tape drives and phone connections.
Great advances in technology, as well as in computer science, mathematics, and statistics, have made possible the much more ambitious goal that is a Digital Public Library of America. From conception to execution, Professor Darnton has led the charge for its creation. Robert Darnton is Carl H. Pforzheimer University Professor at Harvard and Director of the Harvard University Library. He is a Harvard graduate, a Rhodes Scholar, and a former reporter for the New York Times, and he was a professor on the Princeton history faculty from 1968 until 2007, when he returned to Harvard. He has held numerous visiting positions and is a member of the boards of many prestigious institutions, including the New York Public Library. He is the author of many scholarly essays and books, including The Forbidden Best-Sellers of Pre-Revolutionary France, which was a National Book Critics Circle Award winner. Professor Darnton is the recipient of numerous honors and prizes, including a MacArthur Fellowship and, most recently, a National Humanities Medal, received just a few weeks ago from President Obama. In the words of the citation, Professor Darnton has a, quote, "determination to make knowledge accessible to everyone. As an author, he has illuminated the world of Enlightenment and Revolutionary France, and as a librarian, he has endeavored to make his vision for a comprehensive national library of digitized books a reality," end quote. We look forward to his sharing of that vision with us this afternoon, so please join me in welcoming our first Donoho lecturer, Professor Robert Darnton. [ Applause ]
>> Thank you, Dan. Well, thank you; I'm delighted to be here. It's good to see snow; it's practically the first snow I've seen this winter. But I'm especially honored to be giving the first of the Donoho lectures, and I'm delighted that you could come yourselves. I think the Neukom Institute is a good thing, and I think probably a lot of you care about books, so that makes me feel good.
They can be digitized, they can be printed on paper, but they are actually doing rather well, even the old-fashioned printed codex. Believe it or not, this year more books will be published worldwide than ever before: 1 million new titles, almost all of them in print. It's amazing. So when people tell you the book is dead, just shake your head in disbelief; the book is not dead. It often makes me think of one of my favorite graffiti. It's actually in the men's room of Firestone Library at Princeton, and you may have seen one like it. It begins, "God is dead," signed Nietzsche [laughter], and then underneath Nietzsche is, "Dead," signed God [laughter]. The book is absolutely not dead. And I think there are a lot of misconceptions, actually, about the digital and the analog, as if they were at war with one another, as if they occupied opposite and inimical positions on some kind of technological spectrum. One thing we've learned from the history of books is that one medium does not displace another. Believe it or not, after the invention or reinvention of movable type by Gutenberg, manuscript publishing increased, and it continued to thrive for 3 centuries after Gutenberg. It was often cheaper to hire scribes to copy out a whole book for an edition of fewer than 100 copies. So people were publishing manuscript books well into the 18th century, some even in the 19th century. And I think today we all understand that the radio did not kill the newspaper, and TV didn't kill the radio, and the internet didn't kill TV. We live in, I think, an environment of media that gets richer and more complicated, but it's certainly not one of zero-sum games in which the printed book is gone. That does not mean, however, that all is well in the world of printed books. There are a lot of very unhappy publishers, authors, booksellers, and even the occasional librarian, I think, Jeff.
There is pressure all over the place, and that's really the subject of my talk. So I'd like to begin, if I may, by quoting Thomas Jefferson. The devil can quote Thomas Jefferson, but I like to do so anyhow, and I've done this in other settings because of his famous remark, in a letter he wrote in 1813, developing a metaphor about light. So you should think of the Enlightenment as light in the form of a candle, which he called the taper. I'll give you the full quote, and I hope we will all feel enlightened, and then I will try to take it from there. "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it. Its peculiar character, too, is that no one possesses the less, because every other possesses the whole of it. He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me." Now, you might think that the 18th-century ideal of spreading light, enlightenment, sounds archaic; in fact, it may sound suspiciously professorial. We professors like to invoke Thomas Jefferson, and I especially like to invoke people like Condorcet, who was all for spreading light and who was convinced that there would be indefinite progress thanks to the publication of books. That can sound naive, but the point, of course, is that what then was merely utopian is now possible thanks to modern technology, the internet. Still, having said that, and probably a lot of people would agree on the face of it, it can nonetheless sound utopian, so I would like to invoke another kind of American spirit, the pragmatic, no-nonsense, business-plan type of spirit, in order to argue my case.
The point is that you can invoke even economists to develop this sort of argument. After all, one of the most hard-boiled concepts of modern economics is that of a public good. Public goods such as clean air, efficient roads, hygienic sewage disposal, and adequate schooling benefit the entire citizenry, and one citizen's benefit does not diminish that of another. Public goods are not assets in a zero-sum game. But they do carry costs, usually upfront costs paid for by taxation, and these occur at the production end of the services and facilities that the public enjoys. So the Jeffersonian ideal of access to knowledge as a public good does not mean that knowledge is costless. We enjoy freedom of information, but of course information is not free. Someone had to pay for Jefferson's taper. Now, I would like to emphasize that point, because few people have any idea of what it actually costs to provide them with the information they consult every day on the internet. Instead, they complain about information overload. My doctor, for example, laments the fact that, as he puts it, "The amount of medical knowledge doubles every 2 years." And yet he knows nothing about another tendency that undercuts that doubling, namely commercialization. According to several reliable sources, the amount of research published in medical journals actually does almost double over 2-year periods. The US National Library of Medicine reports that the number of medical journals increased from 3,472 in the year 2000 to 4,866 in the year 2010. Excuse the statistics, but can you imagine having to read this many journals if you're a doctor and you've got a patient with some distressing symptoms that you can't quite figure out? Well, you'd go to the internet, of course, and you'd have to find the right article. Citations to the articles in these journals increased from 10.7 million in 2000 to 18.3 million in 2010.
How could anyone find all the pertinent information, even with a powerful search engine, in this ocean of publications? I don't know, but of course doctors keep trying. There was an average of 3.5 million searches a day in 2009 just in medical journals. What the doctors fail to understand is that their searches take place in fenced-off territory that belongs to the publishers of the medical journals. The publishers charge exorbitant prices for access to their terrain, and their enclosure movement increases as cyberspace expands. So, yes, more knowledge is constantly being produced, but an increasingly small percentage of it is accessible to the public. Now I'd like to discuss this tendency in relation to the cost of journals and books, and then to suggest how it could be reversed by treating knowledge as a public good provided through the internet. If I could come back to the example of my doctor, I should explain that he works in a teaching hospital attached to the Harvard Medical School, which means that the Harvard library gets to pay for all of these journals. Through his computer and his smartphone, he has access to all of the journals that the medical school buys for him. Almost all of that, 99.9 percent, is in the form of electronic journals, whose total cost to Harvard, just for medical journals, is 2.5 million dollars a year. The journals include the Journal of Comparative Neurology, which is priced at 29 thousand dollars for a year's subscription; Brain Research, 23 thousand dollars a year; Biochimica, 20 thousand dollars a year; and I could go on and on. The cost of academic journals in general has increased at 4 times the rate of inflation since 1980. Everything indicates that it will continue to increase along the same trajectory; maybe it'll level off a little bit, but not much. The increase in the price of scholarly journals in general is estimated to vary between 4 percent and 9 percent.
And this is after a catastrophic ratcheting up of the cost of journals. So health may be a public good, but information about health is monopolized by publishers who extract as much profit as the market will bear. Now, this is not news to librarians, and any librarian can tell you a lot more about it. They have had to make room in their budgets for the hyperinflation of journal prices year after year for at least 3 decades. But this news is not understood by many academics. They actually perpetrate a kind of irrationality at the heart of the system, because of course we academics do the research. We write the articles; we serve as referees for articles written by others. We also serve on the editorial boards of the journals, often as editors. And then we buy back the results of our labor, which is all done for free, at outrageous prices. But of course we don't pay for it; our library does. And very few academics understand how this can dent a library's budget. There used to be a rule of thumb that libraries would spend roughly 50 percent of their acquisitions budgets on periodicals and 50 percent on monographs. Well, those percentages have changed, and now many libraries spend 60 or 75 percent, some even 90 percent, of their acquisitions budgets just on serials. That means that they're not buying monographs anymore. And if they don't buy monographs, think of what effect that has on university presses in subjects like the social sciences and humanities. The presses have to cut back, because they depend to a considerable extent on sales to libraries. And if they cut back on the production of monographs, what's going to happen to the new PhD students who must publish or perish? There's a kind of vicious circle at work throughout this whole system, and the system, looked at in those terms, seems to me extremely irrational. Well, the publishers would have an answer to this.
They would say that, first of all, there's a kind of naive idealism behind the Jeffersonian principle. We live in a real world of, well, capitalism. And it's true that not only did Jefferson discount the cost of his taper, he did not have a very successful business plan when it came to running Monticello. You may know that he nearly went bankrupt and there had to be a collection to keep him from going broke. Of course, it can be expensive to publish a journal; I'm not denying that at all. And look at what a good journal does. There are referees to be organized, and that can be a big job. There's editing to be done; there are pages to be designed. The journal has to be marketed; the money has to be collected and redistributed. There is a lot of what publishers call "added value," and I'm not trying to minimize that in the slightest. So yes, journal publishers deserve a fair return on their investment, but what is fair? Last year, Elsevier's profit margin was 36 percent on an income of 2 billion pounds. Other publishers often report profits of 20 to 40 percent. In its analysis of their practices, Deutsche Bank concluded, "If the process really were as complex, costly, and value-added as the publishers protest that it is, 40 percent margins would not be available." Now, publishers could answer by invoking the famous marketplace of ideas. They could turn the Jeffersonian argument against itself by asserting that in a free market of ideas, the best will triumph. Whether embodied in articles or books or any other format, the best will sell, and sell at a fair price determined by demand. Unfortunately, however, demand is not flexible in the world of scholarly periodicals. Publishers create journals in certain highly specialized sectors where they can have the territory all to themselves.
Once they have staked out their turf, hired a prestigious board of editors (because prestige is crucial in this game), and begun to accumulate a following among readers, they can keep competitors out. In fact, competition rarely exists in the esoteric sectors of science. The big 3 publishers, Elsevier, Wiley-Blackwell, and Springer, published 42 percent of all journal articles. They group journals in bundles, selling the newer and more obscure publications along with the more famous ones, and if you, the librarian, want to unbundle the bundle, then somehow, mysteriously, as you weed out the journals you don't want so much, the price of the ones you do want increases, so that it's more expensive than it ever was in bundled form. They have a hundred tricks to keep up that 40 percent profit margin. Well, I won't go on and on, but I do think that we've got to do something about this. For one thing, we should be able to share information, but many of these contracts have nondisclosure clauses, so I can't know what Jeff pays for his bundle, except that it's probably too much. [laughter] Well, the market is being manipulated and monopolized, and I think that private gain has eclipsed the public good. Jefferson's taper has been reduced to an ashen glimmer. How long can the price gouging continue? Well, we may be nearing a breaking point, because some research libraries have simply found it impossible to pay for the continuous increase in journal prices. They refuse to renew subscriptions and ride out complaints from their faculty members, who demand, of course, an unlimited supply of knowledge. And sometimes they provide pay-per-view; that is to say, a faculty member or a student can pay just to read a particular article that may have been recommended. The cost of reading one article from Wiley-Blackwell is now 42 dollars, for one article.
Few libraries have summoned up the courage to walk away from the table in contract negotiations when faced with unbearably expensive terms. Well, you might think, just tell them, "I'm the customer; isn't the customer always right? I won't accept that increase of nine percent in this year of 2012." It doesn't work like that, because if I did that at Harvard, there would be a revolt on the part of the faculty, beginning perhaps in the medical school, where my doctor thinks he's just going to have an endless flow of access. So the alternative, I think, is not simply to negotiate harder but to develop another strategy that would reverse the economics of journal publishing. I think we should treat knowledge as a public good, in a manner analogous to the funding of public roads. It could be paid for at the production end and made available free to users. Although the US government already subsidizes a great deal of research, and also publishing, through the NIH and the NSF, it probably can't do much more. I don't think we can expect more money to be coming out of Congress for this sort of thing. But as you know, the NIH has a huge budget. And in, I'm forgetting the year, I think it was 2008, the NIH passed a requirement that any research based on NIH funds, that is, public funds paid for by the taxpayer, had to be made available to the public, to the taxpayers. That was a mandate, and it makes a certain amount of sense. Don't you think that publicly supported research ought to be available to the public? But a bill was introduced in the House of Representatives in December to withdraw this mandate, the so-called Research Works Act, which would simply wipe it out. And who is behind this bill? The lobbies. I mean, the lobbies are the ones that have been, I would say, manipulating copyright, among other things, for the advantage of private gain while neglecting the public good.
So we are in a very difficult situation, and I think that we have to begin to work out something that would work better, and for the public good. Now, things are changing fast; no need to tell you this. We're going through a fascinating transitional period, from a world that was entirely analog to a world that will someday be overwhelmingly digital. But now things are being mixed up together in fascinating ways that I find enriching in general but also very expensive. Clear distinctions no longer exist between text and data, articles and books, searching and researching, posting and publishing, authorship and readership, writing and mixing and mashing. The blurring of boundaries and the untethering of knowledge may make us feel uncomfortable, but they belong to a transformation of the landscape of information that I think will create new room for the public good. To illustrate this point, I would like to devote the rest of my talk to one of these possibilities: the attempt to build a digital library that will make the cultural heritage of the United States available to all Americans and, in fact, to everyone in the world. Now, although fantasies about a mega-, meta-, macro-library go back to the ancients, the possibility of actually constructing one is recent. It dates from the creation of the internet, 1974, and the web, 1991. Google demonstrated that the new technology could be harnessed to create a new kind of library, one that, at least in principle, could contain all of the information and all of the books in the world. But Google Book Search is the story of a good idea gone bad. As first conceived, it promised to do what Google did best; that is, it was going to be a search service, so that you could request information and Google would display on the screen of your computer the word you searched for, surrounded by snippets.
So you would get a few sentences that would tell you how this word figured in a book, and, even better, Google often provided information about the nearest library where you could get that book. I think that was terrific, but that's not what happened. Why? Well, because Google digitized books that were not only in the public domain; it crossed over the boundary that separated public-domain books from books covered by copyright. They came first to Harvard, where we told them, "Public-domain books, yes; copyrighted books, no." But they also went to Michigan and Stanford and the University of California, which did permit them to digitize copyrighted books. So instantly they were sued for infringement of copyright by the Authors Guild and the Association of American Publishers. And as soon as the suit was filed, secret negotiations began. These negotiations lasted almost 3 years, and at the end there was an announcement of something called "the Settlement." Now, the settlement was almost the opposite extreme from the original Google search service. It was the creation of a gigantic digital library in which the libraries that had provided the books for Google to digitize would be permitted to buy back digital copies of those very same books at a price to be determined by Google, without any public oversight or any limits. And when I myself read the settlement as it was being negotiated and finally announced, it seemed to me that the price of access to this library could get out of hand, just the way the price of periodicals had gone up. So I, for one, thought that this was not a good idea. But I wasn't the only one who thought this, because of course it had to be submitted to a court. And it was, as required, submitted to the federal court for the Southern District of New York. It was a very interesting moment. If I may open a parenthesis, the judge in this case is a man called Denny Chin.
And his story is a real American success story in many ways. He arrived at age 5 with parents from China. His father worked in a Chinese restaurant; his mother swabbed floors; they lived in the Hell's Kitchen district of New York City. He worked hard as a young boy, won a scholarship to Princeton, went to law school, practiced for a while, became a judge, and now he found himself the judge deciding a case that is, I think, of monumental importance for the whole future of books. And you could say that Sergey Brin and Larry Page of Google also represent a spectacular American success story, developing Google in a garage and all of that sort of thing. They, too, were scholarship students with bright ideas. So the two sides confronted each other in this fascinating court case, and the decision was finally announced last March. What Judge Chin said was that the Google Book settlement was a monopoly in violation of the Sherman Antitrust Act. He based his argument on memos that were furnished by the Department of Justice, very persuasive memos (I think I've read them all), and even memos furnished by the Federal Republic of Germany and the French Republic, not to mention more than 400 people who sent amicus briefs to the court saying, "This is a bad idea," because it really came down to dividing a pie. Google would get 37 percent of the profits and the litigants would get 63 percent. The public? The public had no place in it whatsoever. So Google Book Search was declared illegal. And furthermore, it was a class action suit, so Judge Chin had to certify that the Authors Guild really represented authors in general. And he said, "No! They don't, nor does the Association of American Publishers represent all publishers."
We could talk about class action suits if you like, but it's a fascinating example of trying to stretch one aspect of American law to cover something entirely new in this new digital world, and it didn't work. So my point is simply that Google chose the path of commercialization when confronted with this conflict about infringement of rights. Whatever the fate of Google Book Search might be, I think we must now take up where Google left off. And in fact, we were doing this long before Judge Chin made his decision. In October of 2010, I called together a group of leaders of foundations and libraries, computer scientists, and mathematicians for an informal conference about the possibility of creating an open-access digital library. I sent just a page and a half of general description. The group came together and almost immediately said, "We can do it. We can do it technologically, and we can do it financially." The heads of all the major foundations in this country have said, "We will support this idea." So the funding is there without going to Congress; the hopes of getting anything out of Congress now are not great. So we organized a steering committee and a secretariat, with a small grant from the Sloan Foundation to cover administrative and other small costs. Then we created 6 working groups, which took up aspects of a very complicated program to make this library actually happen. These working groups spread out throughout the country; lots of people were recruited from many different sectors, and they're hard at work. They are dealing with 5 basic problems: the scope and content of this DPLA, the Digital Public Library of America; its possible costs; the legal problems it will face; its technical architecture; and its governance. I don't have too much time left, but I'd like to discuss each of these 5 and then open the floor for questions. First, scope and content. Unlike Google, the DPLA will not draw on one gigantic database.
It will be a distributed system, which will aggregate collections from many research libraries, museums, and other institutions. It will provide one-click access to documents in many formats, including images, recordings, and videos. At first, though, it will consist primarily of books, books in the public domain. Google digitized about 2 million books in the public domain, and copies of its digital files have been deposited in the great repository known as the HathiTrust, which some of you might know about. The Internet Archive, a not-for-profit, open-access digitizing operation founded by Brewster Kahle, has also accumulated well over a million digitized copies of public-domain books. So this exists already, and what we want to do is to bring it all together and make it available to everyone. This material is largely already accessible online, so you might say, "Well, okay, sounds great, but what's so wonderful about making it accessible all over again?" And the answer is that this is just the beginning. This will be the preliminary version of things, and it will include lots of material that is really undreamt of by Google. By that, I mean special collections. Every great research library, such as yours here at Dartmouth, has fabulous special collections, and you've often digitized quite a bit of them. At Harvard, we have something called the Open Collections Program, in which we have digitized 2.3 million pages of documents related to certain specific themes, such as "Women at Work," "Immigration," and "Voyages of Scientific Discovery." They're available in a repository we created, free of charge, to everyone in the world. The People's Republic of China came to us and asked to digitize 51,500 of our rare Chinese books because they are not available in China, so we've worked out an agreement, and they will be made available in the Digital Public Library. Another example, and there are many, many examples, concerns newspapers.
Every state has digitized all of the newspapers in its collections. They've been aggregated at the state level, and these 50 aggregated collections are in turn being aggregated by the Library of Congress, which is going to deposit all of them in the Digital Public Library of America. So already, for starters, I think we will offer a fabulous treasure trove of information to the American public. Unfortunately, copyright laws prevent the public domain from extending beyond 1923. That means that most 20th-century literature exists in what librarians call a black hole: it's covered by copyright and cannot be digitized and made available without infringement of copyright. So what will our scope be, and where will we draw the line? Assuming we could get around the copyright laws, which I'll discuss in a minute, some of us argue that the DPLA should contain everything right up to the present. My own argument, and it's just mine, is that no, it should stay out of the current marketplace for books. We should have what you could call a moving wall, so that anything published during the last 5 years, or maybe the last 10 years, would not be available, and we would therefore not threaten the interests of publishers and authors, who are understandably trying to make money from the publication of books. How long is the shelf life of a book? I don't have an answer to that question, but, first of all, most books never make it onto the shelves of bookstores, and bookstores are going out of business to a considerable extent. But if a book did make it onto the shelves of a bookstore, and the bookstore still existed, how long would it be there? A few days? A few weeks? Then books disappear, remaindered or sent back as returns; it's the plague of publishing. So I actually think it would be in the interest of many authors, once the economic demand for their books has disappeared, to make those books available for a small fee or, indeed, free of charge.
Authors want readers, and I'm sure many of you here are authors. Okay, academics don't hit the jackpot very often, but you might make enough in royalties to take your husband or wife out for supper once a year [laughter]. Anyhow, that's my case in general. So I think that we could, if possible, have a fabulous library that would include virtually everything but not invade the commercial market. The second point concerns costs. As I said, the DPLA will almost certainly be a distributed system which will aggregate collections that already exist in dozens of research libraries. When it opens, it will probably contain only the basic stock which I've just described, but from that point onward, it will grow as fast as its budget permits. So what should its budget be? Well, of course a lot of money will go into the technological infrastructure and then the administration, although we hope it will not be heavily administered; we don't want a lot of management in it. But we can take the example of Europeana. I don't know how many of you know Europeana. It's an aggregator of collections in Europe. It's located in the Netherlands, it aggregates already-aggregated collections in 27 European countries, and it has not yet gone online. It tried once a few years ago and crashed because there was so much demand. But it will be going online again soon, and we are coordinating the design of the DPLA so that it will be interoperable with that of Europeana. In other words, we're working towards a worldwide system of distribution. Europeana's budget is only 5 million euros a year, a very modest budget. But of course, it doesn't do digitizing itself, it doesn't undertake preservation, and it doesn't do a lot of things that we want to do at the DPLA. What would it cost if the DPLA led a major effort to digitize books that are covered by copyright but are out of print, or "commercially unavailable," as Google calls it?
Well, Brewster Kahle, who has digitized more than a million books for his Internet Archive, says, "I can digitize a book for 10 cents a page," and if you take a book of about 300 pages, that comes to 30 dollars, not really very expensive. Others think that's not really realistic, although Brewster has a lot of experience of digitizing. They say, "Well, a dollar a page is more like it." There's a big debate as to what the costs are, but they're going down all the time thanks to technological improvements. And we must not only digitize; we have other functions to fulfill, such as perfecting metadata, that is, the descriptions that enable you to locate a book. We must do something about preservation. It's fine to digitize, but you have a responsibility to preserve the book, and we estimate that preservation will cost something like 20 percent of the digitizing costs. And there are other possible services, such as curation and the development of apps of all sorts. In fact, we will have a pilot project that we call a "Scanabago," something like a Winnebago, that will go out to small towns in Massachusetts and offer to scan tiny little special collections in public libraries and then help each library develop its own collections too. So we see quite an important grassroots element in all of this. So, by combining ballpark or, if you like, back-of-the-envelope estimates, I would think that we could digitize a million books a year on an annual budget of 75 to 100 million dollars. The budget of the Library of Congress, by the way, came to 684 million dollars in 2010. So, if a grand coalition of foundations contributed, say, 100 million a year, a great library would exist within a decade. Double that rate and the library would soon be the greatest that ever existed. But we don't need to rush; we must do the job right, and unfortunately, in much of its digitizing, Google didn't do the job right.
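The arithmetic behind these estimates can be sketched in a few lines of Python. This is a rough model only, assuming every book has about 300 pages and treating preservation as a flat 20 percent surcharge on digitizing; the function names are illustrative, and metadata, curation, and administration costs are left out.

```python
# Rough digitizing-cost model built from the figures quoted in the talk.
# Assumptions (not from any DPLA document): every book has ~300 pages and
# preservation is a flat 20% surcharge on the digitizing cost.

PAGES_PER_BOOK = 300        # "a book of about 300 pages"
PRESERVATION_PERCENT = 20   # "something like 20 percent" of digitizing costs


def cost_per_book_cents(cents_per_page: int, pages: int = PAGES_PER_BOOK) -> int:
    """Digitizing plus preservation for one book, in whole cents."""
    digitizing = cents_per_page * pages
    return digitizing * (100 + PRESERVATION_PERCENT) // 100


def annual_cost_dollars(books_per_year: int, cents_per_page: int) -> int:
    """Yearly cost in dollars for a given number of books."""
    return books_per_year * cost_per_book_cents(cents_per_page) // 100


if __name__ == "__main__":
    estimates = {"Kahle, 10 cents a page": 10, "skeptics, a dollar a page": 100}
    for label, cents in estimates.items():
        per_book = cost_per_book_cents(cents) / 100
        per_year = annual_cost_dollars(1_000_000, cents)
        print(f"{label}: ${per_book:,.2f} per book, "
              f"${per_year:,} for a million books a year")
```

Under these assumptions, a million books a year would cost roughly 36 million dollars at Kahle's rate and 360 million at a dollar a page; the 75 to 100 million dollar estimate in the talk falls between the two. Costs are kept in whole cents so the arithmetic stays exact.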
You've probably seen books in which a hand appears covering up the page because the person scanning forgot to remove his or her hand. And then there's Google's metadata, which is famous because, you know, they don't talk about books, they just talk about information or data points. And so they catalogued Walt Whitman's Leaves of Grass under "Gardening." We can do better than that, and we are trying to design a library that will last for centuries. But it could grow gradually on a budget of, let's say, only 10 million dollars a year. The third point has to do with legal issues. Dan, am I going over time? I can hurry up-- >> No, actually it's fine. [Simultaneous Talking] >> Okay, so I don't want to keep you too long, and you may have questions; I'm almost finished. But the legal issues, I really see this as the most important problem of all. Of course, the DPLA must and will respect copyright. How far can it go in making accessible books that are out of print but covered by copyright? Well, that depends on the possibility of modifying the copyright laws by legislation, or perhaps on other strategies. Now, the history of copyright in the United States goes back to Article 1, Section 8, Clause 8 of the Constitution, which sets 2 objectives, I quote, "To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries." The first copyright law, passed in 1790, struck a balance, I think, between those 2 objectives. How? By giving authors the exclusive right to the income from their books for 14 years, renewable once. And that provision of 1790 actually took up the model provided by Britain.
In the first copyright act in existence, the Statute of Anne of 1710, exactly the same objectives are announced, and a balance was struck between the welfare of the public on the one hand and that of the booksellers and authors on the other, and the deal was the same: 14 years, renewable once. But the Company of Stationers, the booksellers and publishers, protested, and there was a series of trials that went right through the 18th century. They're really quite fascinating, involving people like Alexander Pope, you know, great figures in English literature, and they were finally decided by the House of Lords in the famous case of Donaldson versus Becket in 1774, which upheld 14 years, renewable once. So that's where we got our model, and, you know, it wasn't so bad. The basic point was: no perpetual copyright, even though the best lawyers in England had argued for it. Copyright should not be perpetual. Now, in the debate in the American Congress over the so-called Copyright Term Extension Act of 1998, the key actor was Jack Valenti, basically the lobbyist for Hollywood. And Valenti was asked, "Mr. Valenti, do you believe in perpetual copyright?" And he said, "No. Certainly not. I think copyright should be forever minus one day." [Laughter] So that's what we're up against, and you could say that Jefferson's taper has almost died out. The current term of copyright is the life of the author plus 70 years, or 95 years in the case of corporate creations like Mickey Mouse, which is why the 1998 act is often called the Mickey Mouse Protection Act. This is in practice more than a century for every book. And so we're keeping the vast bulk of our literature out of the public domain, where I think it belongs. So what can we do about this? Well, it's a long and complicated story. You could say that further legislation would solve the problem.
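To see why the current term amounts to more than a century, it helps to compare it with the 1710/1790 model. The Python sketch below is a simplification, assuming a hypothetical author who publishes at 35 and dies at 80; it ignores the rule that terms run to the end of the calendar year and the different rules that govern older works.

```python
# Years a book stays out of the public domain under the terms mentioned in the talk.
# Simplified: ignores calendar-year rounding and the special rules for older works.

EARLY_TERM = 14        # Statute of Anne (1710) and US Copyright Act (1790)
POST_MORTEM_TERM = 70  # life of the author plus 70 years
CORPORATE_TERM = 95    # corporate works, counted from publication


def early_term(renewed: bool = True) -> int:
    """Maximum protection under '14 years, renewable once'."""
    return 2 * EARLY_TERM if renewed else EARLY_TERM


def life_plus_70(age_at_publication: int, age_at_death: int) -> int:
    """Years from publication until the work enters the public domain."""
    if age_at_death < age_at_publication:
        raise ValueError("age_at_death must not precede age_at_publication")
    return (age_at_death - age_at_publication) + POST_MORTEM_TERM


if __name__ == "__main__":
    print("14 years, renewed once:", early_term(), "years")
    # Hypothetical author who publishes at 35 and dies at 80.
    print("Life plus 70:", life_plus_70(35, 80), "years")
    print("Corporate work:", CORPORATE_TERM, "years")
```

For this hypothetical author the work stays protected for 115 years, roughly four times the 28-year maximum under the 1790 act.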
However, lobbyists have had such a heavy hand in attempts to pass legislation, especially about orphan books, books whose copyright owners can't be identified, that it's a rather discouraging story. There were attempts in 2006 and 2008 to pass orphan-book legislation, and people I know who followed this closely said the lobbyists massacred the bills, especially the 2008 bill, which was never passed, so badly that it would have been worse than having no bill at all. So it's difficult to summon up much confidence about help from Congress, about a lot of things, not just copyright. [Laughter] What about fair use? Well, in the Copyright Act of 1976, there are Sections 107 and 108, which have been gone over endlessly by lawyers and others, because that's where the provision is made for fair use. And you use that today, of course, in your library when you allow copyrighted articles to be made available in classes, for example. Can we expand fair use in such a way that it would hold up in court for a public, not-for-profit institution devoted to the public good? I think it would be wonderful if we could do it, but my lawyer friends say it's very dicey. And furthermore, once we get the DPLA up and running, would we want to take the risk of so many suits, especially when damages begin at 100,000 dollars? So I think we probably won't follow that path. What else could we do? Well, there are other things, and I won't go into this in too much detail because I'm taking too long, but there is a fascinating provision that is working very nicely in Scandinavia called Extended Collective Licensing Agreements. If you want, we could talk a little bit more about that. But let me come to my last 2 points. First, the technical architecture. I was delighted to meet some of your young computer scientists here, a very impressive group. And we are working very closely with computer scientists on the technological infrastructure of the DPLA.
In fact, in June, we announced what we called a "Beta Sprint" and invited computer scientists, or anyone, anywhere, to submit suggestions: maybe an overall blueprint for the technological design of the library, or particular apps or aspects of it. 60 people or groups responded instantly, and finally 40 competed. There was a lot of enthusiasm in the world of computer science for this kind of project, and in this so-called "Beta Sprint" they were given 3 months to come up with a finished suggestion. A blue-ribbon jury passed judgment on which ones were the best. And last October, we held a large meeting in Washington hosted by the Library of Congress, the Smithsonian Institution, and the NAH, and we announced the winners. There were actually 6 winners, and we are incorporating their ideas in the first prototype, which we will have developed in 2 months. Then it will be submitted for further critiques, and finally it will be ready when the DPLA gets up and running in April 2013. April 2013, that's tomorrow. The race to this deadline may seem breathtaking, but it's fueled by enthusiasm and energy. Leading figures in computer science, information technology, and library science have assured us that the task is doable, and we will get it done. The last point concerns governance. Here I shall be brief, not just because I'm running out of time, but because we haven't made major decisions. For example, where should the DPLA be located when it has offices? Who should lead it? To whom should it be responsible? How will it formulate policy and administer its services? The present secretariat is doing a good job, but it won't continue after April 2013 because it's a Harvard operation, and people love to, I don't know if you suffer this much at Dartmouth, but they love to point the finger at us and say, "Elitism," the number one cuss word when it comes to throwing around epithets.
And it's not going to be a library for the elite. I mean, I think that advanced researchers will benefit enormously from it, but we're aiming this library at ordinary people. Think of community colleges, a community college in North Dakota or Alabama which doesn't really have a library. We can make available to them a library that will be as great as or greater than the Library of Congress, free of charge. K through 12 schools, retirement homes, individuals who are just curious about things and would like to find out more, scattered all around the country and all over the world: this is the sort of public we're aiming for, the public that goes to public libraries. And leaders of public libraries are part of our steering committee and are helping us design this. But we haven't reached final decisions about, if you like, governance. We just know that we are aiming at a very broad constituency. We might create an independent new organization by taking advantage of Section 501(c)(3) of the Internal Revenue Code and setting up a tax-exempt corporation. At present, most people involved in this effort agree that it should not be part of the Federal Government; it should be free of political pressures of any kind. It might resemble the National Academy of Sciences or perhaps the BBC. In fact, however, it won't resemble anything, because nothing like it has ever existed: a library without walls that will extend everywhere and contain nearly everything available in the walled-in repositories of human culture. E Pluribus Unum. Jefferson would have loved it. Thank you. [ Applause ] >> We do have time for a few questions, [background noise] I'll let you orchestrate how [inaudible] do this. >> Okay, yes ma'am? >> In the publishing world, we know that it's very hard for many writers and editors to earn a living [inaudible]. So what-- for example, in Scandinavia when someone takes a book out of the library, they often get some royalty [inaudible].
So what royalty [inaudible] understand or help developers feel settled on for this process. >> Right, can you hear me okay? >> Yup. >> Well, the Authors Guild is adamant about continuing the Google suit. I could go on and on about the Google suit, but you see, after Judge Chin rejected the settlement, it reverted to the original copyright suit, and I think the publishers are going to make a separate deal, but the Authors Guild is still pushing this to this day. It's very militant about trying to protect the royalties of authors, and that's understandable. Authors deserve royalties. So what will we do about it? That's the question. Well, in the case of Norway, every Norwegian has the right to read every book in Norwegian in Norway, and the owner of the rights is paid a certain sum of money per page read. That money comes from a kind of escrow fund that is collected by the Norwegian government, and you might say to me, "They have oil nearby." [Laughter] And furthermore, there's something about life in Scandinavia, because there are similar outfits in Sweden and in Denmark and in Finland. There's something about the sense of the public good in these countries that I think is much stronger than what we have here. Still, it seems to me that we can create an escrow fund, but we must have the agreement of a representative group of authors and publishers to do this, and I'm not sure how we can get their agreement. So we need to woo them. But actually, I was a trustee of the Oxford University Press for 15 years. Okay, it's a university press, but it's a huge press that sells a lot of books. Some would say it's a "trade press"; it's both a trade and a university press. And it's really devoted to the spread of knowledge. And I think a lot of publishers care about literature; that's why they went into this not very lucrative trade in the first place.
So if we don't invade the current commercial market and undercut them in that way, it seems to me we ought to be able to win their support. And we can do so by offering a reasonable royalty for the consultation of these books. That's my hope. >> Yes? >> If I understood you correctly, the academic publishing, the journals and magazines, talked a little bit out of the site [inaudible]. >> Right. >> And I was on Laudenbach in Germany years ago and we had the same problem there discussing this, particularly with digital publishing. And the idea that came up again and again there was, if everything is already in place by academics, doing peer review of publications and supporting them, it's basically the academics who do the work. And the infrastructure's also the only other hand in terms of digital publishing. Why can't we do it on the [inaudible] and just bypass the entire tremendous and scandalous cost of publishing? Wouldn't that also be an aspect we integrated into this monogamous system of security? >> I couldn't agree more. Now, the attempt to reverse the economics of journal publishing, which I mentioned just briefly in passing, doesn't involve the DPLA so much, although it might someday. Instead, it really involves processing fees. The idea is for universities to pay what are sometimes called "author fees" to subsidize articles that will go into open-access journals, and often grants to scientists include a certain amount for publication as well. So there's hope for this. At Harvard, we have a program, and we subsidize up to a thousand dollars per professor. This is beginning to spread, and I'm happy to say that Dartmouth is part of this attempt to reverse the economics of journal publishing. And you're right, you know, it's doable, but you probably know the story of the Max Planck Institute in Germany. They tried to do it, they held out for, I think it was, about 3 months, and then they collapsed in the face of Springer.
So it's not easy, and I think it's going to take time, but it's got to work because it's so rational compared with what we have now. And so once we tip the balance in favor of open access, I think that this will work, and there will still be closed-access journals. You know, Cell and Nature are not about to disappear, but they don't represent the bulk of things. So I'm hopeful in that respect. >> There's a gentleman back there. >> In New England, in many places, there is a long-standing tradition of municipal libraries, the town library. What might be the effect of a universal digital library on the town library? >> Yeah. Well, that's a very good question, and it's one that we care about passionately. We want to support and reinforce town public libraries. We had a debate about even using the word "public." You know, you could call it the Digital Library of America, and frankly, I prefer that as a term because, you know, there's a danger of being misunderstood. Some people might feel that if we provide all of this material free of charge, municipalities can reduce their budgets for public libraries. That's not the case. I think what will happen, especially if we have a moving wall such as the one I described, is that public libraries will continue to do what they do so well, satisfying the demands of their users by making available current best-sellers, current books of all sorts, DVDs, videos, and magazines, and that the Digital Public Library will provide them with a vast corpus of works that were published 10 years ago and beyond. So I think it will enrich public libraries enormously. In fact, we have several public librarians on our steering committee, and they agree with this. So we are deeply committed to helping public libraries. >> You talked briefly about quality issues with services like Google Books in terms of reproduction quality and quality control. But you also talked about reducing the cost per page of scanning.
How do you see the DPLA interfacing with special collections preservation efforts, and what will determine your standards for resolution and quality, given the cost reductions you're introducing? >> Yeah. It's an excellent question. I may not have an adequate answer to it, because, first of all, a lot of the digitizing of special collections has been done on the spot, and so the quality is assured by people like your librarians in your rare book collections, or wherever these works may be located. I think it's fair to say that in general, when it comes to digitizing special collections, libraries take great care with them. Certainly at Harvard we do. We have a huge digitizing operation on the D-floor of Widener; it's expensive. The quality is terrific, but actually it's so good that I think we could do with much lower quality to get the great bulk of books out there, while, you know, the medieval manuscripts would be digitized correctly at a high price. So I think for the special collections that are being digitized by libraries, quality is not so much a problem. The DPLA might set up quality standards, but it will probably have no power to determine how the digitizing is done, provided it meets certain standards. So I don't think that's a real problem, but it's a good point, because we want to lower the costs of digitizing. Now, maybe some computer scientist can correct me, but the information I have from the large IT section of the Harvard library is that the costs of preservation, for example, are going down tremendously year after year. Some say by 50 percent each year. And as for the costs of scanning, scanners are now quite inexpensive. So I think the technology is working in our favor and that this is not going to be a major problem. >> Maybe just one more question. >> Mr. Darnton-- [ Inaudible Remark ] >> Yes. >> Can you tell us more about it? >> Yes, that's called an Espresso Book Machine.
[Laughter] And the idea is that you print a book in about the time it takes to get an espresso coffee. Now, we actually have one in the Harvard Book Store across the street from Widener Library. You, the user, go into the bookshop, there's a computer there, and you order a title. The order goes to a digital database. The text is returned and downloaded onto a machine in a matter of seconds. The machine is a wonderful glass-enclosed printing machine. >> I saw that. >> And you saw that it worked. It can print the text, trim the pages, and attach a paperback cover in less than 4 minutes. The prices vary because the publishers set the prices, but they are often 8 dollars for a paperback. That means that you can, through print on demand, have access to a whole world of literature if you happen to like to read printed books instead of reading on reading devices. And that's an example of what I meant when I said I don't think the analogue and the digital are at war with one another, because here we're using great digital electronic technology to reinforce the printed book. They've printed several of my own books, and I think that the print-on-demand copy is every bit as good as the original. So we're doing lots of wonderful things right now, and I-- >> [Inaudible] through Oxford be available to that one? >> That depends on how [inaudible]-- >> At Cambridge. >> And Cambridge, sure-- [Inaudible Remark] Yeah, especially at Cambridge. >> And-- [Laughter] >> I was a student at Oxford, but never mind. >> Thank you very much. >> All right. >> No, but there is something called "Oxford Scholarship Online," which is an attempt to bring the backlist of Oxford within the paying power of a large public. >> All right, Bob, thanks so much. [ Applause ]
The Digital Public Library of America and the Digital Future. Hhart Budha posted on 2014/06/13.