Rare Book Monthly

Articles - July - 2025 Issue

Vast Amounts of New Data from Books Being Made Available to AI Chatbot Programs like ChatGPT

A large source of additional information for AI (artificial intelligence) chatbot programs, like ChatGPT or Meta's Llama, has been opened. Those are the online programs that answer just about every question you ask in seconds. A type of software known as a “Large Language Model” is able to take vast amounts of data, use it to familiarize itself with manners of speech so as to understand this vast database of information, and then pull out what it needs to answer your question. It is utterly amazing what these programs do, but they can't do it all by themselves. They know nothing but what they are fed, and if they are to draw on vast amounts of information, that information must come from somewhere.

 

Much of it comes from the internet, which means they must be smart enough to separate the wheat from the chaff, and “chaff” is an overly polite word for a lot of what is out there. In other words, they also need some more reputable sources of information, and books and other publications are an important source for that. However, many (but not all) authors and publishers are not pleased with their work being used without payment. Authors, deservedly, get royalties for their work in books, but not for their work when it is copied and used by AI. They have sued to stop this practice, citing copyright law, as these works are copyrighted.

 

All of this is in the courts, and how it will be resolved is as yet unknown. However, a new source has recently emerged: books in libraries. Harvard University announced that it is making its vast dataset of books from its library available to AI models at no cost. Most of this was created almost two decades ago as part of the Google Books project, in which Google scanned and digitized millions of books at various libraries. Harvard compiled this and more as part of its Institutional Data Initiative at the Harvard Law School Library. Harvard has files for 386 million pages from almost one million books. It is now making them available for services like ChatGPT to learn from and use to answer your questions.

 

This will be helpful, particularly for understanding historic material, but there is one very major drawback. It is safe to use these books without risk of being sued because they are out of copyright. In the United States, copyright on older published works runs for 95 years. Therefore, none of these books is less than 95 years old. This will not be much good for providing medical advice, even if it sometimes feels like this must be where RFK Jr. gets his medical recommendations. You want the latest opinions for medical diagnoses, and the same goes for other scientific knowledge. Good luck fixing your computer or car with advice that predates 1930, unless you have a Model T. Of course, these programs already have a lot of later information in place (some of which they are being sued to remove). It just means that these 386 million new pages won't add much to the answers you seek for these sorts of questions.

 

It should be noted that some of the information Harvard is providing is more recent, since it is not subject to copyright. One example is legal case law. These court opinions are available for anyone to read, as they must be for legal experts to understand the law. This recent case law is being provided to the AI models that want to add it.

 

 

Update: A few days ago, the first court decision came down in a case of authors suing a chatbot company for copyright violation. The authors lost.


Posted On: 2025-07-09 14:41
User Name: hjrobin

No links in this discussion to the actual data. How un-bibliographic!

