Vast Amounts of New Data from Books Being Made Available to AI Chatbot Programs like ChatGPT
- by Michael Stillman
A large new source of information has been opened to AI (artificial intelligence) chatbot programs such as ChatGPT or Meta's Llama. These are the online programs that answer just about any question you ask in seconds. A type of software known as a “Large Language Model” takes in vast amounts of data, uses it to familiarize itself with manners of speech so that it can understand this vast database of information, and then pulls out what it needs to answer your question. It is utterly amazing what they do, but they can't do it all by themselves. They know nothing but what they are fed, and if they are to draw on vast amounts of information, that information must come from somewhere.
Much of it comes from the internet, which means they must be smart enough to separate the wheat from the chaff, and “chaff” is an overly polite word for a lot of what is out there. In other words, they also need more reputable sources of information, and books and other publications are an important one. However, many (but not all) authors and publishers are not pleased with their work being used without payment. Authors, deservedly, get royalties on their books, but not when that work is copied and used by AI. They have sued to stop this practice, citing copyright law, as these works are copyrighted.
All of this is in the courts, and how it will be resolved is as yet unknown. However, a new source has recently emerged: books in libraries. Harvard University announced that it is making the vast dataset of books from its library available to AI models at no cost. Most of it was created almost two decades ago as part of the Google Books project, in which Google scanned and digitized millions of books at various libraries. Harvard compiled this and more as part of its Institutional Data Initiative at the Harvard Law School Library. Harvard has files for 386 million pages from almost one million books. It is now making them available for services like ChatGPT to learn from and find answers to your questions.
This will be helpful, particularly for understanding historic material, but there is one very major drawback. These books are safe to use without risk of being sued because they are out of copyright. In the U.S., copyright on older published works runs 95 years. Therefore, none of these books is less than 95 years old. This will not be much good for providing medical advice, even if it sometimes feels like this must be where RFK Jr. gets his medical recommendations. You want the latest opinions for medical diagnoses and the same for other scientific knowledge. Good luck fixing your computer or car with advice that predates 1930, unless you have a Model T. Of course, these programs already have a lot of later information in place (some of which they are being sued to remove). It just means that these 386 million new pages won't add much to the answers you seek to these sorts of questions.
It should be noted that some information Harvard is providing is more recent since it is not subject to copyright. One example is legal case law. These court opinions are available to anyone to read – they need to be for legal experts to understand the law. This recent case law is being provided to the AI models that want to add it.
Update: A few days ago, the first court decision came down in a case of authors suing a chatbot developer for copyright violation. The authors lost.