Rare Book Monthly

Articles - February - 2025 Issue

Your Chatbot May Be Using Illegally Pirated Books to Answer Your Questions

A battle is brewing between an ancient source of information, the book and its authors, and a new invention, the chatbot and its developers. A chatbot is a program that can answer whatever questions you throw at it. The granddaddy (all of two years old) and most famous chatbot is ChatGPT. It uses artificial intelligence (AI) to quickly sort through reams of information to answer your every question. But where does it get that information? One of the major sources is books, copyrighted books. When the chatbot uses that information to answer your questions, the authors and publishers of those books get nothing. That makes them sad (perhaps a better word is “angry” or “P.O.'d”).

Some authors are angry enough to go to court. There are various cases floating around out there, but a notable one pits comedian and writer Sarah Silverman against Meta, the operator of Facebook, headed by Mark Zuckerberg. Meta's chatbot, Llama, is the culprit here.

It is alleged that Meta used the LibGen (Library Genesis) dataset to train its Llama chatbot. LibGen is a notorious, shadowy entity, possibly operating out of Russia. Its dataset contains over 196,000 pirated books. LibGen has been in the news before for “lending” its pirated books free of charge without compensating the authors. LibGen infringes on authors' copyrights and operates illegally, but that makes little difference: it can't be shut down or forced to pay because its operators can't be found. It regularly changes its URLs to avoid being shut down. LibGen is no small operation, receiving an estimated 9 million visits per month from the U.S. to “borrow” books. It is supported by donations (accepted in untraceable bitcoin only).

What Meta has been accused of doing is using this large pirated database of books to supply Llama with much of the information it needs to answer users' questions. The plaintiffs have alleged that approval to do so came from the top, from Mr. Zuckerberg himself. The claim has focused on the use of pirated (illegally obtained) books, but that is perhaps not the biggest issue here. What if the books had been legally obtained: purchased, borrowed from a physical library, or received as gifts? Would that be any better from a copyright standpoint? Probably not.

In Meta's opinion, this use of the authors' work falls under the “Fair Use” exception to copyright. “Fair Use” is what lets you quote from a book, write a review or book report, or use information you found in it to write something of your own, without violating its copyright. Generally speaking, if you transform what you read, add your own twist, copy only a small portion, and so on, you are not guilty of copyright infringement. What Meta is doing, leaving aside the issue of using LibGen's pirated texts, is copying each entire book but then sharing only small, rewritten portions, such as might be expected to pass the Fair Use test.

This will have to play out in court, but the judge seems less than impressed with the arguments made by the authors. The reality is that chatbots provide very useful information. You probably use one to answer your questions. It's sort of like speaking to a very learned individual. Practically speaking, paying 196,000 authors some small pittance each would be an absolute nightmare, and they might not agree to such an arrangement anyway. It's not that they don't deserve anything, but what they deserve probably isn't much, and making such demands might force this very new and useful technology to shut down altogether. Progress is hard to stop, even if some people feel hurt by it, and my guess is that the courts will not stop it here.


Posted On: 2025-02-04 01:22
User Name: keeline

I don't know that an AI LLM (Large Language Model) like ChatGPT and others fits the usual definition of a "chatbot." That word is usually used for software that is in a chat session and may represent itself as a human. Such programs have been around for a long time.

There is an allegation that the LLMs have been trained on collections of books, both copyrighted and public domain, along with crawled websites, posts in groups, and more. But a court case will have to determine what was really used. We may only know if it is decided IN COURT. Often, an out-of-court settlement will leave many important details confidential.

Although LLMs create an illusion of intelligence, they are not intelligent. The best you can say is that they can write a word salad that is passable. Sometimes even this takes several tries and refinements of the prompt it responds to.

Attempts to get ChatGPT to write fiction in a certain style usually fail. At the very least there are major plot holes that a human editor or reader would spot. I've seen several examples of this.

LLMs are better with words than they are with basic arithmetic. This is part of the reason why AI-generated art often has logical errors, such as too many fingers or not enough keys on a typewriter. It's a bit like looking at a toy train where the designer has a vague notion of a train but no understanding of the real parts of a steam locomotive.

When people ask ChatGPT to identify books, I have seen it make up books that do not and never did exist. One example invented a volume in the Judy Bolton series which never existed and was not even a proposed story. This leads people down false trails, spending energy looking for books that do not exist. Then there is the energy required to complete the LLM query in the first place.


