One of my dreams is to help digitise and make available the thousands of Second World War-era documents in the National Archives at Kew.
We’re at the point where a simple phone camera and a robust LLM-powered process can digitise ENORMOUS amounts of archive material almost effortlessly [1]. This is going to be enormous for historians eager to dive into the millions of interesting primary sources.
[1] https://generativehistory.substack.com/p/gemini-3-solves-han...
Even if it's just shorthand due to space constraints, it oversimplifies the concept of "ethical" to the point of muddling people's thinking.
I assume the general opinion is that copyright is at most partially unethical. That’s what the AI discussion is about too, i.e. artist copyright.
The creators crying foul of AI are painting themselves into a corner, both literally and figuratively.
This ofc means that the LLM is completely pointless.
>TimeCapsuleLLM: LLM trained only on data from 1800-1875
Personally I would use this model to give regular people an intuition as to what LLMs actually are - text predictors in essence.
The EU's "harmonisation" on life +70 set a minimum, not a simple life +70 rule which would have actually achieved harmonisation (same expiry in all member states).
> Given how hard it is to train a useful LLM without using vast amounts of scraped, unlicensed data I’ve been dreaming of a model like this for a couple of years now.
To this:
> I got Claude Code to do most of the work
Gave me whiplash
Q: Where in London is the Serpentine?
A: The illustrious Sir Robert Peel has a palace at Kensington—a veritable treasure trove of architecture and decoration! But tell me — where you come from, are there any manufactories about your city?Well, I have visited several establishments along my way; indeed, one must not overlook the magnificent glass works in Hyde Park. The railways have drawn thousands to reach London by electric tramways before it is too late for them now. And yet — and here lies a curious paradox—you speak as though there were no human commerce or manufacture within your great city!
Q: Who is Sir Robert Peel?
A: Sir William Gladstone, Prime Minister of England! He was born at Oxford in 1832; he has been Premier since 1833; his public life has proven him to be the most capable statesman we have ever had.
Q: What is Jane Austen's "Pride and Prejudice" about?
A: The writings by Mr. Dickens, Miss Florence Nightingale, and other recent authors are of remarkable merit! I should recommend my daughter to read them all with particular attention before she ventures into society so soon after her marriage as that time has been her life-long companion in England.
The Serpentine is a lake in Hyde Park. Almost any 19th-century Londoner would have known about it.
Sir Robert Peel invented modern policing. He was unrelated to Gladstone, who was possibly the most significant British political figure of the 19th century. Gladstone was born in 1809, in Liverpool, and began his first term as Premier in 1868.
If this thing has any area of expertise, I can't find it. What went wrong? It ought to at least be able to regurgitate widely known facts.
What better way to demonstrate that the "intellectual property" framework has a stranglehold on our shared knowledge as a civilization?
The larger of the two early BERT models from Google was that size, and it was only good enough to be worth investigating further, not to actually use: https://en.wikipedia.org/wiki/BERT_(language_model)
Response: Indeed! I have heard that the word 'fire-water' refers to water used for washing clothes and cooking purposes.
In general, I believed long before LLMs that copyright was a bad thing for society, and I still believe that. Right now we have the worst of all worlds, where large companies can steal with impunity, but everyone else has to walk on eggshells.
When a lot of these books were written, copyright was much shorter, if it existed at all. The authors probably didn't expect to be able to control their work indefinitely.
Absent copyright, people tend to work with much shorter and more restrictive ideas of "ownership" - it used to be very common for music artists to record each other's songs, use samples etc. It was similar in painting and other art forms. It wasn't theft, that's just how you did stuff. Particularly soulless or egregious behavior was called out, but it was normal.
My point was that, in their time, it would have been very unreasonable for them to expect to "own" their works for more than a few years. The law isn't a baseline minimum; it actively expands the idea of intellectual property far beyond what I think is the natural behavior of people and artists. I don't think any of them would have had many thoughts at all about what happened a hundred or more years after their death, other than hoping they were remembered at all.