On another note, and perhaps others are feeling similarly, but I am finding myself surprised at how little use I have for this stuff, LLMs included. If, ten years ago, you told me I would have access to tools like this, I'm sure I would have responded with a never ending stream of ideas and excitement. But now that they're here, I just sort of poke at it for a minute and carry on with my day.
Maybe it's the unreliability on all fronts, I don't know. I ask a lot of programming questions and appreciate some of the autocomplete in vscode, but I know I'm not anywhere close to taking full advantage of what these systems can do.
It was largely a solved problem though. Companies did not seem to have an issue with using stock photos. My current company's website is full of them.
For business use cases, those galleries were already so extensive before AI image generation, that what you wanted was almost always there. They seemingly looked at people's search queries, and added images to match previously failed queries. Even things you wouldn't think would have a photo like "man in business suit jump kicking a guy while screaming", have plenty of results.
To think any/all combined stock services would be the end-all is just unrealistic. Sure, someone might have settled on something just because they got tired of scrolling (much like with streaming video services), but that does not mean they are happy with their selection. Just happy to be done.
Now, with generative AI, they can have squirrels doing anything in any setting they can describe. If they don't like it, they can just tweak the description until they are happy. It's an obvious plus for them.
I never drank the kool-aid to be all gung-ho on this boom/fad, but I'm not going to be so obstinate that I refuse to accept that some people find it quite useful. May someone make all the squirrels-attending-high-school generative art they want, but you can't tell me some stock place is good 'nuff for everything.
Yes, it's obvious that if your use case is obscure enough, or you need a ton of unique images, they won't work, which is why I said "largely a solved problem".
Also, you're implying that a generative system is so fast that it could create enough variations of your prompt to fill a search results page in an acceptable time. That's a joke.
It’s the same with code. I don’t think software engineers will really be replaced, but small web dev agencies have a good reason to be nervous. Why would you pay someone to make a website for your restaurant when 3-5 prompts will get you there?
The key word is professional. A good restaurant website begins with taking good photos of the premises and the food. AI won't come around to your business and take professional photos.
There's a lot of bits and pieces to a website for bookings, content management, menu updates, etc.
HTML templates and themes have been around for a long time. AI can basically spit out those templates and themes, which is great. But there's still a lot to do before you get to www.fancy-dining.com.
I do this a lot, far more than I actually go to restaurants, because I like adding small business details to OSM. There are a few that have their shit together but the overwhelming majority do not.
"Most are generic crap" doesn't mean restaurants aim for that benchmark when they decide to get a website.
I'm not sure if you're refuting the point I was making, which I'll clarify. "Restaurant website" could be a stand-in for any basic small business website. The claim was that AI threatens small web dev agencies who make small business websites. I don't think it will, as millions of small businesses want something better than "generic crap" or cookie-cut AI copy paste; AND we've had site-building services, social media pages, and template-driven approaches for a long time.
Neither will a web developer?
> There's a lot of bits and pieces to a website for bookings, content management, menu updates, etc.
Bolt.new can handle all these quite easily. Although I know several restaurants with very simple websites that have a few pics, a menu and their hours.
And now these image-generating models are giving us the equivalent of stock photos without the pesky issue of attribution or royalties. What a wonderful time to be alive.
I've produced my own music recordings in the past and I've hired musicians to play the instruments that I cannot. Having exasperated recording engineers watch my 5,000th take on a drum fill that I absolutely cannot play is not the fun part. Sitting behind the glass and watching my vision come to life from a really good drummer is absolutely the fun part.
Is having the ai spit out idea after idea fun in the same way for you?
May I ask what you use? I'm not yet even a paid subscriber to any of the models, because my company offers a corporate internal subscription chatbot and code integration that works well enough for what I've been doing so far but has no image generation.
I have tried image generation on the free tier but run out of free use before I get anything pleasing.
What do you pay for?
I made a logo for an internal product that wouldn’t have had a logo otherwise at our company. I also make a lot of shitpost memes to my friends to trash talk in the long running turn based war game we’ve all been playing, like “make a cartoony image of a dog man and a Greek giant beating up a devil” and the picture it gave was just hilarious and perfect, like an old timey Popeye cartoon.
Two years ago I was spending three hours using local models like Stable Diffusion to get exactly what I wanted. I had to inpaint and generate 100 variations which would have been insanely expensive if I wasn’t powering it with my own hardware.
Now I get something good in minutes, it’s crazy really.
They're learning to expect to skip the most important part of creating something.
> they were laughing and we had a ton of fun
I'm a parent so I think I get the appeal, but this to me is like saying "they were laughing and having fun while reading and composing legal briefs." I don't see the advantage, and any momentary benefit comes at the cost of a longer-term loss.
ChatGPT is far, far superior (especially now) when you want something more specific that you've already imagined. But it's slower, and unlike Midjourney you don't get four versions to choose to build and iterate on, you get a single image that takes longer to load.
> four versions to choose to build and iterate on
How does this work? How do you ask a model to produce four different variations? Or do they have four different models run the same inference?
If you're still not sure, let me know and I'll show a screenshot.
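For what it's worth, with open models there's nothing exotic going on: you typically just sample the same prompt several times with different random seeds. A rough sketch with the diffusers library (Midjourney's internals aren't public, so this is only an analogy; the model id and seeds are arbitrary examples):

```python
# Sketch only: sample one prompt four times with different seeds to get
# four variations. Model id and seeds are arbitrary examples.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a watercolor painting of a lighthouse at dawn"

# One generator per image; different seeds give different variations.
generators = [torch.Generator("cuda").manual_seed(s) for s in (1, 2, 3, 4)]
images = pipe(prompt, num_images_per_prompt=4, generator=generators).images

for i, img in enumerate(images):
    img.save(f"variation_{i}.png")
```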
The market is saturated and the way it works means ten get rich for every million artists. I feel as though this has been pretty constant throughout history.
Of course there's a lot of talent out there, "wasted", but I think that's always been the case. How many William Shakesmans did we lose with all the war, famine, disease?
I actually decided I'd probably never write music again after 1-shot making a song about the South Korea coup attempt several months ago. I had the song done before the news really even hit the US. Why would I destroy my own hearing writing music anymore when I can prompt an AI to do it for me, with the same net result - no one cares.
here's the 3-shot remix, the triangle cracks me up so much that i had to upload it https://soundcloud.com/djoutcold/coup-detat-symphony-remix
the "original" "1-shot" is on my soundcloud page as well. https://soundcloud.com/djoutcold/i-aint-even-writing-music-a...
it's in lojban. That's why you can't understand it. Yes. Lojban. Brings a tear to my eye every time i hear it. fkin AI
[0] more my style - hold music for our PBX https://soundcloud.com/djoutcold/bew-hold-music also all my stuff is CC licensed, mostly CC0 at this point.
(Just a small comment out of context of the remaining discussion:)
Maybe not many? It could be that "cultural attention" is limited and there's not much space at the top anyways. In other words: It might be that there's always a few famous artists that get remembered and the rest is forgotten. Same as winning the world cup: There's always a team that wins and it says nothing about the quality in a universal way. At best it says something about quality relative to the competition.
(Not sure I'd fully get behind the argument I composed here. But I found it interesting.)
Or maybe they just really were that good.
It hasn't. Look up the collapse of the viability of music as a career. Jaron Lanier has written on this.
If a classroom of 14-year-olds is making a game in their computer science class, and they use AI to make placeholder images... was a real artist harmed?
The teacher certainly can't afford to pay artists to provide content for all the students' games, and most students can't afford to hire an artist either... they perhaps can't even legally do it, if the artist requires a contract; they are underage in most countries to sign a contract.
This technology gives the kids a lot more freedom than a pre-packaged asset library, and can encourage more engagement with the course content, leading to more people interested in creative-employing pursuits.
So, I think this technology can create a new generation of creative individuals, and statements about the blanket harm need to be qualified.
This is your opinion. I don't see how these statements connect to each other.
You might have heard this: it's helpful to strive to be someone only a few years ahead of you. Similar to this, we give calculators to high-schoolers and not 3rd graders. Wolfram-Alpha is similarly at too high a level for most undergraduate students.
Following this, giving an image generator to kids will kill their creativity in the same way that a tall tree blocks upcoming sprouts from the sun. It will lead to less engagement, to dependence, to consumerism.
Scams beget more scams
The appropriation argument is somewhat unsound. Creative endeavors, by definition, build on what's come before. This isn't any different between code, creative writing, drawing, painting, photography, fashion design, music, or anything else creative. Creation builds on what came before, that's how it works. No one accuses playwrights of appropriating Shakespeare just because they write a tragic romance set in Europe.
The hyperbolic way you've made whatever arguments you had, though, is actively working against you.
It remains unclear if they needed permission in the first place. Aside from Meta's stunt with torrents I'm not aware of any legal precedent forbidding me to (internally) do as I please with public content that I scrape.
> They regularly speak explicitly about all the jobs they plan to destroy.
A fully legal endeavor that is very strongly rewarded by the market.
Most of the larger commercial entities seem to be doing the work themselves and being quite upfront about the entire thing.
Because all the litigation is currently ongoing.
> A fully legal endeavor that is very strongly rewarded by the market.
Yes let's sacrifice all production of cultural artifacts for the market. This is honestly another thing that's being litigated. So far these companies have lost a lot of money on making a product that most consumers seem to actively hate.
Who said anything about sacrificing production? The entire point of the tooling is to reduce the production cost to as near zero as possible. If you didn't expect it to work then I doubt you would be so bent out of shape over it.
I find your stance quite perplexing. The tech can't be un-invented. It's very much Pandora's box. Whatever consequences that has for the market, all we can do is wait and see.
Worst case scenario (for the AI purveyors) is a clear legal determination that the current training data situation isn't legal. I seriously doubt that would set them back by more than a couple of years.
I'd like to suggest that you might be better received on HN if you were a bit more direct about making an argument of substance regarding the ethics.
I disagree. When you publish your work, I can't copy it, but I can do nearly anything else I want to with it. I don't need your consent to learn from your work. I can study hundreds of paintings, learn from them, teaching myself to paint in a similar style. Copyright law allows me to do this.
I don't think an AI, which can do it better and faster, changes the law.
If I wrote a program that chose an image at random from 1000 base images, you’d agree that the program doesn’t create anything new. If I added some random color changes, it would still be derivative. Every incremental change I make to the program to make it more sophisticated leaves its outputs just as derivative as before the change.
Just a quick example, what's my proper compensation for this specific post? Can I set a FIVE CENTS price for every AI that learned from my post? How can I OPT-IN today?
I'm coming from the position that current law doesn't require compensation, nor opt-in. I'm not happy with it, but I dont see any easy alternative
You've yet to present a convincing argument regarding the ethics. (I do believe that such arguments exist; I just don't think you've made any of them.)
If you really can't think of a reason, I don't think anybody here is going to be able to offer you one you are willing to accept. This isn't a difficult or complex idea, so if you don't see it, why would anybody bother trying to convince you?
> (I do believe that such arguments exist; I just don't think you've made any of them.)
This is lazy and obnoxious.
The idea I expressed is also quite straightforward. That the act of copying something around in RAM is a basic component of using a computer to do pretty much anything and thus cannot possibly be a legitimate argument against something in and of itself.
The audience on HN generally leans quite heavily into reasoned debate as opposed to emotionally charged ideological signalling. That is presumably sufficient reason for someone to try to convince me, at least if anyone truly believes that there's a sound argument to be made here.
> This is lazy and obnoxious.
How is a clarification that I'm not blind to the existence of arguments regarding ethical issues lazy? Objecting to a lazy and baseless claim does not obligate me to spend the time to articulate a substantial one on the other party's behalf.
That said, the only ethical arguments that immediately come to mind pertain to collective benefit similar to those made to justify the existence of IP law. I think there's a reasonable case to be made to levy fractional royalties against the paid usage of ML models on the basis that their existence upends the market. It's obviously protectionist in nature but that doesn't inherently invalidate it. IP law itself is justified on the basis that it incentivizes innovation; this isn't much different.
Maybe because AI is ultimately nothing but a complicated compression algorithm, and people should really, really stop anthropomorphizing it.
You've presented all sorts of wild assumptions and generalizations about the people who don't share your vehement opposition to the use of this technology. I don't think it's the person you're responding to with the implicit bias.
You've conflated theft with piracy (all too common) and assumed a priori that training a model on publicly available data constitutes such. Do you really expect people to blindly adopt your ideological views if you just state them forcefully enough?
> If using AI is okay for the creative labor, why shouldn't the students also use it for the programming too?
They absolutely should! At least provided it does the job well enough.
Unless they are taking a class whose point is to learn to program yourself (ie the game is just a means to an end). Similar to how you might be forbidden to use certain advanced calculator features in a math class. If you enroll in an art class and then just prompt GPT that likely defeats the purpose.
This is the view of most people outside the industry.
The ones that pay attention to the markets appear to believe some very questionable things and are primarily concerned with if they can figure out how to get rich off of the associated tech stocks.
Practically speaking, the work described would most likely never have been done, rather than been done by an artist if that were the only option - it’s uncommon to employ artists to help with incidental tasks relative to side projects, etc.
So stating that people shouldn't need to worry about starving (metaphorically or otherwise) would be roughly equivalent.
And many fold more than that are forced to drop out to "get a real job".
Of course all of the above is a good thing from the perspective of maximizing the quality of life across society as a whole. But wouldn't it be nicer if we didn't have to do (as much of) that?
The reason I don't use AI is because it gives me far less reliable and impossible to specify results than just searching through the limited lists of human made art.
Today, for undisclosed reasons, I needed vector art of peanuts. I found imperfect but usable human made art within seconds from a search engine. I then spent around 15 - 25 minutes trying to get something closer to my vision using ChatGPT, and using the imperfect art I'd found as a style guide. I got lots of "huh that's cool what AI can do" but nothing useful. Nothing closer to my vision than what I started with.
By coincidence it's the first time I've tried making art with AI in about a year, but back then I bought a Midjourney account and spent a month making loads of art, then installed SD on my laptop and spent another couple of weeks playing around with that. So it's not like I'm lacking experience. What I've found so far is that AI art generators are great for generating articles like this one. And they do make some genuinely cool pictures; it blows my mind that computers can do this now.
It's just when I sit down with a real world task that has specific, concrete requirements... I find them useless.
Other than stock photos, porn is the killer app for that, but most of the AI companies don't want to allow that.
I can never think of anything to talk to an AI about. I run LMs locally, as well.
Ask it to teach you a language.
DnD works really well (the LLM being the game-master).
Even if technological progress on AI were to stop today, and the best models that exist in 2030 are the same models we have now, there would still be years of social and economic change as people and companies figure out how to make use of novel technology.
Noticed that.
Maybe it's my algorithm but YouTube is seemingly filled with these videos now.
True, I do enjoy watching the LawTubers and sometimes they talk about HOAs but that is a far stretch from someone taking a reddit post and laundering it through the robots.
What's frustrating me is that if I tell the YouTube algo 'don't recommend' on AI music video channels, it stops giving me any music video channels. That's not what I want, I just don't want the AI. They need to separate the two. But of course they need to not do that with AI cover images because otherwise it would harm me. :)
https://yosefk.com/blog/the-state-of-ai-for-hand-drawn-anima...
Maybe this multimodal thing can fix that?
There has been a lot of progress since then: https://doubiiu.github.io/projects/ToonCrafter/
This is arguably a good thing because if production cost drops it should mean either higher quality or more content.
If a local restaurant is using this stuff we're near an inflection point of adoption.
I've built a sim racing tool that's about 50% AI code now, and the AI handles mostly the boilerplate, accelerating prototyping and most of the data structure packing/unpacking needed.
It never managed to do a pit stop window prediction on its own, but it could create a reasonable class to handle tire overheating messages.
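For illustration, a hypothetical sketch of what such a tire-overheating message handler might look like; the class name, telemetry fields and thresholds below are made up for the example, not the actual tool's code:

```python
# Hypothetical sketch of a tire-overheating message handler for a sim racing
# overlay; names, fields and thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class TireTemps:
    front_left: float
    front_right: float
    rear_left: float
    rear_right: float

class TireOverheatMonitor:
    def __init__(self, warn_at: float = 95.0, critical_at: float = 110.0):
        self.warn_at = warn_at
        self.critical_at = critical_at
        self._last_level = {}  # remember per-tire state to avoid spamming the HUD

    def check(self, temps: TireTemps) -> list[str]:
        messages = []
        for corner, temp in vars(temps).items():
            if temp >= self.critical_at:
                level = "critical"
            elif temp >= self.warn_at:
                level = "warning"
            else:
                level = "ok"
            # Only emit a message when a tire's level changes.
            if level != "ok" and level != self._last_level.get(corner):
                messages.append(f"{corner.replace('_', ' ')} tire {level}: {temp:.0f} C")
            self._last_level[corner] = level
        return messages

# Usage: feed telemetry each tick, render whatever comes back in the overlay.
monitor = TireOverheatMonitor()
print(monitor.check(TireTemps(92.0, 97.5, 88.0, 112.3)))
```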
All in all, what I can say from this experiment is that it enabled me to get started, as I'm unfamiliar with pygame, and the UX is entirely maintained by AI.
Working on classes together sucks, as the AI puts in too many null checks and try/catches, making the code unreadable by humans. I much prefer to make sure data is correctly initialized and updated rather than deal with the huge nest of conditions the LLM produces, so I ended up with clearly defined AI and human components.
It's not perfect yet, but I can focus on more valuable things. And it's a good step up from last year, when I just used it to second-check and enrich my technical writing and convert notes into emails.
With vision and image generation I think we're closer to creating a feedback loop where the AI can rapidly self-correct its output, but the ceiling remains to be seen to understand how far this will go.
I think this will change as more practical use cases begin to emerge as this is all brand new. For example, the photos you take with your smartphone can tell a story or be annotated so you can see things in the photos you didn't think about but your profile thinks you might. Things will get more sophisticated soon.
I have had use for LLMs and previous era image gens. I haven't got around to trying the last iterations that this article is about yet.
That use I have had of it is very esoteric, an art mostly forgotten in the digital modernity, it's called "HAVING FUN", by myself, for curiosities, for sharing with friends.
That is by far the greatest usage area and severely underrated. AI for having fun, enjoyment that feels meaningful.
If you're a spam producer or scam artist, or an industrial digi-slop manufacturer or merchant of hype, or some other flavor of paid liar (journalist, influencer, spokesperson, diplomat or politician), then sure, AI will also earn you money. And the facade of these money-making enterprises will look shinier with every year that passes, but it will be all rotting organics and slop behind that veneer, as it has been since many years before my birth.
I'm in the game for the fun part, and that part is noticeably improving.
Well, that's because they suck, despite all the hype.
They have a use in a professional context, i.e., as replacement for older models and algorithms like BERT or TF/IDF.
But as assistants they're only good as a novelty gag.
Using the prompt to detect and choose the most appropriate model checkpoint and LoRA(s), along with rewriting the prompt to best suit the chosen model, has been pretty bog standard for a long time now.
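As a rough sketch of that kind of routing, assuming a Stable Diffusion-style setup via diffusers (the checkpoint ids, LoRA file and keyword rules below are placeholders, not any particular vendor's pipeline):

```python
# Placeholder sketch of prompt-based checkpoint/LoRA routing; the checkpoint
# ids, LoRA file and keyword rules are invented for illustration.
import torch
from diffusers import StableDiffusionPipeline

ROUTES = {
    "anime":     {"checkpoint": "some-org/anime-checkpoint", "lora": "lineart_lora.safetensors"},
    "photoreal": {"checkpoint": "some-org/photo-checkpoint", "lora": None},
}

def pick_route(prompt: str) -> dict:
    p = prompt.lower()
    if any(word in p for word in ("anime", "manga", "cel shaded")):
        return ROUTES["anime"]
    return ROUTES["photoreal"]

def generate(prompt: str):
    route = pick_route(prompt)
    pipe = StableDiffusionPipeline.from_pretrained(
        route["checkpoint"], torch_dtype=torch.float16
    ).to("cuda")
    if route["lora"]:
        pipe.load_lora_weights(route["lora"])
    # Prompt rewriting would also happen here, e.g. appending the trigger
    # words a given checkpoint or LoRA was trained with.
    styled_prompt = prompt + ", highly detailed"
    return pipe(styled_prompt).images[0]

image = generate("a manga-style portrait of an astronaut")
image.save("routed_output.png")
```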
Which players are doing this? I haven't heard of this approach at all.
Most artistic interfaces want you to visually select a style (LoRA, Midjourney sref, etc.) and will load these under the hood. But it's explicit behavior controlled by the user.
The only thing we currently have to go off of is OpenAI's own words, which claims the images are generated by a single multimodal model autoregressively, and I don't think they are lying.
I don't really see that with ChatGPT. What I do see is that it's presumably running the same basic query with just whatever you said different each time, instead of modifying the existing image. Like, if you say "generate a photo of a woman" and get a pic, and then say "make her hair blonde", the new image is likely to also have different facial features.
I doubt any of these companies have rolled their own interface to stable diffusion / transformers. It's copy and paste from huggingface all the way down.
I'm still waiting for a confirmed Diffusion Language Model to be released as gguf that works with llama.cpp
If you think that companies like OpenAI (for all the criticisms they deserve) don't use their own inference harness and image models I have a bridge to sell to you.
I guess the "convenience" just happened to get ported over from "Auto1111", or it's a coincidence, or
I thought this was obvious? At least from the first time (and only time) I used it, you can clearly see that it's not just creating one image based on the prompt, but instead it first creates a canvas for everything to fit into, then it generates piece by piece, with some coordinator deciding the workflow.
Don't think we need evidence either way when it's so obvious from using it and what you can see while it generates the "collage" of images.
We can't really be sure until OpenAI tells us.
Unconvinced by that tbh. This could simply be a bias with the encoder/decoder or the model itself; many image generation models have shown behaviour like this. Also unsure why a sepia filter would always be applied if it was a workflow; what's the point of this?
Personally, I don't believe this is just an agentic workflow. Agentic workflows can't really do anything a human couldn't do manually, they just make the process much faster. I spent 2 years working with image models, specifically around controllability of the output, and there is just no way of getting these kinds of edits with a regular diffusion model just through smarter prompting or other tricks. So I don't see how an agentic workflow would help.
I think you can only get there via a true multimodal model.
* The weird-ass basket decoration on the table originally has some big chain links (maybe anchor chain, to keep the theme with the beach painting). By the third version, they're leathery and are merging with the basket.
* The candelabra light on the wall, with branch decorations, turns into a sort of skinny minimalist gold stag head, and then just a branch.
* The small table in the background gradually loses one of its three legs, and ends up defying gravity.
* The freaky green lamps in the window become at first more regular, then turn into topiary.
* Making the carpet less faded turns up the saturation on everything else, too, including the wood the table is made from.
So rather than predicting each patch at the target resolution right away, it starts with the image (as patches) at a very small resolution and increasingly scales up. I guess that could make it hard for the model to learn to just copy and paste image tokens for editing like it might for text.
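A toy sketch of that coarse-to-fine loop, purely to illustrate the idea being described (this is generic next-scale logic with a stubbed-out model, not OpenAI's actual pipeline):

```python
# Toy illustration of coarse-to-fine generation: start tiny, refine, upsample,
# repeat. The predict stub stands in for whatever the real model does.
import torch
import torch.nn.functional as F

def predict(canvas: torch.Tensor, prompt_embedding: torch.Tensor) -> torch.Tensor:
    # Stand-in for the model: given the current canvas and the prompt,
    # return a refined canvas at the same resolution.
    return canvas

def generate(prompt_embedding: torch.Tensor, target_res: int = 256) -> torch.Tensor:
    res = 8
    canvas = torch.zeros(1, 3, res, res)  # begin at a very small resolution
    while res < target_res:
        canvas = predict(canvas, prompt_embedding)
        res *= 2
        # Upsample, then let the next pass add detail at the finer scale.
        canvas = F.interpolate(canvas, size=(res, res), mode="bilinear", align_corners=False)
    return predict(canvas, prompt_embedding)
```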
Also, sweet jesus, after more than a year of hilarious frustration, it now knows that a flying squirrel is a real animal and not just a tree squirrel with butterfly wings.
As an example the tape spindles, among other changes, are different: https://chatgpt.com/share/67f53965-9480-800a-a166-a6c1faa87c...
https://help.openai.com/en/articles/9055440-editing-your-ima...
I'm hoping we see an open weights or open source model with these capabilities soon, because good tools need open models.
As has happened in the past, once an open implementation of DallE or whatever comes out, the open source community pushes the capabilities much further by writing lots of training, extensions, and pipelines. The results look significantly better than closed SaaS models.
I think the fireplace might be turning into some tiny stairs leading down. :)
I have to disagree with the conclusion. This was an important discussion to have two to three years ago, then we had it online, and then we more or less agreed that it's unfair for artists to have their works sucked up with no recourse.
What the post should say is "we know that this is unfair to artists, but the tech companies are making too much money from them and we have no way to force them to change".
It's also not clear, for example, that Studio Ghibli lost by having their art style plastered all over the internet. I went home and watched a Ghibli film that week, as I'm sure many others did as well. Their revenue is probably up quite a bit right now?
"How can we monetize art" remains an open question for society, but I certainly don't think that AI without restrictions is going to lead to fewer people with art jobs.
Small artists get paid to create the art; corporations benefit from exclusivity.
This sounds like a rewording of "You won't get paid, but this is a great opportunity for you because you'll get exposure".
Studio Ghibli on the other hand had exposure to millions of people (maybe hundreds of millions), and probably >5% of those were potential customers.
So yes, being paid in exposure makes sense, if the exposure is actually worth what the art is worth. But most people offering to pay in exposure are overvaluing their exposure by 100x or more.
There's a lot of ifs in here. The number of people exposed to has an estimate that covers two orders of magnitude, "maybe". "probably". "greater than". "potential".
In order for this exposure to have more value than the ownership of the original, all of those things need to fall into place. And no one can offer meaningful exposure based on the off-chance that a meme goes viral. All the risk is on the creator, they lose control of their asset and receive a lottery ticket in return.
> So yes, being paid in exposure makes sense, if the exposure is actually worth what the art is worth. But most people offering to pay in exposure are overvaluing their exposure by 100x or more.
Yes, but that's a big "but"; it's difficult to know the value of the "exposure" that is being offered, not to mention if the entity offering it is legit or if it's just a scam because they don't want to pay.
Additionally, the AI companies who are slurping up copyrighted works to train their models are not offering exposure. And the mememaker who happens to go viral can't offer it either.
Maybe Studio Ghibli is much more than merely a style. Maybe people aren't looking at their production just for the style.
Most people dislike wearing fake clothes, and they dislike wearing fake watches or fake jewelry. Because it isn't just about the style.
I'd disagree. Most people don't like buying something 'real' then finding out it's fake. Far more people don't mind an actual fake if it's either high quality or is very low priced.
> but I certainly don't think that AI without restrictions is going to lead to fewer people with art jobs.
It's great that you think that but in reality a lot of artists are saying they're getting less work these days. Maybe that's the result of a shitty economy but I find it very difficult to believe this technology isn't actively stealing work from people.
Good. That means we as a society get more art cheaper. I've long since grown tired of sponsoring greed of artists.
The threat of AI produced art will forever trivialise human artistic capabilities. The reality is: why bother when it can be done faster and cheaper? The next generation will leverage it, and those skills will be very rare. It is the nature of technology to do this.
If the effort required to create that can just be ingested by a machine and replicated without consequence, how would it be viable for someone to justify that kind of investment? Where would the next evolution of the art form come from? Even if some company put in the time to create something amazing using AI that does require an investment, the precedent is that it can just be ingested and copied without consequence.
I think aside from what is legal, we need to think about what kind of world we want to live in. We can already plainly see what social media has done to the world. What do you honestly think the world will look like once this plays out?
Nothing? Just like how if some studio today invests millions of man-hours and does a competing movie in Studio Ghibli's aesthetic (but not including any of Studio Ghibli's characters, branding, etc. - basically, not the copyrightable or trademarkable stuff), nothing out of the ordinary is going to happen.
I mean, artistic style is not copyrightable, right?
It means art can get more ambitious. Ghibli made their mark, and made their money. Now it's time for the next generation to have a turn.
Also, I don't get this weird sense of entitlement people have over someone else's work. Just because it can be copied means it should belong to everyone?
It's bad because you will never get an original visual style from now on. Everything will be copy-paste of existing styles, forever.
I fail to see how artistic expression would cease to be a thing and how people will stop liking novelty. And as long as those are a thing, original styles will also be a thing.
If anything, making the entry barriers lower would result in more original styles, as art is [at least] frequently an evolutionary process, where existing ideas meet novel ones and mix in interesting ways. And entirely novel (from-scratch, if that's a thing) ideas will still keep appearing - if someone thinks of something, they're still free to express themselves, as has always been the case. I cannot think of why people would stop painting with brushes, fingers or anything else.
Art exists because of human nature. Nothing changes in this regard.
As I've said, art styles are not considered copyrightable. You say I'm missing the point but I fail to see why. I've used lack of copyright protection as a reality check, a verifiable fact that can be used to determine the current consensus on the matter. Based on this lack of legal protection, I'm concluding that the societies have considered it's not something that needs to be protected, and thus that there is no "ripping off" in replicating a successful style. I have no doubts there are plenty of people who would think otherwise (and e.g. say that current state of copyright is not optimal - which can be very true), but they need to argue about copyright protections not technological accessibility. The latter merely exposes the former (by drastically lowering the cost barriers), but is not the root issue.
I also have doubts about your prediction of stagnation, particularly because you seem to ignore the demand side. People want novelty and originality, it was always the case and always will be (or at least for as long as human nature doesn't change). Things will change for sure (they always do), but I don't think a stagnation is a realistic scenario.
(After all, it's yet another ephemeral image in "that AI style", with no apparent thought having gone into it, just some name dropping, at best. Or some generated, senseless story, you would be glad, the algorithm hadn't pointed your kids at. Why should you?)
Yet much of the best art, imho, is out in the wild, exposed to the elements, while being at home in some random place. Or perhaps in someone's collection, forgotten and displaced. Art's worth will always be an open question.
> He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.
The term "intellectual property" is an attempt to conflate these things, to justify net-destructive money grabs like retroactive copyright term extensions, because traditional property rights don't expire but copyrights explicitly and intentionally do.
Allowing people to own physical items as business inventory or production equipment and compete with each other for the customer's dollar is entirely possible without the existence of copyright or patents. You would then be relying on some combination of open source, charitable contributions and patronage, industry joint ventures, personal itch scratching, etc. to create writings and inventions, but books and the wheel were created before patents and copyrights were.
More likely trade secrets, NDAs, non-competes, and increasingly invasive DRM. In addition to the direct financial incentive, part of the logic behind IP law is to foster a more open market because that should be to the benefit of society at large in multiple ways.
Patents, for example, ensure that at least some minimal description of the process gets published for others to take inspiration from.
These are all also creatures of the law. If there was no copyright there would be no Digital Millennium Copyright Act. In many cases they can't work, e.g. because of the analog hole or because the mechanism of operation is observable to anyone who buys the product.
The incentives to uncover those things are also much stronger in the modern day because of the connectedness of the world. If there were two wheelwrights in your town and one of them had a secret process, no one but the other would have any use for it, and if they found out they wouldn't even have anyone else to tell it to.
If someone had a secret video encoding strategy today, some hobbyists would reverse engineer it and post it on the internet.
> Patents, for example, ensure that at least some minimal description of the process gets published for others to take inspiration from.
Have you read a modern patent? They're inscrutable, and to the fullest extent allowable attempt to claim the overall concept of doing something rather than describing a specific implementation.
Careful not to confuse illegal with unable.
> [can't work because] the mechanism of operation is observable to anyone who buys the product.
That was my point about increasingly invasive DRM. Without IP law, the only way for large swaths of industry to sustain themselves would be to deal exclusively on extensively secured platforms. Imagine a scenario where all paid services (software, streaming, and quite literally everything else) were only available on hardware-attested devices rooted with only one or a few players.
In the hypothetical scenario where it is explicitly legal to copy any binary that you gain possession of (ie copyright doesn't exist) I think that's what we would see.
> If someone had a secret video encoding strategy today, some hobbyists would reverse engineer it and post it on the internet.
Which is why patents exist. When companies decide how much to invest in what this is taken into account.
Notably due to the lack of popularity (and thus lack of adoption) of patent encumbered video and audio standards, anyone trying to make a direct profit effectively dropped out years ago. At this point it's driven by behemoths that realize significant downstream cost savings.
> Have you read a modern patent? They're inscrutable
Yes, I'm aware. Consider how much worse things could be though. No hint, every employee who worked on it under both NDA and non-compete. Imagine how much more difficult the labor market would be to navigate if the government didn't intervene to prevent overbearing terms in such a scenario. Consider what all of this would do to market efficiency.
My point was never to disagree with your broad strokes (that the free market is perfectly capable of functioning in the absence of IP law). Rather it was to point out that despite all the downsides, IP law does clearly offer some collective benefits by significantly reducing incentives that would otherwise drive greedy individuals to act against common interests.
We already have some content where this has been attempted. That content is on the piracy sites. And that's when breaking the DRM and piracy sites are both illegal.
They simply wouldn't use a business model where they first make something and then try to charge people for it after. Instead you might have a subscription service, but the subscription is patronage, i.e. you want them to keep producing content and if enough people feel the same way, they make enough to keep doing it. But the content they release is available to everyone.
> every employee who worked on it under both NDA and non-compete. Imagine how much more difficult the labor market would be to navigate if the government didn't intervene to prevent overbearing terms in such a scenario. Consider what all of this would do to market efficiency.
The assumption is that such NDAs would be enforceable. What if they're not?
> Rather it was to point out that despite all the downsides, IP law does clearly offer some collective benefits by significantly reducing incentives that would otherwise drive greedy individuals to act against common interests.
The greedy individuals could be addressed by banning their attempts to reconstitute copyright through thug behavior. The real question is, would we be better off without it, if some things wouldn't be created?
Likely the optimal balance is close to the original copyright terms, i.e. you get 14 years and there is none of this anti-circumvention nonsense which in practice is ineffective at its ostensible purpose and is only used to monopolize consumer devices to try to exclude works that compete with the major incumbents. But the existing system is so far out of whack that it's not clear it's even better than nothing.
I think you could reasonably expect the iOS model to become the only way to purchase paid software as well as any number of other things where IP is a concern. You would have hardware backed attestation of an entirely opaque device.
> Likely the optimal balance is close to the original copyright terms
I'm inclined to agree.
> in practice is ineffective at its ostensible purpose and is only used to monopolize consumer devices to try to exclude works that compete with the major incumbents.
I'd argue that was the actual purpose to begin with. Piracy being illegal means that operating at scale and collecting payments becomes just about impossible. DRM on sanctioned platforms means the end user can't trivially shift content between different zones. The cartels are able to maintain market segmentation to maximize licensing revenue. Only those they bless are permitted entry to compete.
> the existing system is so far out of whack that it's not clear it's even better than nothing.
I agree. I think the current system is causing substantial harm for minimal to no benefit relative to the much more limited original copyright terms.
> The assumption is that such NDAs would be enforceable. What if they're not?
So in addition to removing IP legislation, this is now a hypothetical scenario where additional regulation barring the sorts of contracts that could potentially fill that void is also introduced?
> The greedy individuals could be addressed by banning their attempts to reconstitute copyright through thug behavior.
You're too focused on copyright. The behavior is simple defense of investment. The players are simply maximizing profit while minimizing risk.
Keep in mind we're not just talking about media here. This applies to all industrial R&D. You're describing removing the legal protections against cloning from the entire economy.
If you systematically strip away all the legal defense strategies then presumably one of two things happens. Either the investment doesn't happen in the first place (on average, which is to say innovation is severely chilled market wide). Or groups take matters into their own hands and we see a resurgence of organized crime. Given the amount of money available to be made by major players whose products possess a technological advantage I'd tend to expect the latter.
I really don't like a scenario where the likes of Nvidia and Intel are strongly incentivized to fund the mob.
It's a huge mistake to assume that no one will step up to the plate to do illegal and potentially outright evil things if there's a large monetary incentive involved. Either a sufficiently low friction legal avenue is provided or society is stuck cleaning up the mess that's left. The fallout of the war on drugs is a prime example of this principle in action.
I never thought it was unfair to artists for others to look at their work and imitate it. That seems to me to be what artists have been doing since the second caveman looked at a hand painting on a cave wall and thought, ‘huh, that’s pretty neat! I’d like to try my hand at that!’
For a human it took a lot of practice and a lot of time and effort. But now it takes practically no time or effort at all.
Copyright is meant to secure distribution of works you create. It's not a tool to stop people from creating art because it looks like your art. That has been a thing for centuries; we even categorize art by its style. Imagine if anime had to adhere to a copyright interpretation of "it's my style!".
But do you not for a second think that the current way the laws and rules are set are because of how hard and time consuming it was to replicate work?
Just because "that's how it's always been" doesn't mean it's acceptable to keep it that way when the means to perform the action have so drastically changed.
When a machine can do something there is not generally a (collectively beneficial) reason to protect the individual that competes with it. Backhoes weren't regulated in order to protect ditch diggers.
I don’t see any meaningful difference at all between the system of a human, a computer and a corpus of images producing new images, and the system of a human, a paintbrush, an easel, a canvas and a corpus of images producing new images. Emphasis on the new — copying is still copying, and still controlled by copyrights.
Those people and effort aren't at all tied to the people who are making and using the art.
In the past every individual person would have to individually study art and some style and practice for years of their life to be able to replicate it really well. And for each piece of artwork it could take them days to make 1 single piece.
I would argue that this is why it wasn't really problematic to copy someone's work or style. Because the individual time and effort per person to even do that was so high.
But now that time and effort for an individual is next to nothing.
I think in reality, it is probably too late for that, because the internet is now polluted with AI generated images which would be consumed by any "ethical" model anyway.
In other words I think it would suck up a lot of money over a few years and then we would arrive back pretty much where we are now.
The difference between that and a person just entering a prompt to create some drawing in some style?
The model looked at orders of magnitude more examples of artwork than a single human could look at and study in a lifetime.
To me there is a clear difference here.
I am merely saying that perhaps the rules should change due to the drastic change in time and effort required to do the work.
Sometimes technology changes and what was nearly impossible in the past becomes trivial.
Nobody is saying let’s not have these new efficient tools. All people are saying is let’s make protections and considerations etc for the original artists and their work that’s being used for training and for when the model draws from it to replicate the style that they developed.
The word "meaningful" here is a cheap hedging maneuver, and if you don't see a meaningful difference (whatever that means), that's on you.
I don't.
>For a human it took a lot of practice and a lot of time and effort. But now it takes practically no time or effort at all.
So effort is what makes it ok?
And why is this not a good thing?
Would you invest that time and money if patent protection did not exist? Probably not, because your competition will copy your work and bankrupt you.
At any point, society could opt to eliminate patent protection and make all existing inventions public domain, at the cost of losing future inventions. But instead we settled for 20 years.
This concern did not previously apply to art styles, because they took nearly as much skill to copy as to originate. But now it does, and with no protections, we can expect nobody to put in the work of being the next Studio Ghibli. The styles we have are all we will have, but we can mass produce them.
The original artist used to be protected by the fact that it took so much effort to copy or reproduce their original work and style at a high quality, so it rarely happened at a scale to directly impact them.
But now it's so easy and effortless that anyone can do it en masse, and that now impacts the original artist greatly.
Edit: the key words here are “company” and “reselling”
Copying is controlled by copyrights. And imitation isn’t controlled by anything.
As for a company: a company is just a group of people acting together.
Or maybe we won't. It's a choice for society to make, to balance how much we need to protect the incentives to create something new vs protect the ease of copying.
#2, yes, it’s a group of people who came together to build an algorithm that learns to extract features learned from images made by other people in order to generate images somewhere between these images in a high dimensional space. They sell these images and give no credit or cash to the images being “interpolated” between. Notice this doesn’t extend to open source, it’s the commercial aspect that represents theft.
The reality is that laws are meant to be interpreted not by their letter but their spirit. The AI can’t exist without the hard work its trained on, and the outputs often resemble the inputs in a manner that approaches copying, so selling those outputs without compensation for the artists in the training set should be illegal. It won’t be, but it should.
The purpose of the model isn't to make exact reproductions. It's like saying you can use the internet for copyright infringement. You can, but it's the user who chooses the use, so is that on AT&T and Microsoft or is it on the users doing the infringement?
> They sell these images and give no credit or cash to the images being “interpolated” between.
A big part of the problem is that machines aren't qualified to be judges.
Suppose the image you request is Gollum but instead of the One Ring he wants PewDiePie. Obviously this is using a character from the LOTR films by Warner Bros. If you're PewDiePie and you want this image to use in an ad for your channel, you might be in trouble.
But Warner Bros. got into a scandal for paying YouTubers to promote Middle-earth: Shadow of Mordor without disclosing the payments. If you're creating the image to criticize the company's behavior, it's likely fair use.
The service has no way to tell why you want the image, so what is it supposed to do? A law that requires them to deny you in the second case is restricting a right of the public. But it's the same image.
Meanwhile in the first case you don't really need the company generating the image to do anything because Warner Bros. could then go after PewDiePie for using the character in commercial advertising without permission.
> Notice this doesn’t extend to open source, it’s the commercial aspect that represents theft.
It's also not really clear how this works. For example, Stable Diffusion is published. You can run it locally. If you buy a GPU from Nvidia or AMD in order to do that, is that now commercial use? Is the GPU manufacturer in trouble? What if you pay a cloud provider like AWS to use one of their GPUs to do it? You can also pay for the cloud service from Stability AI, the makers of Stable Diffusion. Is it different in that case than the others? How?
I think that a comparison that could help to elucidate the problem here is to a search engine. Like with imagegen, an image search is using infrastructure+algorithm to return the closest match(es) to a textual input over some particular space (whether the space of indexed images or the latent space of the model). Immediately, however, there are qualitative differences. A search company, as an entity, doesn’t in any way take credit for the work; it bills itself as, and operates as, a mechanism to connect the user to others’ work, and in the service of this goal it provides the most attribution it’s reasonably able to provide given technical limitations (a url).
For me this is the difference. Image gen companies, at least all that I’m aware of, position themselves more as a kind of pseudo-artist that you can commission. They provide no means of attribution, rather, they deliberately obfuscate the source material being searched over. Whether you are willing to equate the generation process to a kind of search for legal purposes is really the core disagreement here, and beyond an intuition for it not something I feel I can prove.
So what’s the solution, what’s a business model I’d find less contentious? If an AI company developed a means to, for example, associate activation patterns to an index of source material, (or hell, just provided an effective similarity search between output and training data) as a sort of good-faith attribution scheme, made visible the training set used, and was upfront in marketing about its dependence on the source material, I’d struggle to have the same issues with it. It would be leagues ahead of current companies in the ethical department. To be clear though, I’m not a lawyer. I can’t say how image gen fits into the current legal scheme, or whether it does or doesn’t. My argument is an ethical one; I think that the unethical behavior of the for-profit imagegen companies should be hampered by legality, through new laws if necessary. I feel like this should answer your other questions as well but let me know if I missed something.
Speak for yourself, there was no consensus online. There are plenty of us that think that dramatically expanding the power of copyright would be a huge mistake that would primarily benefit larger companies and do little to protect or fund small artists.
The status quo also primarily benefits larger companies, and does little (exactly nothing, if we're being earnest) to protect or fund small artists.
It's reasonable to hold both opinions that: 1) artists aren't being compensated, even though their work is being used by these tools, and 2) massive expansion of copyright isn't the appropriate response to 1).
No we didn't agree with that.
None of the cases against AI companies have been decided afaik. There's a ton of ongoing litigation.
> but doesn’t matter much while investors’ money is so plentiful.
More and more people are realizing how wasteful this stuff is every day.
It seemed a fact of life that companies will just abuse your personal data to their liking and can do what they want with information they collect about you because "if it's free, you're the product" (and even if you paid for it, "you should know better" etc). Then GDPR and its international derivatives came along and changed that.
It seemed a fact of life that companies that technically don't have an actual market monopoly can do whatever they want within their vertically integrated walled gardens, because competitors can just create their own vertically integrated walled gardens to compete with them and the rules for markets don't apply to walled gardens. Then the DSA and DMA came along and changed that.
I don't see why legislation can't change this, too. Of course, just as with the GDPR, DSA and DMA, we'll hear from libertarians, megacorps and astroturf movements how unfair it all is to mom & pop startups and how it's going to ruin the economy, but given the angle grinder the US is currently taking to its own economy (and by extension the global economy, because we're all connected), I think that's no longer a valid argument in politics.
What framework can we use to decide if something is fair or not?
Style is not something that should be copyrighted. I can paint in the style of X painter, I can write in the style of Y writer, I can compose music in the style of Z composer.
Everything has a style. Dressing yourself has a style. Speaking has a style. Even writing mathematical proofs can have a style.
Copying another person's style might reflect poor judgement, bad taste and lack of originality but it shouldn't be illegal.
And anyone in the business of art should have much more than a style. He should have original ideas, a vision a way to tell stories, a way to make people ask themselves questions.
A style is merely a tool. If all someone has is a style, then good luck!
4o is the first image generation model that feels genuinely useful not just for pretty things. It can produce comics, app designs, UI mockups, storyboards, marketing assets, and so on. I saw someone make a multi-panel comic with it with consistent characters. Obviously, it's not perfect. But just getting there 90% is a game changer.
As I've argued in the past, I think copyright should last maybe five years: in this modern era, monetizing your work doesn't (usually) have to take more than a short time. I'd happily concede to some sort of renewal process to extend that period, especially if some monetization method is in process. Or some sort of mechanical rights process to replace the "public domain" phase early on. Or something -- I haven't thought about it that deeply.
So thinking about that in this process: everyone is "ghiblifying" things. Studio Ghibli has been around for very nearly 40 years, and their "style" was well established over 35 years ago. To me, that (should) make(s) it fair game.
The underlying assumption, I think, is that all the "starving" artists are being ripped off, but are they? Let's consider the numbers -- there are a handful of large-scale artists whose work is obviously replicable: Ghibli, the Simpsons, Pixar, etc. None of them is going hungry because a machine model can render a prom pic in their style. Then you get the other 99.999% of artists, all of whose work went into the model. They will be hurt, but not specifically because their style has been ingested and people want to replicate their style.
Rather, they will be hurt because no one knows their style, nor cares about it; people just want to be able to say e.g. "Make a charcoal illustration of me in this photo, but make me sitting on a horse in the mountains."
It's very much like the arguments about piracy in the past: 99.99% of people were never going to pay an artist to create that charcoal sketch. The 0.01% who might are arguably causing harm to the artist(s) by not using them to create that thing, but the rest were never going to pay for it in the first place.
All to say it's complicated, and obviously things are changing dramatically, but it's difficult to make the argument that "artists need to be compensated for their work being used to train the model" without both a reasonable plan for how that might be done, and a better-supported argument for why.
The arguments about wanting copyright to be life+70 have always felt entitled, to me. Making claims about things for their kids to inherit, when the median person doesn't have the option to build up much of an inheritance anyway, and 70 years isn't just the next generation but the next 2.5 generations.
I don't know the exact duration of copyright that makes sense, the world changes too much and different media behave differently. Feels like nobody should have the right to block remakes of C64 games on copyright grounds, but I wouldn't necessarily say that about books published in the same year.
From what I've seen about the distribution of specifically book sales, where even the top-100 best sellers often don't make enough to justify the time involved, I think that one of the biggest problems with the economics of the arts is a mixture of (1) the low cost of reproduction, and (2) all the other artists.
For the former: There were political campaigns a century ago warning about the loss of culture when cinemas shifted from live bands to recorded music[0]. Today, if I were so inclined, I could for a pittance listen to any of (I'm told) 100 million musical performances, or watch any of 1.24 million movies or TV shows. Even before GenAI, there was a seemingly endless quantity of graphical art.
For the latter: For every new book by a current living author such as Charlie Stross (who is on here sometimes), my limited time is also spread between that and the huge back-catalogue of old classics like the complete works of Conan Doyle, Larry Niven, or Terry Pratchett.
[0] https://www.smithsonianmag.com/history/musicians-wage-war-ag...
I think allowing it to be fair game would have destroyed something quite beautiful that I've watched evolve across 40 years, and which I was hoping to see reach its natural conclusion without its creator being bothered by the AI-fication of his work.
As a specific example — _A Game of Thrones_ was released in 1996. It picked up awards early on but only became a NYT best seller in 2011, just before the TV show aired.
It would feel harsh for an author to lose all their copyright because their work was a "slow burn": 5 years have elapsed but they've made little to no money on it.
https://en.wikipedia.org/wiki/Hollywood_accounting
No, no metrics that can be gamed.
Well, broadly that's because most arguments about copyright (length/scope) are made against corporations attacking individual artists, and arguments about copyright (AI) are likewise made against corporations attacking individual artists.
You don't just buy art for the aesthetic; you buy it for a lot of reasons, and AI doesn't give any of the same satisfaction.
if people are paying, then they aren't "overcharging"
And, I don't know, depth of penetration of a needle in flesh and sanitation don't strike me as minor things to get right.
In my experience people tend to underestimate or downplay how difficult something will be or how complex it is. This happens in people who know only a little about something, but also in people who are highly experienced because it becomes normal and easy for them and they can quickly evaluate a situation and know which considerations don't apply.
So... after spending hundreds, if not thousands, of hours learning a skill?
I got a tattoo back in the day and specifically went to one the guys in my platoon said was good due to him being featured in magazines or whatever. It's kind of an important thing to get right on the first, not 20th, attempt IMHO.
By definition, almost half of all $ARTISTS are worse than the median. Should that half not get paid for their time?
They can always put in more hours and become better. I can't imagine they have a lot of paying customers anyway.
*looks at 3D printer on desk that can apparently handle ceramic filaments, thinks about all the mass-produced ceramics sold in supermarkets*
I'm not even mad. We do a terrible job in our society of valuing artists and creative people generally and in explaining the value of intangible things, especially something like good will. People have been misappropriating fonts and clipart and screenshots in presentations and posters and whatnot, duplicating clever branding ideas and the creative efforts of others, and so on for _decades_ if not longer, all without ill intent. It's something we need to fix and never will. But when that becomes a channel for another to directly profit, it begins to venture out of harmlessness.
If Ghibli feels like they are getting screwed, they could've taken this opportunity to promote themselves, is the parent's point. If I were in their marketing dept I would have been screaming "guys, non-weeaboo people are seeing our name in the news, let's fucking capitalize!" When has Ghibli ever trended? Set up some screenings or stream Spirited Away on their site for a couple weeks or somethin. If they want to win hearts and minds, that's what you have to do. As of now, it's already out of the MSM news cycle and forgotten.
funny how people who say this kind of stuff are never content creators (in the monetization sense).
I have a number of public repos and I have benefitted greatly from other public repos. I hope LLMs made some use of my code.
I wrote blogs for years without any monetization. I hope my ideas influenced someone and would be happy if they made some impact on the reasoning of LLMs.
I'm aware of patent trolls and know people with personal experience with them.
So I generate a lot more content than the typical person, and I am still in favor of much looser IP rights. I think they have gone overboard, and the net benefit for me, a content creator, is much greater having access to the work of others and being able to use tools like LLMs trained on their work.
I can't imagine a similar way for an artist to distribute their work while protecting their interests.
And, as a content creator, I practice what I preach - at least when it comes to my poetry: https://rikverse2020.rikweb.org.uk/blog/copyrights
I like the practical angle. Any formula that requires monitoring what everyone is doing is unworthy of consideration. Appeal to tradition should not apply.
There's this new expectation that you should just be able to post some music on Spotify or set up an Etsy shop and get significant passive income. It has never ever worked that way and I feel this new expectation comes from the hustle/influencer types selling it.
Most art is crap and most music isn't worth listening to. In the modern age, it's easy for anyone to be a band or artist and the ability to do this has led to a ton of choice, the market is absolutely flooded. If anyone can do a thing (for very loose values of "do") it's inherently worth less. Only the very best make it and it will always be that way.
Source: made a living as a musician for 20 years. The ones who make it are relentlessly marketing themselves in person. You have to leave the house, be a part of a scene, and be constantly looking for opportunities. No one comes knocking on your door, you must drive your product and make yourself stand out in some way. You make money on merch and gigs, and it's always been that way.
This is all to say that copyright law only affects the top 0.1%. The avg struggling artist will never have to worry about any of this. It's like Bob the mechanic worrying about inheritance taxes. Pipe dream at best.
My example is extreme to the point of absurdity, so how about we go with:
>It's difficult to get a man to understand something when his salary depends on not understanding it.
2007: Copyright is garbage and must be abolished (so I can get music/movies free)
2025: Copyright needs to be strengthened (so my artistic abilities retain value)
There is nothing other than Egoism.
"Copyright for thee but not for me" is the worst of all worlds.
If we imagine for a moment that "copyright" is something that works in the interests of a creator, then 5 years is nothing.
A painting can sit for fifteen years before it gets into an exhibition with sufficient turnover and media coverage to draw attention to it.
A music album can be released on a shitty label with no support, then years later be picked up by a more competent one and start selling.
We're living in a world where worthy art constantly flies under the radar, so limiting its potential even further isn't helpful.
No wonder sama and Trump are so cozy. They both see the same legacy.
Unfortunately I think the answer to this question is a resounding “no”.
The time for thoughtful shaping was a few years ago. It feels like we’re hurtling toward a future where instead we’ll be left picking up the pieces and assessing the damage.
These tools are impressive and will undoubtedly unlock new possibilities for existing artists and for people who are otherwise unable to create art.
But I think it’s going to be a rough ride, and whatever new equilibrium we reach will be the result of much turmoil.
Employment for artists won’t disappear, but certain segments of the market will just use AI because it’s faster, cheaper, and doesn’t require time-consuming iteration and communication of a vision. The results will be “good enough” for many.
I say this as someone who has found these tools incredibly helpful for thinking. I have aphantasia, and my ability to visualize via AI is pretty remarkable. But I can’t bring myself to actually publish these visualizations. A growing number of blogs and YouTube channels don’t share these qualms and every time I encounter them in the wild I feel an “ick”. It’ll be interesting to see if more people develop this feeling.
Honestly visual media just seems to be the start. In the past two years we've seen about as much robotics progress as the last 20. If this momentum keeps up then we're not just talking about artists that are going to have issues.
Maybe the progress you’re describing has escaped me because of the sheer speed this is all unfolding, but it feels like all I’ve heard is lots of noise, while AI companies continue to hammer hosted resources across the Internet to build their next model, the US government continues to claim they’ll use AI to solve problems of waste and fraud, companies like Shopify claim they won’t hire anyone unless it can be proven that AI cannot do the job, and an increasing % of the content I encounter is AI slop.
Maybe this is all necessary for a proper backlash to form, and I definitely want to become more aware of the positives anywhere I can find them. I’m not an AI doomer, but haven’t yet found the optimism you describe.
This is very easy to lose sight of, especially given rapid advancements, but it's important. I think certain companies like Anthropic have safety approaches I agree with more: more thoughtful, with clearly outlined scaling policies (such as the latest Responsible Scaling Policy, effective March 31 of this year), versus the vaguer safety promises from companies like OpenAI and Google. Websites such as https://www.freethink.com have wonderful essays espousing techno-humanism that make compelling arguments for how AI will be a progressively beneficial force for humanity rather than a detriment to it.
Yes, there WILL be growing pains - as is what happened with the internet and the World Wide Web. But, I am confident that we will adapt. There is no better time in history to be living in than right now.
https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_...
(nice URL btw)
The room, the door, the ceiling are all of a scale to fit many sizes of elephants.
Imagine I ask AI to show me a sewer cap that's less than a foot wide (or whatever; I dunno, I'm watching TMNT right now). And it does, just by showing a photorealistic-looking sewer cap next to a ruler whose markings only go up to 8 inches. That doesn't mean sewer caps come in that size; it just means you can produce a rendered image to fit what you asked for.
"in multimodal image generation, images are created in the same way that LLMs create text, a token at a time"
Is there some way to visualise these "image tokens", in the same way I can view tokenized text?
As such, the process is remarkably similar to old fixed-font ASCII art. It's just that modern AIs have a larger alphabet and, thus, more character shapes to choose from.
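One rough way to get that intuition, purely as a sketch (OpenAI hasn't published its tokenizer, so the stand-in below just mimics the shape of a VQGAN-style codebook tokenizer), is to split an image into patches and print the grid of discrete token ids:

    # Illustrative stand-in for a VQGAN-style image tokenizer; a real tokenizer
    # uses a learned encoder + codebook, and OpenAI's is not public.
    import numpy as np
    from PIL import Image

    def tokenize(image, grid=32, codebook_size=8192):
        """Split the image into grid x grid patches and map each patch to a
        pseudo-token id (a real tokenizer would pick the nearest codebook entry)."""
        h, w, _ = image.shape
        ph, pw = h // grid, w // grid
        tokens = np.zeros((grid, grid), dtype=np.int64)
        for i in range(grid):
            for j in range(grid):
                patch = image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
                tokens[i, j] = int(patch.mean() * (codebook_size - 1))
        return tokens

    img = np.asarray(Image.open("photo.jpg").convert("RGB")) / 255.0
    print(tokenize(img))   # a 32x32 grid of integers: the image's "text", one token per patch

Decoding each id back to its codebook patch would be the analogue of viewing a font's character set, which is what makes the ASCII-art comparison apt.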
Exactly. People just confidently make things up. There are many possible ways, and without details, "native generation" is just a marketing buzzword without clear definition. It's a proprietary system, there is no code release, there is no publication. We simply don't know how exactly it's done.
It's probably an implementation of VAR (https://arxiv.org/abs/2404.02905): autoregressive image generation with a small twist. Rather than predicting every token at the target resolution directly, it starts by predicting the image at a small resolution and cranks it higher and higher until it reaches the desired resolution.
I like to look at how far we've come since the early days of Stable Diffusion. It was fascinating to play with it back then, but it quickly became apparent that it was "generic" and not suited for "real work" because it lacked consistency, text capabilities, fingers! and so on... Looking at these results now, I'm amazed at the quality, consistency and ease of use. Gone are the days of doing alchemy on words and adding a bunch of "in the style of Rutkowski, golden hour, hd, 4k, pretty please ..." at the end of prompts.
I like the book, but there are quite a few scenes which are quite hard to visualize and make sense of. An image generator that can follow that language and detail will be amazing. Even more awesome will be if it remains consistent in follow-ups.
https://chatgpt.com/share/67f5d652-f7f4-8013-b2f2-3c997ea513...
Books are fundamentally a collaborative artform between the author and the reader. The author provides the blueprint, but it's up to the reader to construct the scene in their own head. And every reader is going to have slightly different interpretations based on how they imagine the events of a book. This act of imagination and re-interpretation is one of the things I love about reading books.
Having a computer do the visualization for you completely destroys what makes books engaging and interesting. If you don't want to visualize the book yourself, I have to wonder why the hell you're reading a book in the first place.
If you need that visual component, just watch a movie or read a comic book or something. This isn't a slight against movies or comics! They're fantastic mediums and are able to utilize their visual element to communicate ideas in ways that books can struggle with. And these visuals will form a much more cohesive artistic vision than whatever an AI outputs, since they're an integrated and intentional part of the work.
For this book in particular, I read the comic version and I didn't like the visuals very much. I have a different idea of babel fish. Vogons look different. I would love to see the visual that's in my head on paper.
I’m not sure what that says about either of us, but I would say that your definitive “quite hard to visualise” statement is very much subjective.
There is the scene where they see themselves on the beach when first rescued by the ship. That was hard to grasp. Or the insides of the ship itself, the bridge, the panels, etc. Also that black ship they stole.
But maybe it's just me having a hard time with these concepts.
It's not just about a scene being difficult to visualize; even if I can see them in my head, I want to see them on paper too, because those things excite me.
You can see other people's interpretation of Zaphod's two heads by watching the BBC HHGTTG show (Mark Wing-Davey) or the movie (Sam Rockwell), among other renditions, which offer completely different interpretations, none of them canonical (not the least of which is because there was no canonical version of HHGTTG according to DA). I'm sure there are multitudes of fan art for HHGTTG on deviantart. Having AI generate an image doesn't offer any more "official" visualization.
Zaphod's second head is mentioned just as much as is warranted. If a character has a limp or a crazy haircut, it is not mentioned every time, because it has nothing to do with what is going on. And the book mentions that one head is often distracted/asleep, so it sounds like you do have a good visual of what his two heads are like.
While I understand that people think differently and some people are more visual thinkers, a good portion of the concepts expressed through writing are meant to be mindfucks that are difficult to express visually. A picture may be worth a thousand words, but the meat of writing is usually not the visual representation of its concepts. That's a great thing about writing: you can fill in the visuals yourself and it's fodder for fans to discuss.
(BTW, Hotblack Desiato's ship would just be black. Your eyes couldn't focus on it. Even the controls were black labels on a black background. There is nothing here to visualize other than, well, blackness).
These days we have paint that black, though we can't reproduce the effect on the monitor.
Those superblacks really do mess with your mind. It's like a cutout of the void.
If someone can show me exactly what I am thinking of, won't that be amazing.
I suppose it would be amazing if someone could read minds, but is that what you're asking for? In an earlier comment, you opened with:
> I am waiting for when I could provide these a scene snippet from "Hitchhiker's Guide To Galaxy" (or any book) and it could draw that for me.
This is asking for an illustrator, not showing you what you're thinking. The illustrator, even if it is a machine, will show you their interpretation.
I don’t know if the “layout” of the heads is mentioned or not in the books - I’d have to go back and check - but it’s often quite jarring when a book becomes a movie and doesn’t match my inner vision (and how incredibly unthoughtful of them, to boot).
https://sora.com/g/gen_01jrbq91wtefjtpb8ceajdh9mt
and then iterate around other combinations to see if it's generalized or not.
It's "just" a much bigger and much better trained model. Which is a quality on its own, absolutely no doubt about that. Fundamentally the issue is still there though, just less prominent. Which kind of makes sense - imagine the prompt "not green", what even is that? It's likely slightly out of distribution and requires representing a more complex abstraction, so the accuracy will necessarily be worse than stating the range of colors directly. The result might be accurate, until the model is confused/misdirected by something else, and suddenly it's not.
I think in the end none of the architectural differences will matter beyond the scaling. What will matter a lot more is data diversity and training quality.
Here is an example with a bunch of negations: https://i.imgur.com/P8G5ICs.png
Feed is in quotes because my feed seems to be 90% suggested posts.
One thing I'd add is that generating the tokens at the target resolution from the start is no longer the only approach to autoregressive image generation.
Rather than predicting each patch at the target resolution right away, it starts with the image (as patches) at a very small resolution and increasingly scales up. Paper here - https://arxiv.org/abs/2404.02905
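Purely as an illustration of that control flow (the stand-in functions below are not the paper's code; the real model is a transformer that emits the whole token map of each scale in one forward pass, conditioned on the prompt and all coarser scales):

    # Illustrative only: the control flow of VAR-style "next-scale" prediction.
    import numpy as np

    def upsample(tokens, size):
        """Nearest-neighbour upsample of a token grid (stand-in for the real interpolation)."""
        reps = size // tokens.shape[0]
        return np.kron(tokens, np.ones((reps, reps), dtype=tokens.dtype))

    def predict_scale(context, size, vocab=4096):
        """Stand-in for the transformer: returns the full size x size token map in one step."""
        rng = np.random.default_rng(int(context.sum()) % (2**32))
        return rng.integers(0, vocab, size=(size, size))

    def generate(scales=(1, 2, 4, 8, 16)):
        tokens = np.zeros((1, 1), dtype=np.int64)   # start from a single coarse token
        for s in scales:
            context = upsample(tokens, s)           # everything generated so far, brought up to size s
            tokens = predict_scale(context, s)      # predict the whole s x s map in parallel
        return tokens                               # finest map; a VQ decoder would turn it into pixels

    print(generate().shape)                         # (16, 16)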
My understanding is it’s a meta-LLM approach, using multiple models and having them interact. I feel like it’s also evidence that OpenAI is not seriously pursuing AGI (just my opinion, I know there’s some on here who would aggressively disagree), but rather market use cases. It feels like an acceptance that any given model, at least now, has its own limitations but can get more useful in combination.
Wonderful to be alive for these step changes in human capability.
Gave it another chance now, explicitly calling out the numbers. Well, they are improved, but I'm not sure how useful this result is (the spacing between the numbers is a little off and there's still some curious counting going on). Maybe it kind of looks like the numbers are pasted in after the fact?
https://chatgpt.com/share/67f4fa33-70dc-8012-8e1e-2dea563d3d...
I used GPT-4o for some image editing (adding or removing things) on an image of a person, and it distorted the look of the people after each edit, but Gemini Flash (with image out) did much better.
The main problem is there is little control. For example, I asked it to add a helicopter to an image of a ski resort, but it's cumbersome to have to write a full paragraph describing exactly where I want the helicopter rather than just dragging it into place with a mouse.
Which isn't a small thing, humour is an advanced soft skill.
https://www.reddit.com/r/dalle2/s/khb5XuNFdl
There’s probably some sort of connection to ChatGPT in there. But, I don’t know enough about how it works.
I wouldn't call it a good metric, though.
Basically, the user's image prompt is converted to several prompts that generate parts of the final image in layers, which are then combined. The layers remain available so that edits can cleanly update one section without affecting the others.
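If that description is right (it's unverified speculation, not a published pipeline), the workflow would look roughly like this sketch, where generate_layer is a hypothetical stand-in for a per-layer model call:

    # Hedged sketch of a layered pipeline; nothing here is a real API.
    from PIL import Image

    def generate_layer(prompt, size=(1024, 1024)):
        """Placeholder: a real system would generate an RGBA image for this layer prompt."""
        return Image.new("RGBA", size, (0, 0, 0, 0))

    layer_prompts = [                                  # illustrative decomposition of one user prompt
        "background: a misty pine forest at dawn",
        "midground: a wooden cabin with a lit window",
        "foreground: a red squirrel on a fence post",
    ]

    layers = [generate_layer(p) for p in layer_prompts]
    canvas = Image.new("RGBA", layers[0].size, (255, 255, 255, 255))
    for layer in layers:
        canvas = Image.alpha_composite(canvas, layer)  # later layers sit on top of earlier ones

    # Editing only the foreground means regenerating layers[2] and re-compositing;
    # the background and midground stay pixel-identical.
    canvas.convert("RGB").save("composite.png")

The design win is that re-running one layer's prompt leaves every other layer untouched, which is exactly the clean-edit property described above.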
To me, this kind of image generation isn't very interesting for creating final products, but is extremely useful for communicating design intent to other people when collaborating on large creative projects. Previously I used crude "ms paint" sketches for this, which was much more tedious and less effective.
A: Your face is pressed up against the ceiling!
Midjourney is similar with text prompts. But, with image prompts it is able to understand content separately from style. You can give it a photo of two people and it can return many images of recognizable approximations of those people in different poses.
SD can only start from pixels, blur and deblur those pixels in place.
MJ image prompts probably work via image-to-tokens added on to your text-to-tokens-to-image.
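To illustrate that guess (and it is only a guess about Midjourney's internals), the conditioning would amount to image-encoder outputs concatenated in front of the text-encoder outputs; the encoders below are random stand-ins chosen only to show the shapes involved:

    # Shapes only: random stand-ins for the encoders, just to show the idea of
    # prepending image "tokens" to text tokens as shared conditioning.
    import numpy as np

    def encode_image(image, n_tokens=4, dim=768):
        """Stand-in for an image encoder (e.g. a CLIP-style ViT)."""
        return np.random.default_rng(0).standard_normal((n_tokens, dim))

    def encode_text(prompt, dim=768):
        """Stand-in for a text encoder; one embedding per word, for illustration."""
        return np.random.default_rng(1).standard_normal((len(prompt.split()), dim))

    image_tokens = encode_image(np.zeros((512, 512, 3)))      # the reference photo
    text_tokens = encode_text("the same two people hiking in the alps")
    conditioning = np.concatenate([image_tokens, text_tokens], axis=0)
    print(conditioning.shape)                                  # (12, 768): fed to the image generator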
We get Stable Diffusion 1.5 and SDXL, and what does the community go do with it? Lmao, see civit.ai and its literally hundreds of thousands of NSFW LoRAs. The most popular model today on that website is the NSFW anime version of SDXL, called "Pony Diffusion" (I'm literally not making this up; a bunch of Bronies made this model!).
Imagine that an open source image generator which does tokens autoregressively like this at this quality is released.
The world is simply not ready for the amount of horny stuff that is going to be produced (especially without consent). It appears that the male libido really is the reason for most bad things in the world. We are truly the "villains of history".
In other words, people who care about money and only money are pushing for these tools because they're convinced they'll reduce labor costs and somehow also improve the resulting product. Meanwhile, engineers and creative professionals who have these tools foisted upon them by unimaginative business people continue to insist that the tools are a solution in search of a problem: stochastic parrots and plagiarism automata that bypass all of the important parts of engineering and creativity, and that make the absolutely, breathtakingly idiotic mistake of supposing it's possible to leap to a finished product without all the work and problem solving involved in getting there.
> The line between human and AI creation will continue to blur
This is utter nonsense, and hype-man prognosticators in the tech world like the author of the article turn out pretty much 100% of the time to be either grifters or saps who have fallen for the grifters' nonsense.