https://en.wikipedia.org/wiki/List_of_rotoscoped_works#Anima...
The experienced animators and inbetweeners knew how to produce smooth line motion, and the live action was used for lifelike poses, movement, etc. It wasn't really tracing.
There are examples of this in the Disney animation books: the finished animation looks very different from the live actors, but with the same movement.
Animation is a great art and it takes a lot of skill to make things look the way they ought to for whatever it is you are trying to achieve.
Most animators don't like the "digital makeup" comparison (because it's often used in a way that feels marginalizing to their work on mocap-heavy shows), but if you interpret it in the sense that makeup makes people look the way they are "supposed to," I think it's a good model for understanding why rotoscoping and motion capture don't yet succeed without skilled animators.
When animator Ken Arto was on the Trash Taste podcast he mentioned how Disney had the resources to perfect the animation, while in Japan they had to achieve more with less.
This basically shifts the "what is good animation" discussion in ways that are not as clear from looking at the stats.
[0] https://blog.nestful.app/p/ways-to-use-nestful-outlining-ani...
It's also no wonder that such people get disconnected from realities on the ground. Sure, on paper people say they want higher-quality things, but they don't even know what those are. Most people have low-brow tastes; they'd take a cheaper, well-marketed thing over a 1% improvement.
Japan didn't need to compete on the same ladder for success; it needed to mix the various elements it was good at to achieve its own success.
Interestingly, that does not happen in the opposite direction. When "reducing" certain stats of real footage (which is what live-action anime should do [0]), the uncanny valley is skipped. Maybe it's harder to fall into when going backwards? More research is needed.
BTW, I love your books
This shouldn't be glossed over: a proper consideration of the error metric here is key to storing quality animation with fewer bits, lower bandwidth, and higher runtime performance.
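To illustrate why the error metric matters, here is a hedged sketch with a hypothetical planar joint chain (all bone lengths, angles, and the quantization step are made up): the same per-joint angular error is far more visible at the end of the chain when it lands on a root joint than on a leaf joint, which is one reason pose-compression schemes tend to measure error in object space rather than per joint.

```python
import math

def end_effector(angles, lengths):
    """Planar forward kinematics: accumulate joint rotations along a
    chain and return the end-effector position."""
    x = y = theta = 0.0
    for a, l in zip(angles, lengths):
        theta += a
        x += l * math.cos(theta)
        y += l * math.sin(theta)
    return x, y

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical 3-bone chain and pose; delta is one quantization step.
angles, lengths, delta = [0.3, 0.2, 0.1], [1.0, 1.0, 1.0], 0.01
base = end_effector(angles, lengths)

# Apply the same angular error to the root joint vs. the leaf joint.
root_err = dist(base, end_effector([angles[0] + delta] + angles[1:], lengths))
leaf_err = dist(base, end_effector(angles[:2] + [angles[2] + delta], lengths))

# The identical angular error is several times worse at the root, so a
# naive per-joint metric undersells the visible damage.
print(root_err, leaf_err)
```

The point of the sketch: a bit budget allocated by a per-joint angle metric spends bits in the wrong places; measuring displacement at the skin or end effector matches what viewers actually see.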
For example, images are often generated with JPEG artifacts in some regions but not globally.
Watermarks are also reproduced.
Some generated images even show artifacts from CCD cameras:
https://www.eso.org/~ohainaut/ccd/CCD_artifacts.html
Images generated from Google Street View data would likely contain features specific to the cars/cameras used in each country.
> This data is sampled at 120 Hz, with finger and toe motions
But when I watch the videos, they look like the dancer had palsy affecting their hands or was wearing astronaut gloves, because the fingers barely move for the most part.
One look at The Clone Wars and the CGI movements feel mechanical. Maybe the way to judge animation is not by the eye of the beholder but by careful comparison of analog vs. digital renderings: film a human running on analog and pair it, pixel by pixel, with the digital CGI counterpart.
Human color perception is almost entirely comparative - we see something as blue because, given the other objects in the scene and the perceived lighting, blue is the color an object would have to be to look the way this one does (this is the blue-dress phenomenon) - and so noise in images is easy for us to ignore. Similarly, audio and especially speech perception is very strongly context-dependent (as attested by the McGurk effect), so we can also deal with a lot of noise or imprecision - in other words, generative guesswork.
Motion, on the other hand, and especially human motion, is something we're exquisitely attentive to - think of how many horror movies convey a character's 'off-ness' by subtle variations in how they move. In this case, the diffusion model's tendency towards guesswork is much, much less easily ignored - our brains are paying tight attention to subtle variations, and anything weird alarms us.
A constant part of the conversation around LLMs, etc. is exactly this level of detail-mindedness (or, the "hallucinations" conversation), and I think that's basically where you're going to land with things like this - where you need actual genuine precision, where there's some proof point on whether or not something is accurate, the generative models are going to be a harder fit, whereas areas where you can get by with "pretty good", they'll be transformative.
(I've said it elsewhere here, but my rule of thumb for the LLMs and generative models is that if a mediocre answer fast moves the needle - basically, if there's more value in speed than precision - the LLMs are a good fit. If not, they're not.)
edit: Claude is thinking MP3 could work directly: pack 180Hz animation channels into a higher frequency audio signal with some scheme like Frequency Division / Time Division Multiplexing, or Amplitude Modulation. Boom, high compression with commonplace hardware support.
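The multiplexing half of that idea is easy to sketch; the speculative part is whether a lossy audio codec would preserve interleaved pose channels acceptably. A minimal time-division-multiplexing sketch follows (the 180 Hz rate and the channel layout are assumptions taken from the comment, not a tested pipeline):

```python
def tdm_mux(channels):
    """Interleave equal-length animation channels, frame by frame, into
    one stream (K channels at 180 Hz -> a single stream at K * 180 Hz)."""
    assert len({len(c) for c in channels}) == 1, "channels must align"
    return [sample for frame in zip(*channels) for sample in frame]

def tdm_demux(stream, num_channels):
    """Undo the interleave: every num_channels-th sample belongs to the
    same channel."""
    return [stream[i::num_channels] for i in range(num_channels)]

# Three hypothetical joint-rotation channels, a few frames each.
channels = [[0.0, 0.1, 0.2], [1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]
stream = tdm_mux(channels)   # [0.0, 1.0, 2.0, 0.1, 1.1, 2.1, ...]
assert tdm_demux(stream, 3) == channels
```

The obvious catch, left out of the sketch: an actual MP3 round trip would add codec noise on top of this framing, and a psychoacoustic model tuned for human ears is unlikely to distribute that noise where a pose-error metric wants it.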
So, if the sampling theorem applies, having 2x the maximum movement "frequency" should be enough to perfectly recreate it, as long as you "filter out" any higher frequencies when playing back the animation, by using something like FFT upsampling (resampling) instead of linear or Bézier interpolation.
(Having written this, I realize that's probably what everyone is doing.)
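That claim can be demonstrated with a pure-Python toy (naive DFT, a 16-sample sine well below Nyquist; proper Nyquist-bin handling is glossed over since the test signal has no energy there): band-limited FFT upsampling reconstructs the curve almost exactly, while linear interpolation of the same samples leaves a visible error near the peaks.

```python
import cmath, math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def fft_upsample(x, factor):
    """Zero-pad the spectrum between its positive- and negative-frequency
    halves, inverse-transform, and rescale. Exact for signals band-limited
    below Nyquist (the Nyquist bin itself is ignored here)."""
    N, X = len(x), dft(x)
    M, half = len(x) * factor, len(x) // 2
    Y = [0j] * M
    for k in range(half):
        Y[k] = X[k]              # positive frequencies stay at the front
    for k in range(half, N):
        Y[M - N + k] = X[k]      # negative frequencies move to the back
    return [v * factor for v in idft(Y)]

def linear_upsample(x, factor):
    """Plain linear interpolation with periodic extension, for contrast."""
    N = len(x)
    return [x[n] + (x[(n + 1) % N] - x[n]) * j / factor
            for n in range(N) for j in range(factor)]

# A "movement" curve: 2 cycles over 16 frames, well below Nyquist.
N, factor, f = 16, 4, 2
x = [math.sin(2 * math.pi * f * n / N) for n in range(N)]
truth = [math.sin(2 * math.pi * f * m / (N * factor)) for m in range(N * factor)]

err_fft = max(abs(a - b) for a, b in zip(fft_upsample(x, factor), truth))
err_lin = max(abs(a - b) for a, b in zip(linear_upsample(x, factor), truth))
# err_fft is numerically tiny; err_lin is visibly off near the peaks.
```

This only holds if the motion really is band-limited; sharp pose snaps violate the premise, which is presumably why keyframe interpolation survives in practice.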
Can anyone think of a system with better time-to-first-frame that achieves good compression?
I guess to restate my curiosity: are things like Animation Pose Compression in Unity or equivalents in other engines remotely as good as audio techniques with hardware support? The main work on this seems to be here and I didn't see any references to audio codecs in the issue history fwiw. https://github.com/nfrechette/acl
Animation and motion are two different things—related, but definitely not the same. They don't rely on the same principles and they don't capture the same data.
Most people use the terms interchangeably, probably because the tools to process key frames are USUALLY the same.
Animation frames aren't regular the way mo-cap is. Instead, they are designed to furnish the eye (persistence of vision) with images that, in sequence, produce a sense of crisp motion to the viewer.
It's a subtle distinction, but the result is wildly different. In animation, the ACTUAL POSES matter a great deal. In mo-cap, they don't matter at all, it's all about regular sampling and then you just render (in 3D) what you want.
Video game cut scenes are what more-or-less raw "mo-cap" looks like if you're curious.
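The distinction above can be sketched in a few lines (a hypothetical channel; the keyframe times and values are made up): a mo-cap channel is a dense array of uniform samples where no sample is privileged, while a hand-animated channel stores only the poses that matter and interpolates between them, so a held pose costs two keys no matter how long it lasts.

```python
import bisect

def eval_keyframes(keys, t):
    """Evaluate a sparse keyframed channel at time t by linear
    interpolation between the surrounding keys.
    keys: time-sorted list of (time, value) pairs."""
    times = [kt for kt, _ in keys]
    i = bisect.bisect_right(times, t)
    if i == 0:
        return keys[0][1]          # before the first key: clamp
    if i == len(keys):
        return keys[-1][1]         # after the last key: clamp
    (t0, v0), (t1, v1) = keys[i - 1], keys[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# A pose eased into at t=1 and held until t=3: three keys total.
keys = [(0.0, 0.0), (1.0, 1.0), (3.0, 1.0)]
# A 120 Hz mo-cap recording of the same 3-second span would store
# hundreds of samples, none of them marked as "the pose."
assert eval_keyframes(keys, 0.5) == 0.5   # easing into the pose
assert eval_keyframes(keys, 2.0) == 1.0   # held pose, no extra data
```

In a real pipeline the interpolation would be eased curves rather than linear, but the storage asymmetry is the point: keys encode intent, samples just encode time.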
Curation is something we intrinsically favor over engagement algorithms. Noise is easy to quantify, but greatness is not. Greatness might lag in engagement metrics while folks read or watch the material. It might provoke consideration instead of reaction.
Often we need seasons of production in order to calibrate our selection criteria, and hopefully this season of booming generation leads to a very rich new opportunity to curate great things to elevate from the noise.
By definition 99% of the content produced has to be in the bottom 99 percentiles, in any given year.
Even if the entire world decided everything must be curated, that would just mean the vast vast majority of curators have not-great taste.
Whereas in a future world where 99% of it is driven by algorithms, that would mean the vast majority of curators have ‘great’ taste.
But this seems entirely orthogonal.
Namely, for anything generating music, video, or images, tweaking the output is not workable.
Some notable exceptions are when you need stock art for a blog post (no need for creative control), Adobe's recolorization tool (lots of control built in), and a couple more things here and there.
I don't know how it is for 3D assets or rigged model animation (as per the article), never worked with them. I'd be curious to hear about successful applications, maybe there's a pattern.
2D art has a lot of strong tooling though. If you're actually trying to use AI art tooling, you won't just be dropping a prompt and hoping for the best. You will be using a workflow graph, carefully iterating on the same image with controlled seeds, and then inpainting specific areas.
We are at an awkward inflection point where we have great tooling for the last generation of models like SDXL, but haven’t really made them ready for the current gen of models (Flux) which are substantially better. But it’s basically an inevitability on the order of months.
I've found this to be observable in practice - I follow hundreds of artists who I could reliably name by seeing a new example of their work, even if they're only amateurs, but I find that AI art just blurs together into a samey mush with nothing to distinguish the person at the wheel from anyone else using the same models. The tool speaks much louder than the person supposedly directing it, which isn't the case with say Photoshop, Clip Studio or Blender.
Art made by unskilled randos is always going to blur together. But the question I feel we’re discussing here is whether a dedicated artist can use them for production grade content. And the answer is yes.
It's just a matter of time until some big IP holder makes "productionizable" generative art, no? "Tweaking the output" is just an opinion, and people already ship tons of AAA art with flaws that lacked budget to tweak. How is this going to be any different?
What it really lacks is domain knowledge. Current image generation is built by ML nerds, not artists, and they are simply unaware of what needs to be done to make it useful in the industry, and what to optimize for. I expected big animation studios to pick up the tech like they did with 3D CGI in the 90s, but they seem to be pretty stagnant nowadays, even setting aside the animosity and the weird culture war surrounding this space.
In other words, it's not productized because nobody productized it, not because it's impossible.
It's very cool that we have a technology that can generate video, but what's cool is the tech, not the video. It doesn't matter if it's a man eating spaghetti or a woman walking in front of dozens of reflections. The tech is cool, the video is not. It could be ANY video; just the fact that AI can generate it is what's cool. But nobody likes a video that is generated by AI.
A very cool technology to produce products that nobody wants.
No one really cares about a tech demo, but if generative tools help you make a cool music video to an awesome song? People will want it.
Well, as long as they aren't put off by a regressive stigma against new tools, at least.
See: a grandmother’s food vs. the industrial equivalent
If you dislike it without even seeing it, that would indicate the problem isn't with the video...
The only good AI is AI out of my sight.
TBF - have you looked at a digital photo made in the last decade? Likely had significant 'AI' processing applied to it. That's why I call it a regressive pattern to dislike anything with a new label attached - it minimizes at best and often flat out ignores the very real work very real artists put in to leverage the new tools.
The output of current image generators is trash. It's unsalvageable. That's the problem, not a "regressive pattern".
New tools aren't inherently inferior, they open up new opportunities.
creative power without control is like a rocket with no navigation—sure, you'll launch, but who knows where you'll crash!
You may feel different if it’s, say, art assets in your new favorite video game, frames of a show, or supplementary art assets in some sort of media.
A lot of people will not notice the missing reflections and because of this our gatekeepers to quality will disappear.
A Bluegrass song about how much fun it is to punch holes in drywall like a karate master.
A post-punk/hardcore song about the taste of the mud and rocks at the bottom of a mountain stream in the newly formed mountains of Oklahoma.
A hair band power ballad about white dad sneakers.
But for "serious" songs, the end result sounds like generic muzak you might hear in the background at Wal-Mart.
It is technically interesting, and a lot of what it creates does have its own aesthetic appeal just because of how uncanny it can get, particularly in a photorealistic format. It's like looking at the product of an alien mind, or an alternate reality. But as an expression of actual human creative potential and directed intent I think it will always fall short of the tools we already have. They require skilled human beings who require paychecks and sustenance and sleep and toilets, and sometimes form unions, and unfortunately that's the problem AI is being deployed to solve in the hope that "extruded AI art product" is good enough to make a profit from.
The last two can have tremendous talent, but society at large isn't that sensitive to the higher-quality output.
Edit: Wow! The images are loaded directly from the server, where I assume no CDN is involved. And what's even worse, they're not lazy-loaded. No wonder it cannot handle a little bit of traffic.
It's always easy to talk about "actually trying to build quality content" in the abstract. Your thing, blog post or whatever, doesn't pitch us a game. Where is your quality content?
That said, having opinions is a pitch. A16Z will maybe give you like, $10m for your "Human Generated Authentic badge" anti-AI company or whatever. Go for it dude, what are you waiting for? Sure it's a lot less than $220m for "Spatial Intelligence." But it's $10m! Just take it!
You can slap your badge onto Fortnite and try to become a household name by shipping someone else's IP. That makes sense to me. Whether you can get there without considering "engagement," I don't know.