https://en.wikipedia.org/wiki/List_of_rotoscoped_works#Anima...
The experienced animators and inbetweeners knew how to produce smooth line motion, and the live action was used for lifelike poses, movement, etc. It wasn't really tracing.
There are examples of this in the Disney animation books: the finished animation looks very different from the live actors, but it has the same movement.
Animation is a great art and it takes a lot of skill to make things look the way they ought to for whatever it is you are trying to achieve.
Most animators don't like the "digital makeup" comparison (because it's often used in a way that feels marginalizing to their work on mocap-heavy shows), but if you interpret it in the sense that makeup makes people look the way they are "supposed to," I think it's a good model for understanding why rotoscoping and motion capture don't yet succeed without animators.
When animator Ken Arto was on the Trash Taste podcast he mentioned how Disney had the resources to perfect the animation, while in Japan they had to achieve more with less.
This basically shifts the "what is good animation" discussion in ways that are not as clear from looking at the stats.
[0] https://blog.nestful.app/p/ways-to-use-nestful-outlining-ani...
It's also no wonder such people get disconnected from some realities on the ground. Sure, on paper people do want higher-quality things, but they don't even know what those are. Most people have low-brow tastes; they'd take a cheaper, well-marketed thing over a 1% improvement.
Japan didn't need to compete on the same ladder for success; it needed to mix the various elements it was good at to achieve its own success.
Interestingly that does not happen in the opposite direction. When "reducing" certain stats on real footage (which is what live-action anime should do[0]) the uncanny valley is skipped. Maybe it's harder to fall into when going backwards? More research is needed.
BTW, I love your books
To contrast with the above comment, video games don't let you "skip" steps these days. It's unsurprising to hear the author works at Epic Games, because you get a lot less room to be experimental in the 3D real-time realm compared to a medium like film. When interactivity is involved, fluidity and responsiveness are key to keeping a player immersed, whereas a movie can suddenly lower its framerate to create an ironically more engaging fight scene.
Animation and motion are two different things—related, but definitely not the same. They don't rely on the same principles and they don't capture the same data.
Most people use the terms interchangeably, probably because the tools to process key frames are USUALLY the same.
Animation frames aren't regular the way mo-cap is. Instead, they are designed to furnish the eye (persistence of vision) with images that, in sequence, produce a sense of crisp motion to the viewer.
It's a subtle distinction, but the result is wildly different. In animation, the ACTUAL POSES matter a great deal. In mo-cap, they don't matter at all, it's all about regular sampling and then you just render (in 3D) what you want.
Video game cut scenes are what more-or-less raw "mo-cap" looks like if you're curious.
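To make the distinction concrete, here is a minimal Python sketch (the Keyframe structure, the smoothstep easing, and the toy data are my own illustrative assumptions, not any engine's actual format): a hand-keyed track stores a few deliberate poses at irregular times and eases between them, while a mocap track is just a dense, uniformly sampled stream.

    from dataclasses import dataclass

    @dataclass
    class Keyframe:
        time: float   # seconds; spacing is irregular, chosen by the animator
        value: float  # the pose value itself carries the intent

    def sample_keyed(keys, t):
        """Evaluate a hand-keyed track: find the surrounding keys and ease between them."""
        if t <= keys[0].time:
            return keys[0].value
        for a, b in zip(keys, keys[1:]):
            if a.time <= t <= b.time:
                u = (t - a.time) / (b.time - a.time)
                u = u * u * (3 - 2 * u)  # smoothstep: hold the pose, then snap into the next
                return a.value + u * (b.value - a.value)
        return keys[-1].value

    def sample_mocap(samples, rate_hz, t):
        """Evaluate a mocap track: dense uniform samples, plain linear interpolation."""
        i = t * rate_hz
        lo = int(i)
        hi = min(lo + 1, len(samples) - 1)
        return samples[lo] + (i - lo) * (samples[hi] - samples[lo])

    # Four deliberate poses vs. 120 Hz of measured data covering the same second.
    keyed = [Keyframe(0.0, 0.0), Keyframe(0.4, 0.0), Keyframe(0.5, 1.0), Keyframe(1.0, 1.0)]
    mocap = [min(1.0, max(0.0, (i / 120 - 0.4) * 10.0)) for i in range(121)]
    print(sample_keyed(keyed, 0.45), sample_mocap(mocap, 120.0, 0.45))

The point of the sketch is only that the keyed representation is about which poses exist, while the mocap representation is about how densely time is sampled.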
This shouldn't be glossed over: a proper consideration of the error metric here is key to storing quality animation with fewer bits, lower bandwidth, and higher performance.
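As a sketch of what such a metric might look like (my own toy example, not any shipping codec's definition), it helps to measure error in world space after running the skeleton's forward kinematics, since the same per-channel rotation error moves an end effector much further at the end of a long chain than it moves an isolated joint:

    import math

    def fk_2d(angles, bone_lengths):
        """Toy 2D forward kinematics: accumulate joint angles down a chain
        and return each joint's world-space position."""
        x = y = total = 0.0
        positions = []
        for angle, length in zip(angles, bone_lengths):
            total += angle
            x += length * math.cos(total)
            y += length * math.sin(total)
            positions.append((x, y))
        return positions

    def quantize(angles, step):
        """Stand-in for lossy compression: snap each angle to a fixed grid."""
        return [round(a / step) * step for a in angles]

    def max_world_error(angles, bone_lengths, step):
        """Error metric: worst-case joint position drift caused by quantization."""
        orig = fk_2d(angles, bone_lengths)
        comp = fk_2d(quantize(angles, step), bone_lengths)
        return max(math.dist(a, b) for a, b in zip(orig, comp))

    angles = [0.303, -0.451, 0.847]  # radians for a three-bone chain
    # The same per-channel error costs far more on long bones than short ones,
    # which is why raw channel error alone is a poor target for a codec.
    print(max_world_error(angles, [0.3, 0.3, 0.3], step=0.01))
    print(max_world_error(angles, [1.0, 1.0, 1.0], step=0.01))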
It follows from digitizing the articulated armatures of stop-motion animation, which preceded CGI. Some of the first CGI was input using metal-jointed armatures with potentiometers in every joint, which is a form of 'rigid' transforms.
I'm not sure what you are thinking of when you say it's not a great representation for processing animation data, but I speculate you mean it's not a great representation of how a flesh-and-blood creature moves, and you'd be right. Advanced CGI can sometimes tackle a more thorough simulation of the musculature, complex joints, tendons, ligaments, and the physical dynamics of all of these being controlled by a brain, along with a lot of soft-body physical interactions. The sheer amount of processing and data needed for such a realistic simulation is why the cheating approximations are used for interactive simulations and games.
See https://www.wetafx.co.nz/research-and-tech/technology/tissue
For example, images are often generated with jpeg artifacts in regions but not globally.
Watermarks are also reproduced.
Some generated images have artifacts from CCD cameras:
https://www.eso.org/~ohainaut/ccd/CCD_artifacts.html
Images generated from Google Street View data would likely contain features specific to the cars/cameras used in each country.
That makes me wonder: if you label good data, and generate data with the good label, how much benefit do you get from also training on okay data?
> This data is sampled at 120 Hz, with finger and toe motions
But when I watch the videos, they look like the dancer had palsy affecting their hands or was wearing astronaut gloves, because the fingers barely move for the most part.
> Session 2: Casual dancing ... No finger motion.
> Session 3: Vintage jazz dancing ... Fingers captured with Manus gloves, which unfortunately suffered in quality due to sensor drift during rapid motion.
> Session 4: Street dancing ... Simplified finger motion (markers on thumb, index finger, and pinky according to the OptiTrack layout).
The first video in the article does have a bit of finger motion, so I'm guessing it's from session 4. Toes also look a bit iffy and clip into the ground instead of curling at times.
One move to The Clone Wars and the CGI movements are mechanical. Maybe the way to judge animation is not the eye of the beholder but a careful comparison of analog vs. digital renderings: film a human running on analog and pair it pixel by pixel with the digital CGI counterpart.
Human color perception is almost entirely comparative - we see something as blue because, within the context of the other objects in the scene and the perceived lighting, an object that looked the way this one does would have to be blue (this is the blue dress phenomenon) - and so noise in images is easy for us to ignore. Similarly, audio and especially speech perception is also very strongly contextually dependent (as attested by the McGurk effect), so we can also deal with a lot of noise or imprecision - in other words, generative guesswork.
Motion, on the other hand, and especially human motion, is something we're exquisitely attentive to - think of how many horror movies convey a character's 'off-ness' by subtle variations in how they move. In this case, the diffusion model's tendency towards guesswork is much, much less easily ignored - our brains are paying tight attention to subtle variations, and anything weird alarms us.
A constant part of the conversation around LLMs, etc. is exactly this level of detail-mindedness (or, the "hallucinations" conversation), and I think that's basically where you're going to land with things like this - where you need actual genuine precision, where there's some proof point on whether or not something is accurate, the generative models are going to be a harder fit, whereas areas where you can get by with "pretty good", they'll be transformative.
(I've said it elsewhere here, but my rule of thumb for the LLMs and generative models is that if a mediocre answer fast moves the needle - basically, if there's more value in speed than precision - the LLMs are a good fit. If not, they're not.)
edit: Claude is thinking MP3 could work directly: pack 180Hz animation channels into a higher frequency audio signal with some scheme like Frequency Division / Time Division Multiplexing, or Amplitude Modulation. Boom, high compression with commonplace hardware support.
So, if the sampling theorem applies, having 2x the maximum movement "frequency" should be enough to perfectly recreate the motion, as long as you "filter out" any higher frequencies when playing back the animation by using something like FFT upscaling (re-sampling) instead of linear or bezier interpolation.
(Having written this, I realize that's probably what everyone is doing.)
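A quick sketch of that idea in Python with scipy (purely illustrative; the 180 Hz rate and the synthetic band-limited curve are my own assumptions): store a joint channel at 30 Hz, then reconstruct it at full rate with FFT-based resampling versus plain linear interpolation and compare the errors.

    import numpy as np
    from scipy.signal import resample

    # A smooth, band-limited "joint rotation" channel sampled at 180 Hz for 2 seconds.
    rate_hi, rate_lo, duration = 180, 30, 2.0
    t_hi = np.arange(0, duration, 1 / rate_hi)
    t_lo = np.arange(0, duration, 1 / rate_lo)
    curve = np.sin(2 * np.pi * 3 * t_hi) + 0.3 * np.sin(2 * np.pi * 7 * t_hi)

    # "Compress" by keeping only every sixth sample, i.e. store the channel at 30 Hz.
    stored = curve[:: rate_hi // rate_lo]

    # Reconstruction 1: FFT-based (band-limited) resampling back up to 180 Hz.
    fft_recon = resample(stored, len(curve))

    # Reconstruction 2: plain linear interpolation between the stored samples.
    lin_recon = np.interp(t_hi, t_lo, stored)

    print("FFT resampling, max error:       ", np.max(np.abs(fft_recon - curve)))
    print("Linear interpolation, max error: ", np.max(np.abs(lin_recon - curve)))

Since both components of the synthetic curve sit below the 15 Hz Nyquist limit of the 30 Hz store, the FFT reconstruction comes back essentially exact, while linear interpolation leaves a measurable error; real motion with sharp contacts won't be this clean, of course.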
Can anyone think of a system with better time-to-first-frame that achieves good compression?
I guess to restate my curiosity: are things like Animation Pose Compression in Unity or equivalents in other engines remotely as good as audio techniques with hardware support? The main work on this seems to be here and I didn't see any references to audio codecs in the issue history fwiw. https://github.com/nfrechette/acl
>All of these datasets are NOT licensed for commercial use. They are for research use only.
I mean, I knew this already when I looked at the license (a thing any commercially oriented dev should do on any repo) and saw CC-4 (non-derivative, non-commercial). But it's still sad that this somewhat repeats the very mantra quoted a few sections up:
>Almost all games and VFX companies don't share their data with the academic world (with the exception of Ubisoft who have released some very high quality animation datasets), so how can we penalize academics for not reaching a bar they don't even know exists?
But alas, this is one barrier that separates indie and even some AA games from AAA games. At least the article gave tips on what to look out for if you try to prepare your own animation dataset.
Curation is something we intrinsically favor over engagement algorithms. Noise is easy to quantify, but greatness is not. Greatness might have a lag in engagement metrics while folks read or watch the material. It might provoke consideration instead of reaction.
Often we need seasons of production in order to calibrate our selection criteria, and hopefully this season of booming generation leads to a very rich new opportunity to curate great things to elevate from the noise.
By definition 99% of the content produced has to be in the bottom 99 percentiles, in any given year.
Even if the entire world decided everything must be curated, that would just mean the vast vast majority of curators have not-great taste.
Whereas in a future world where 99% of it is driven by algorithms, that would mean the vast majority of curators have ‘great’ taste.
But this seems entirely orthogonal.
Namely, for anything generating music / video / images, tweaking the output is not workable.
Some notable exceptions are when you need stock art for a blog post (no need for creative control), Adobe's recolorization tool (lots of control built in), and a couple more things here and there.
I don't know how it is for 3D assets or rigged model animation (as per the article), never worked with them. I'd be curious to hear about successful applications, maybe there's a pattern.
It's very cool that we have a technology that can generate video, but what's cool is the tech, not the video. It doesn't matter if it's a man eating spaghetti or a woman walking in front of dozens of reflections. The tech is cool, the video is not. It could be ANY video; just the fact that AI can generate it is cool. But nobody likes a video that was generated by AI.
A very cool technology to produce products that nobody wants.
No one really cares about a tech demo, but if generative tools help you make a cool music video to an awesome song? People will want it.
Well, as long as they aren't put off by a regressive stigma against new tools, at least.
See: a grandmother’s food vs. the industrial equivalent
One end of art is spending millions of man-hours polishing this effect to fool the eye. The other side simplifies the environment and focuses more on making this new environment cohesive, which relaxes our expectations. Take your favorite '90s/early '00s 3D game and compare it to Mass Effect: Andromeda to get a feel for this.
AI is promising to do the former with the costs of the latter. And so far, in the infancy of generated video, it's maybe halfway to Andromeda.
If you dislike it without even seeing it, that would indicate the problem isn't with the video...
The only good AI is AI out of my sight.
TBF - have you looked at a digital photo made in the last decade? Likely had significant 'AI' processing applied to it. That's why I call it a regressive pattern to dislike anything with a new label attached - it minimizes at best and often flat out ignores the very real work very real artists put in to leverage the new tools.
The output of current image generators is trash. It's unsalvageable. That's the problem, not a "regressive pattern".
New tools aren't inherently inferior, they open up new opportunities.
creative power without control is like a rocket with no navigation—sure, you'll launch, but who knows where you'll crash!
It is technically interesting, and a lot of what it creates does have its own aesthetic appeal just because of how uncanny it can get, particularly in a photorealistic format. It's like looking at the product of an alien mind, or an alternate reality. But as an expression of actual human creative potential and directed intent I think it will always fall short of the tools we already have. They require skilled human beings who require paychecks and sustenance and sleep and toilets, and sometimes form unions, and unfortunately that's the problem AI is being deployed to solve in the hope that "extruded AI art product" is good enough to make a profit from.
You may feel different if it’s, say, art assets in your new favorite video game, frames of a show, or supplementary art assets in some sort of media.
A lot of people will not notice the missing reflections, and because of this our gatekeepers of quality will disappear.
A Bluegrass song about how much fun it is to punch holes in drywall like a karate master.
A post-punk/hardcore song about the taste of the mud and rocks at the bottom of a mountain stream in the newly formed mountains of Oklahoma.
A hair band power ballad about white dad sneakers.
But for "serious" songs, the end result sounds like generic muzak you might hear in the background at Wal-Mart.
I got my shit together meeting Christ and reading Marx
It failed my little fire but it spread a dying spark
https://youtu.be/elr0JmB7Ac8?t=42
2D art has a lot of strong tooling though. If you're actually trying to use AI art tooling, you won't be just dropping a prompt and hoping for the best. You will be using a workflow graph and carefully iterating on the same image with controlled seeds, then inpainting specific areas.
We are at an awkward inflection point where we have great tooling for the last generation of models like SDXL, but haven't really made it ready for the current gen of models (Flux), which are substantially better. But it's basically an inevitability, on the order of months.
I've found this to be observable in practice - I follow hundreds of artists who I could reliably name by seeing a new example of their work, even if they're only amateurs, but I find that AI art just blurs together into a samey mush with nothing to distinguish the person at the wheel from anyone else using the same models. The tool speaks much louder than the person supposedly directing it, which isn't the case with say Photoshop, Clip Studio or Blender.
Art made by unskilled randos is always going to blur together. But the question I feel we’re discussing here is whether a dedicated artist can use them for production grade content. And the answer is yes.
It's just a matter of time until some big IP holder makes "productionizable" generative art, no? "Tweaking the output" is just an opinion, and people already ship tons of AAA art with flaws that there was no budget to tweak. How is this going to be any different?
What it really lacks is domain knowledge. Current image generation is done by ML nerds, not artists, and they are simply unaware of what needs to be done to make it useful in the industry, and what to optimize for. I expected big animation studios to pick up the tech like they did with 3D CGI in the '90s, but they seem to be pretty stagnant nowadays, even setting aside the animosity and the weird culture war surrounding this space.
In other words, it's not productized because nobody productized it, not because it's impossible.
The last two can have tremendous talent, but society at large isn't that sensitive to the higher-quality output.
Edit: Wow! They are loaded directly from the server, where I assume no CDN is involved. And what's even worse, they're not lazy-loaded. No wonder it cannot handle even a little bit of traffic.
It's always easy to talk about "actually trying to build quality content" in the abstract. Your thing, blog post or whatever, doesn't pitch us a game. Where is your quality content?
That said, having opinions is a pitch. A16Z will maybe give you like, $10m for your "Human Generated Authentic badge" anti-AI company or whatever. Go for it dude, what are you waiting for? Sure it's a lot less than $220m for "Spatial Intelligence." But it's $10m! Just take it!
You can slap your badge onto Fortnite and try to become a household name by shipping someone else's IP. That makes sense to me. Whether you can get there without considering "engagement," I don't know.
>My name is Daniel Holden. I'm a programmer and occasional writer currently working as a Principal Animation Programmer at Epic Games and doing research mainly on Machine Learning and Character Animation.
Their quality content is assuredly building tools for other professionals. So, B2B. A very different kind of "content creator" than those working B2C.
And their pitch itself is already done; they're likely being paid 200k+/yr to directly or indirectly help make Fortnite look better or iterate faster. So... mission accomplished? How Epic sources its deep learning tech will be interesting to see in the coming years as society figures out the boundaries of AI tech.