231 points by ibobev 2 days ago | 16 comments
  • meebob 2 days ago
    Something I really enjoyed about this article is that it helps explain a counterintuitive result in hand-drawn 2D animation. It's a well-known phenomenon in hand-drawn animation that naively tracing over live-action footage usually results in unconvincing, poor-quality animation. The article demonstrates how sampling and even small amounts of noise can make a movement seem unconvincing or jittery - and seeing that, it suddenly makes sense how something like simple tracing at 12 fps would produce bad results without substantial error correction (which is where traditional wisdom like arcs, simplification, etc. comes in).
    • kderbe 2 days ago
      2D animation traced over live action is called rotoscoping. Many of Disney's animated movies from the Walt Disney era used rotoscoping, so I don't think it's fair to say it results in poor quality.

      https://en.wikipedia.org/wiki/List_of_rotoscoped_works#Anima...

      • Isamu a day ago
        The comment was about naive tracing. When Disney used rotoscoping, they had animators draw in conformance with a character model on top of the live-action pose.

        The experienced animators and inbetweeners knew how to produce smooth line motion, and the live action was used for lifelike pose, movement, etc. It wasn't really tracing.

        There are examples of this in the Disney animation books; the finished animation looks very different from the live actors but has the same movement.

        • zktruth a day ago
          On the other side of the same coin, when animating VFX for live action, animation which looks "too clean" is also a failure mode. You want to make your poses a little less good for camera, introduce a little bit of grime and imperfection, etc.

          Animation is a great art and it takes a lot of skill to make things look the way they ought to for whatever it is you are trying to achieve.

          Most animators don't like the "digital makeup" comparison (because it's often used in a way which feels marginalizing to their work on mocap-heavy shows), but if you interpret it in the sense that makeup makes people look the way they are "supposed to" I think it's a good model for understanding why rotoscope and motion capture don't yet succeed without them.

      • autoexec 2 days ago
        Rotoscoping has its place. It can save a lot of time/money for scenes with complex motion and can produce good results, but overreliance on it does tend to produce worse animation since it can end up being constrained to just what was captured on film. Without it, animators are more free to exaggerate certain motions, or manipulate the framerate, or animate things that could never be captured on camera in the first place. That kind of freedom is part of what makes animation such a cool medium. Animation would definitely be much worse off if rotoscoping was all we had.
        • tuna74 a day ago
          "Animation would definitely be much worse off if rotoscoping was all we had." Yeah, then it wouldn't be animation anymore.
          • autoexec a day ago
            I mean, rotoscoping is still animation, but it's just one technique/tool of the trade. I thought it was used well in Undone, and I enjoyed The Case of Hana & Alice
      • engeljohnb a day ago
        Rotoscoping was utilized for some difficult shots. Mostly, live action was used for reference rather than directly traced, Fleischer-style. I've never seen rotoscoping that looked as masterful as in Snow White and similar golden-age films.

        https://www.youtube.com/watch?v=smqEmTujHP8

      • A Scanner Darkly is rotoscoped

        https://youtu.be/l1-xKcf9Q4s

  • oDot 2 days ago
    I spend a lot of my time researching live-action anime[0][1], and there's an important thing to learn from Japanese animators: sometimes an animation style may seem technically lacking yet be visually stunning.

    When animator Ken Arto was on the Trash Taste podcast he mentioned how Disney had the resources to perfect the animation, while in Japan they had to achieve more with less.

    This basically shifts the "what is good animation" discussion in ways that are not as clear from looking at the stats.

    [0] https://blog.nestful.app/p/ways-to-use-nestful-outlining-ani...

    [1] https://www.youtube.com/watch?v=WiyqBHNNSlo

    • oreally 2 days ago
      These kinds of perspectives are often found and parroted in perceived 'elite' circles. It's no wonder the author works at Epic Games, a place where one needs high technical chops to work.

      It's also no wonder such people get disconnected from some realities on the ground. Sure, on paper people want higher-quality things, but they don't even know what those are. Most people have low-brow tastes; they'd take a cheaper, well-marketed thing over a 1% improvement.

      Japan didn't need to compete on the same ladder for success; it needed to mix the various elements it's good at to achieve its own success.

      • oDot 2 days ago
        Exactly right. Sometimes those "higher quality" things may lead to reduced quality, most commonly by reaching the uncanny valley.

        Interestingly that does not happen in the opposite direction. When "reducing" certain stats on real footage (which is what live-action anime should do[0]) the uncanny valley is skipped. Maybe it's harder to fall into when going backwards? More research is needed.

        BTW, I love your books

        [0] https://www.youtube.com/shorts/3ZiBu5Il2eY

      • jncfhnb 2 days ago
        Those dumb artists focusing on quality instead of revenue!
  • djmips 2 days ago
    > Obviously a huge part of this is the error propagation that we get down the joint chain... but

    This shouldn't be glossed over, and a proper consideration of the error metric here is key to storing quality animation with fewer bits, lower bandwidth, and higher performance.
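
    A toy sketch of why the error metric matters (my own numbers, not anything from the article): the same angular error is far more visible at the root of a chain than at the tip, which is why compression error is usually budgeted in world/position space rather than per-joint angle space.

      # Minimal forward-kinematics experiment on a planar 4-joint chain.
      import numpy as np

      def fk_2d(angles, bone_len=0.3):
          """World position of each joint of a planar chain with equal bone lengths."""
          pos, total = np.zeros(2), 0.0
          out = [pos.copy()]
          for a in angles:
              total += a
              pos = pos + bone_len * np.array([np.cos(total), np.sin(total)])
              out.append(pos.copy())
          return np.array(out)

      angles = np.deg2rad([30.0, -20.0, 10.0, 15.0])   # hypothetical rest pose
      clean_tip = fk_2d(angles)[-1]
      noise = np.deg2rad(1.0)                          # 1 degree of quantization error
      for j in range(len(angles)):
          noisy = angles.copy()
          noisy[j] += noise
          err = np.linalg.norm(fk_2d(noisy)[-1] - clean_tip)
          print(f"1 deg of error at joint {j}: end effector moves {err * 100:.2f} cm")
      # The per-joint angular error is identical in every case, but the positional
      # error shrinks as the noise moves down the chain.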

    • rcxdude 20 hours ago
      It does feel like joint rotations are probably not a great representation for processing animation data. And it's not generally how the motion is captured, nor does it seem to be how the human eye analyses it. But I don't know what an alternative would look like, and I'm pretty sure smarter people than I have spent a lot of time thinking about it.
    • doctorpangloss 2 days ago
      Fitting joints onto a text-prompted Sora-generated video: could "transformers" not make all this stuff obsolete too? You might need the motion capture data for ground truth to fit joints, but maybe not to generate animation itself.
  • pvillano 2 days ago
    Image generation has its own problems with non-cancelling noise.

    For example, images are often generated with jpeg artifacts in regions but not globally.

    Watermarks are also reproduced.

    Some generated images have artifacts from CCD cameras:

    https://www.eso.org/~ohainaut/ccd/CCD_artifacts.html

    Images generated from Google Street View data would likely contain features specific to the cars/cameras used in each country

    https://www.geometas.com/metas/categories/google_car/

    • doctorpangloss 2 days ago
      It seems like such an obvious and surmountable problem, though. Indeed, since 2020 there are robust approaches to eliminating JPEG artifacts - for example, browse around here: https://openmodeldb.info/
  • numpad0 2 days ago
    • Retr0id 2 days ago
      Seems like the media files still load from the original domain
  • > one of the highest quality publicly available datasets of motion capture in the graphics community

    > This data is sampled at 120 Hz, with finger and toe motions

    But when I watch the videos, they look like the dancer had palsy affecting their hands or was wearing astronaut gloves, because the fingers barely move for the most part.

  • tech_ken 2 days ago
    The points about the effects of noise are super interesting. Kind of mind blowing to think about the sensitivity of our perception being so different across visual channels (color, shape, movement, etc).
  • javier_e06 a day ago
    If one looks at the Yoda puppet in The Empire Strikes Back, it of course moves like a puppet, but the motion is real: jerky, emotional, human-like.

    Move on to The Clone Wars and the CGI motion is mechanical. Maybe the way to evaluate animation is not the eye of the beholder but a careful comparison of analog vs. digital renderings: film a human running on analog and pair it pixel by pixel with the digital CGI counterpart.

  • roughly a day ago
    The author discusses the perceptual allowances for different kinds of inputs (the noise in images, etc), and it's a really interesting point that helps sketch some boundaries around where the LLM/Diffusion model paradigms are useful.

    Human color perception is almost entirely comparative - we see something as blue because, given the other objects in the scene and the perceived lighting, blue is the color an object would have to be to look the way it does (this is the blue dress phenomenon) - and so noise in images is easy for us to ignore. Similarly, audio and especially speech perception is also very strongly contextually dependent (as attested by the McGurk effect), so we can also deal with a lot of noise or imprecision - in other words, generative guesswork.

    Motion, on the other hand, and especially human motion, is something we're exquisitely attentive to - think of how many horror movies convey a character's 'off-ness' by subtle variations in how they move. In this case, the diffusion model's tendency towards guesswork is much, much less easily ignored - our brains are paying tight attention to subtle variations, and anything weird alarms us.

    A constant part of the conversation around LLMs, etc. is exactly this level of detail-mindedness (or, the "hallucinations" conversation), and I think that's basically where you're going to land with things like this - where you need actual genuine precision, where there's some proof point on whether or not something is accurate, the generative models are going to be a harder fit, whereas areas where you can get by with "pretty good", they'll be transformative.

    (I've said it elsewhere here, but my rule of thumb for the LLMs and generative models is that if a mediocre answer fast moves the needle - basically, if there's more value in speed than precision - the LLMs are a good fit. If not, they're not.)

  • nmacias a day ago
    The shoulder rotation plotted at various frequencies sparked a thought for me: is there an "MP3" of character animation data? The way we have compression optimized for auditory perception... it feels like we might be missing an open standard for compressing this kind of animation data.

    edit: Claude is thinking MP3 could work directly: pack 180Hz animation channels into a higher frequency audio signal with some scheme like Frequency Division / Time Division Multiplexing, or Amplitude Modulation. Boom, high compression with commonplace hardware support.
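
    If it helps make the "MP3 of animation" idea concrete, here is a tiny transform-coding sketch of my own (not an existing standard, and the signal is made up): DCT one joint channel, quantize the coefficients coarsely, and inverse-DCT on playback. That's the same recipe audio codecs use, minus the psychoacoustic model.

      import numpy as np
      from scipy.fft import dct, idct

      fs = 120.0                                  # capture rate mentioned in the article
      t = np.arange(0, 2.0, 1.0 / fs)
      channel = 20 * np.sin(2 * np.pi * 0.8 * t) + 3 * np.sin(2 * np.pi * 4.0 * t)  # fake shoulder angle (degrees)

      coeffs = dct(channel, norm="ortho")
      step = 0.5                                  # quantization step: the "bitrate" knob
      quant = np.round(coeffs / step)
      decoded = idct(quant * step, norm="ortho")

      print(f"non-zero coefficients kept: {np.count_nonzero(quant)}/{len(coeffs)}")
      print(f"max playback error: {np.abs(decoded - channel).max():.3f} degrees")
      # A real codec would add perceptual weighting per joint (how visible is this
      # error on screen?) plus entropy coding, which is the animation analogue of
      # what MP3 does with a hearing model.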

    • cfstras a day ago
      That same graph had me jump towards the sampling theorem - playing back an animation with linear interpolation creates hard edges, i.e. high-frequency content. I'm not sure if the movement space is comparable to audio here, but I can't see why not.

      So, if the sampling theorem applies, sampling at 2x the maximum movement "frequency" should be enough to perfectly recreate it, as long as you "filter out" any higher frequencies when playing back the animation by using something like FFT upscaling (re-sampling) instead of linear or Bezier interpolation.

      (Having written this, I realize that's probably what everyone is doing.)
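
      A quick numerical check of that intuition (toy signal, my own numbers): sample a band-limited "joint angle" just above twice its highest frequency, then compare linear interpolation against band-limited (sinc/FFT) reconstruction on playback.

        import numpy as np
        from scipy.signal import resample

        f_max = 4.0                                # assume no motion content above 4 Hz
        fs_capture, fs_play = 12.0, 120.0          # 12 Hz > 2 * f_max
        t_cap = np.arange(24) / fs_capture         # a 2-second loop
        t_play = np.arange(240) / fs_play
        signal = lambda t: 10 * np.sin(2 * np.pi * 1.5 * t) + 2 * np.sin(2 * np.pi * f_max * t)

        captured = signal(t_cap)
        linear = np.interp(t_play, t_cap, captured)    # piecewise-linear playback
        sinc = resample(captured, len(t_play))         # band-limited playback (assumes the clip loops)

        mask = t_play <= t_cap[-1]                     # compare only where interpolation is defined
        print("linear max error:", np.abs(linear - signal(t_play))[mask].max())
        print("sinc   max error:", np.abs(sinc - signal(t_play))[mask].max())
        # Linear interpolation leaves corners (error on the order of a degree here);
        # the band-limited reconstruction is exact to floating-point precision,
        # as the sampling theorem predicts for a periodic, band-limited clip.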

    • xMissingno a day ago
      I would love to be corrected on this - but my understanding of frequency compression is that you have to decode the entire file before being able to play back the audio. Therefore, in real time applications with limited RAM (video games) you don't want to wait for the entire animation to be decoded before streaming the first frames.

      Can anyone think of a system with better time-to-first-frame that achieves good compression?

      • nmacias a day ago
        Most audio and video schemes support streaming; in the case of MP3 we are talking about frame-based compression.

        I guess to restate my curiosity: are things like Animation Pose Compression in Unity, or equivalents in other engines, remotely as good as audio techniques with hardware support? The main work on this seems to be here, and I didn't see any references to audio codecs in the issue history, FWIW: https://github.com/nfrechette/acl
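
        To sketch why frame/block-based schemes stream well (a toy scheme of my own, not how ACL or MP3 actually work): if each block of N samples is transform-coded independently, playback can begin after decoding just the first block instead of the whole clip, so time-to-first-frame is a single block decode.

          import numpy as np
          from scipy.fft import dct, idct

          BLOCK = 32                                   # samples per block (~0.27 s at 120 Hz)

          def encode(channel, step=0.5):
              pad = (-len(channel)) % BLOCK
              blocks = np.pad(channel, (0, pad)).reshape(-1, BLOCK)
              return [np.round(dct(b, norm="ortho") / step).astype(np.int16) for b in blocks], step

          def decode_block(q, step):
              return idct(q.astype(float) * step, norm="ortho")

          t = np.arange(0, 4.0, 1.0 / 120.0)
          channel = 20 * np.sin(2 * np.pi * 0.8 * t)   # fake joint channel
          blocks, step = encode(channel)

          first = decode_block(blocks[0], step)        # enough to start playing back
          print("frames ready after decoding one block:", len(first))
          print("first-block max error:", np.abs(first - channel[:BLOCK]).max())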

  • baruchthescribe 2 days ago
    The author did some very cool work with Raylib interpolating between animations to make transitions more natural. I remember being blown away at how realistic it looked from the videos he posted in the Discord. Glad to see he's still pushing the boundaries on what's possible with quality animation. And of course Cello rocks!
  • erichocean a day ago
    I own an animation studio.

    Animation and motion are two different things—related, but definitely not the same. They don't rely on the same principles and they don't capture the same data.

    Most people use the terms interchangeably, probably because the tools to process key frames are USUALLY the same.

    Animation frames aren't regular the way mo-cap is. Instead, they are designed to furnish the eye (persistence of vision) with images that, in sequence, produce a sense of crisp motion to the viewer.

    It's a subtle distinction, but the result is wildly different. In animation, the ACTUAL POSES matter a great deal. In mo-cap, they don't matter at all, it's all about regular sampling and then you just render (in 3D) what you want.

    Video game cut scenes are what more-or-less raw "mo-cap" looks like if you're curious.

  • cameron_b 2 days ago
    I love the statement in the conclusion.

    Curation is something we intrinsically favor over engagement algorithms. Noise is easy to quantify, but greatness is not. Greatness might have a lag in engagement metrics while folks read or watch the material. It might provoke consideration instead of reaction.

    Often we need seasons of production in order to calibrate our selection criteria, and hopefully this season of booming generation leads to a very rich new opportunity to curate great things to elevate from the noise.

    • MichaelZuo a day ago
      Why is curation relevant to ‘greatness’?

      By definition 99% of the content produced has to be in the bottom 99 percentiles, in any given year.

      Even if the entire world decided everything must be curated, that would just mean the vast vast majority of curators have not-great taste.

      Whereas in a future world where 99% of it is driven by algorithms, that would mean the vast majority of curators have ‘great’ taste.

      But this seems entirely orthogonal.

  • Scene_Cast2 2 days ago
    Something I keep seeing is that modern ML makes for some really cool and impressive tech demos in the creative field, but is not productionizable due to a lack of creative control.

    Namely, anything generating music / video / images - tweaking the output is not workable.

    Some notable exceptions are when you need stock art for a blog post (no need for creative control), Adobe's recolorization tool (lots of control built in), and a couple more things here and there.

    I don't know how it is for 3D assets or rigged model animation (as per the article), never worked with them. I'd be curious to hear about successful applications, maybe there's a pattern.

    • jncfhnb 2 days ago
      Probably accurate for videos and music. Videos, because there are going to be just too many things to correct for it to be time-efficient. Music, because music just needs to be excellent or it's trash. That is for high-quality art, of course. You can ship filler garbage for lots of things.

      2D art has a lot of strong tooling though. If you're actually trying to use AI art tooling, you won't be just dropping a prompt and hoping for the best. You will be using a workflow graph, carefully iterating on the same image with controlled seeds, and inpainting specific areas.
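
      A minimal sketch of that kind of loop using the open-source diffusers library (the model id, file names, and prompt below are placeholders of mine, and production workflows are usually node graphs rather than scripts, but the controlled-seed plus masked-inpaint idea is the same):

        import torch
        from diffusers import StableDiffusionInpaintPipeline
        from PIL import Image

        # Load an inpainting checkpoint (placeholder model id).
        pipe = StableDiffusionInpaintPipeline.from_pretrained(
            "some-org/some-inpainting-model", torch_dtype=torch.float16
        ).to("cuda")

        init = Image.open("character_v3.png").convert("RGB")    # current iteration of the image
        mask = Image.open("left_hand_mask.png").convert("RGB")  # white = region allowed to change

        gen = torch.Generator("cuda").manual_seed(1234)         # fixed seed -> repeatable edit
        result = pipe(
            prompt="same character, corrected left hand",
            image=init,
            mask_image=mask,
            generator=gen,
        ).images[0]
        result.save("character_v4.png")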

      We are at an awkward inflection point where we have great tooling for the last generation of models like SDXL, but haven't really made it ready for the current generation of models (Flux), which are substantially better. But it's basically an inevitability on the order of months.

      • jsheard 2 days ago
        Even with the relatively strong tooling for 2D art, it's still very difficult to push the generated image in novel directions, hence the heavy reliance on LoRAs trained on prior examples. There doesn't seem to be an answer to "how would you create [artist]'s style with AI" that doesn't require [artist] to already exist so you can throw their life's work into a blender and make a model that copies it.

        I've found this to be observable in practice - I follow hundreds of artists who I could reliably name by seeing a new example of their work, even if they're only amateurs, but I find that AI art just blurs together into a samey mush with nothing to distinguish the person at the wheel from anyone else using the same models. The tool speaks much louder than the person supposedly directing it, which isn't the case with say Photoshop, Clip Studio or Blender.

        • jncfhnb 2 days ago
          Shrug. That's a very different goal. Yes, if you want to leverage a different style, your best bet is to train a LoRA off a dozen images in that style.

          Art made by unskilled randos is always going to blur together. But the question I feel we’re discussing here is whether a dedicated artist can use them for production grade content. And the answer is yes.

    • doctorpangloss 2 days ago
      > but is not productionizable due to a lack of creative control.

      It's just a matter of time until some big IP holder makes "productionizable" generative art, no? "Tweaking the output" is just an opinion, and people already ship tons of AAA art with flaws that lacked budget to tweak. How is this going to be any different?

      • fwip a day ago
        No, it's not "just a matter of time." It's an open question whether it's even possible with anything resembling current techniques.
        • orbital-decay a day ago
          I don't think it is a question at all. It is not just possible, it's implemented in reality. Compositing is a thing in the image-gen space, and source adjustments in this scheme are trivial. I'm talking about ControlNets, style transfer adapters, straight-up neural rendering of simplified 3D scenes, training on custom references, and a ton of other methods to establish control. Temporal stability is also a solved issue.

          What it really lacks is domain knowledge. Current image-gen work is done by ML nerds, not artists, and they are simply unaware of what needs to be done to make it useful in the industry, and what to optimize for. I expected big animation studios to pick up the tech like they did with 3D CGI in the 90s, but they seem to be pretty stagnant nowadays, even besides the animosity and the weird culture war surrounding this space.

          In other words, it's not productized because nobody productized it, not because it's impossible.

    • AlienRobot 2 days ago
      Something I realized about AI is that an AI that generates "art" (be it text, image, animation, video, photography, etc.) is cool. The product it generates, however, is not.

      It's very cool that we have a technology that can generate video, but what's cool is the tech, not the video. It doesn't matter if it's a man eating spaghetti or a woman walking in front of dozens of reflections. The tech is cool, the video is not. It could be ANY video, and just the fact that AI can generate it is cool. But nobody likes a video that is generated by AI.

      A very cool technology to produce products that nobody wants.

      • w0m 2 days ago
        That's an oversimplification, I think. If you're only generating a video because "I can, oooh, AI" - then of course no one wants it. If you treat the tools as what they are - tools - then people may want it.

        No one really cares about a tech demo, but if generative tools help you make a cool music video to an awesome song? People will want it.

        Well, as long as they aren't put off by a regressive stigma against new tools, at least.

        • giraffe_lady 2 days ago
          Are there any valid reasons people might not like this or is it only "regressive stigma?"
          • bobthepanda 2 days ago
            Humans find lots of value in human effort towards culturally important things.

            See: a grandmother’s food vs. the industrial equivalent

        • AlienRobot 2 days ago
          If you used AI to make something awesome, even if I liked it, I'd feel scammed if it wasn't clearly labelled as AI, and if it was clearly labelled as AI I wouldn't even look at it.
          • mindcandy a day ago
            > if it was clearly labelled as AI I wouldn't even look at it.

            If you dislike it without even seeing it, that would indicate the problem isn't with the video...

            • AlienRobot a day ago
              Yes, the problem is with AI. I'm tired of trying to find X and finding "AI X" instead. I google "pixel art" I get "AI pixel art." I google clipart I get "AI clipart." I go to /r/logodesign to see some cool logo designs, it's 50% people who used ChatGPT asking if it looks good enough.

              The only good AI is AI out of my sight.

          • w0m a day ago
            > I'd feel scammed if it wasn't clearly labelled as AI

            TBF - have you looked at a digital photo made in the last decade? Likely had significant 'AI' processing applied to it. That's why I call it a regressive pattern to dislike anything with a new label attached - it minimizes at best and often flat out ignores the very real work very real artists put in to leverage the new tools.

            • numpad0 a day ago
              Face it. People are okay with super-resolution efforts, including most deep learning-based methods. But not "AI". You can run video through i2i as a cleanup tool and upload it to the Internet - some tried and quit. YouTubers and TikTokers aren't doing it, and they're all for attention.

              The output of current image generators is trash. It's unsalvageable. That's the problem, not a "regressive pattern".

            • AlienRobot a day ago
              You still have to take the photo. That's a billion times more effort than typing a prompt in ChatGPT.
              • w0m 16 hours ago
                honestly, that's the same argument people made against photographs when the technology became available. Same argument made against the printing press.

                New tools aren't inherently inferior, they open up new opportunities.

                • AlienRobot 10 hours ago
                  I've never seen a photograph pretending to be an illustration, or vice-versa. It's only AI that pretends to be a genre it isn't.
      • namtab00 2 days ago
        > A very cool technology to produce products that nobody wants.

        creative power without control is like a rocket with no navigation—sure, you'll launch, but who knows where you'll crash!

      • jncfhnb 2 days ago
        The problem in your example is that you wouldn’t think a picture of a man eating spaghetti taken by a real person would be cool.

        You may feel different if it’s, say, art assets in your new favorite video game, frames of a show, or supplementary art assets in some sort of media.

      • noja 2 days ago
        > or a woman walking in front of dozens of reflections

        A lot of people will not notice the missing reflections, and because of this our gatekeepers of quality will disappear.

      • postexitus 2 days ago
        While I am in the same camp as you, there is one exception: music. Especially music with lyrics (like suno.com). Although I know that it's not created by humans, the music created by Suno is still very listenable, and it evokes feelings just like any other piece of music does - especially if I am on a playlist, doing something else, and the songs just progress into the unknown. Even when I am in a more conscious state - i.e., creating my own songs in Suno - the end result is so good that I can listen to it over and over again, especially the ones I create for special events (like mocking a friend's passing phase of communism and reverting back to capitalism).
        • Loughla 2 days ago
          In my opinion, Suno is good for making really funny songs, but not for making really moving songs. Examples of songs that make me chuckle that I've had it do:

          A Bluegrass song about how much fun it is to punch holes in drywall like a karate master.

          A post-punk/hardcore song about the taste of the mud and rocks at the bottom of a mountain stream in the newly formed mountains of Oklahoma.

          A hair band power ballad about white dad sneakers.

          But for "serious" songs, the end result sounds like generic muzak you might hear in the background at Wal-Mart.

        • calflegal 2 days ago
          Appreciate your position, but mine is that everything out of Suno sounds like copycat dog water.
          • xerox13ster 2 days ago
            Makes sense that GP appreciates the taste of dog water when they're mocking their friends for having had values (friends who likely gave up their values to stop being mocked).
      • krapp 2 days ago
        Yes, it turns out there's more to creating good art than simulating the mechanics and technique of good artists. The human factor actually matters, and that factor can't be extrapolated from the data in the model itself. In essence it's a lossy compression problem.

        It is technically interesting, and a lot of what it creates does have its own aesthetic appeal just because of how uncanny it can get, particularly in a photorealistic format. It's like looking at the product of an alien mind, or an alternate reality. But as an expression of actual human creative potential and directed intent I think it will always fall short of the tools we already have. They require skilled human beings who require paychecks and sustenance and sleep and toilets, and sometimes form unions, and unfortunately that's the problem AI is being deployed to solve in the hope that "extruded AI art product" is good enough to make a profit from.

    • detourdog 2 days ago
      The generated artwork will initially displace clipart/stock footage and then illustrators and graphic designers.

      The last two can have tremendous talent, but society at large isn't that sensitive to the higher-quality output.

  • LoganDark 2 days ago
    Seems like this site is getting hugged to death right now
    • Bilal_io 2 days ago
      I haven't checked, but I think some of the videos on the page might be served directly from the server.

      Edit: Wow! They are loaded directly from the server, where I assume no CDN is involved. And what's even worse, they're not lazy-loaded. No wonder it cannot handle a little bit of traffic.

  • doctorpangloss 2 days ago
    > The people who are actually trying to build quality content are being forced to sink or swim - optimize for engagement or else be forgotten... There are many people involved in deep learning who are trying very hard to sell you the idea that in this new world of big-data...

    It's always easy to talk about "actually trying to build quality content" in the abstract. Your thing, blog post or whatever, doesn't pitch us a game. Where is your quality content?

    That said, having opinions is a pitch. A16Z will maybe give you like, $10m for your "Human Generated Authentic badge" anti-AI company or whatever. Go for it dude, what are you waiting for? Sure it's a lot less than $220m for "Spatial Intelligence." But it's $10m! Just take it!

    You can slap your badge onto Fortnite and try to become a household name by shipping someone else's IP. That makes sense to me. Whether you can get there without considering "engagement," I don't know.