Gaussian Splatting – A$AP Rocky "Helicopter" music video(radiancefields.com)

772 pointsby ChrisArchitect21 days ago49 comments

darhodester21 days ago
Hi,
I'm David Rhodes, Co-founder of CG Nomads, developer of GSOPs (Gaussian Splatting Operators) for SideFX Houdini. GSOPs was used in combination with OTOY OctaneRender to produce this music video.
If you're interested in the technology and its capabilities, learn more at https://www.cgnomads.com/ or AMA.
Try GSOPs yourself: https://github.com/cgnomads/GSOPs (example content included).
- henjodottech20 days ago
  I’m fascinated by the aesthetic of this technique. I remember early versions that were completely glitched out and presented 3d clouds of noise and fragments to traverse through. I’m curious if you have any thoughts about creatively ‘abusing’ this tech? Perhaps misaligning things somehow or using some wrong inputs.
  - darhodester20 days ago
    There's a ton of fun tricks you can perform with Gaussian splatting!
    You're right that you can intentionally under-construct your scenes. These can create a dream-like effect.
    It's also possible to stylize your Gaussian splats to produce NPR effects. Check out David Lisser's amazing work: https://davidlisser.co.uk/Surface-Tension.
    Additionally, you can intentionally introduce view-dependent ghosting artifacts. In other words, if you take images from a certain angle that contain an object, and remove that object for other views, it can produce a lenticular/holographic effect.
    echelon20 days ago
    Y'all did such a good job with this. It captivated HN and was the top post for the entire day, and will probably last for much of tomorrow.
    If you don't know already, you need to leverage this. HN is one of the biggest channels of engineers and venture capitalists on the internet. It's almost pure signal (minus some grumpy engineer grumblings - we're a grouchy lot sometimes).
    Post your contract info here. You might get business inquiries. If you've got any special software or process in what you do, there might be "venture scale" business opportunities that come your way. Certainly clients, but potentially much more.
    (I'd certainly like to get in touch!)
    --
    edit: Since I'm commenting here, I'll expand on my thoughts. I've been rate limited all day long, and I don't know if I can post another response.
    I believe volumetric is going to be huge for creative work in the coming years.
    Gaussian splats are a huge improvement over point clouds and NeRFs in terms of accessibility and rendering, but the field has so many potential ways to evolve.
    I was always in love with Intel's "volume", but it was impractical [1, 2] and got shut down. Their demos are still impressive, especially from an equipment POV, but A$AP Rocky's music video is technically superior.
    During the pandemic, to get over my lack of in-person filmmaking, I wrote Unreal Engine shaders to combine the output of several Kinect point clouds [3] to build my own lightweight version inspired by what Intel was doing. The VGA resolution of consumer volumetric hardware was a pain and I was faced with fpga solutions for higher real time resolution, or going 100% offline.
    World Labs and Apple are doing exciting work with image-to-Gaussian models [4, 5], and World Labs created the fantastic Spark library [6] for viewing them.
    I've been leveraging splats to do controllable image gen and video generation [7], where they're extremely useful for consistent sets and props between shots.
    I think the next steps for Gaussian splats are good editing tools, segmenting, physics, etc. The generative models are showing a lot of promise too. The Hunyuan team is supposedly working on a generative Gaussian model.
    [1] https://www.youtube.com/watch?v=24Y4zby6tmo (film)
    [2] https://www.youtube.com/watch?v=4NJUiBZVx5c (hardware)
    [3] https://www.twitch.tv/videos/969978954?collection=02RSMb5adR...
    [4] https://www.worldlabs.ai/blog/marble-world-model
    [5] https://machinelearning.apple.com/research/sharp-monocular-v...
    [6] https://sparkjs.dev/
    [7] https://github.com/storytold/artcraft (in action: https://www.youtube.com/watch?v=iD999naQq9A or https://www.youtube.com/watch?v=f8L4_ot1bQA )
    darhodester20 days ago
    First, all credit for execution and vision of Helicopter go to A$AP, Dan Streit, and Grin Machine (https://www.linkedin.com/company/grin-machine/about/). Evercoast and Wild Capture were also involved.
    Second, it's very motivating to read this! My background is in video game development (only recently transitioning to VFX). My dream is to make a Gaussian splatting content creation and game development platform with social elements. One of the most exciting aspects of Gaussian splatting is that it democratizes high quality content acquisition. Let's make casual and micro games based on the world around us and share those with our friends and communities.
    bininunez20 days ago
    Thanks darhodester! It was definitely a broad team effort that started with Rocky and Streit's creative genius which was then made possible by Evercoast's software to capture and generate all the 4D splat data (www.evercoast.com), which then flowed to the incredible people at Grin Machine and Wild capture who used GSOPs and OctaneRender.
    eMPee58420 days ago
    What do you think about the sparse voxel approach, shouldn't it be more compute efficient than computing zillions of ellipsoids? My understanding of CGI prolly is t0o shallow but I wonder why it hasn't caught on much..
    darhodester20 days ago
    I believe most of the "voxel" approaches also require some type of inference (MLP). This limits the use case and ability to finely control edits. Gaussian splatting is amazing because each Gaussian is just a point in space with a rotation and non-uniform scale.
    The most expensive part of Gaussian splatting is depth sorting.
  - darhodester20 days ago
    The ghost effect is pretty cool, too! https://www.youtube.com/watch?v=DQGtimwfpIo
  - jofzar20 days ago
    https://youtu.be/eyAVWH61R8E?t=3m53s
    Superman is what comes to mind for this
- kqr20 days ago
  I remember splatting being introduced as a way to capture real life scenes, but one of the links you have provided in this discusson seems to have used a traditional polygon mesh scene as training input for the splat model. How common is this and why would one do it that way over e.g. vertex shader effects that give the mesh a splatty aesthetic?
  - darhodester20 days ago
    Yes, it's quite trivial to convert traditional CG to Gaussian splats. We can render our scenes/objects just as we would capture physical spaces. The additional benefits of using synthetic data is 100% accurate camera poses (alignment) which means the structure from motion (SfM) step can be bypassed.
    It's also possible to splat from textured meshes directly, see: https://github.com/electronicarts/mesh2splat. This approach yields high quality, PBR compatible splats, but is not quite as efficient as a traditional training workflow. This approach will likely become mainstream in third party render engines, moving forward.
    Why do this? 1. Consistent, streamlined visuals across a massive ecosystem, including content creation tools, the web, and XR headsets. 2. High fidelity, compressed visuals. With SOGs compression, splats are going to become the dominant 3D representation on the web (see https://superspl.at). 3. E-commerce (product visualizations, tours, real-estate, etc.) 4. Virtual production (replace green screens with giant LED walls). 5. View-dependent effects without (traditional) shaders or lighting
    It's not just about the aesthetic, it's also about interoperability, ease of use, and the entire ecosystem.
- sbierwagen21 days ago
  From the article:
  >Evercoast deployed a 56 camera RGB-D array
  Do you know which depth cameras they used?
  - bininunez20 days ago
    We (Evercoast) used 56 RealSense D455s. Our software can run with any camera input, from depth cameras to machine vision to cinema REDs. But for this, RealSense did the job. The higher end the camera, the more expensive and time consuming everything is. We have a cloud platform to scale rendering, but it’s still overall more costly (time and money) to use high res. We’ve worked hard to make even low res data look awesome. And if you look at the aesthetic of the video (90s MTV), we didn’t need 4K/6K/8K renders.
    bredren20 days ago
    You may have explained this elsewhere, but if not—-what kind of post processing did you do to upscale or refine the realsense video?
    Can you add any interesting details on the benchmarking done against the RED camera rig?
    spookie20 days ago
    This is a great question, would love some some feedback on this.
    I assume they stuck with realsense for proper depth maps. However, those are both limited to a 6 meters range, and their depth imaging isn't able to resolve features smaller than their native resolution allows (gets worse after 3m too, as there is less and less parallax among other issues). I wonder how they approached that as well.
  - darhodester21 days ago
    Aha: https://www.red.com/stories/evercoast-komodo-rig
    So likely RealSense D455.
  - darhodester21 days ago
    I was not involved in the capture process with Evercoast, but I may have heard somewhere they used RealSense cameras.
    I recommend asking https://www.linkedin.com/in/benschwartzxr/ for accuracy.
  - secretsatan21 days ago
    Couldn’t you just use iphone pros for this? I developed an app specifically for photogrammetry capture using AR and the depth sensor as it seemed like a cheap alternative.
    EDIT: I realize a phone is not on the same level as a red camera, but i just saw iphones as a massively cheaper option to alternatives in the field i worked in.
    F7F7F721 days ago
    ASAP Rocky has a fervent fanbase who's been anticipating this album. So I'm assuming that whatever record label he's signed to gave him the budget.
    And when I think back to another iconic hip hop (iconic that genre) video where they used practical effects and military helicopters chasing speedboats in the waters off of Santa Monica...I bet they had change to spear.
    cwillu20 days ago
    Is there any reason to think https://thebaffler.com/salvos/the-problem-with-music doesn't apply here?
    numpad020 days ago
    A single camera only captures the side of the object facing the camera. Knowing how far away that camera facing side of a Rubik's Cube help if you were making educated guesses(novel view synthesis), but it won't solve the problem of actually photographing the backside.
    There are usually six sides on a cube, which means you need minimum six iPhone around an object to capture all sides of it to be able to then freely move around it. You might as well seek open-source alternatives than relying on Apple surprise boxes for that.
    In cases where your subject would be static, such as it being a building, then you can wave around a single iPhone for the same effect for a result comparable to more expensive rigs, of course.
    antidamage19 days ago
    The minimum is four RGB-only cameras (if you want RGB data) but adding lidar really helps.
    The standard pipeline can infer a huge amount of data, and there are a few AI tools now for hallucinating missing geometry and backfaces based on context recognition, which can then be converted back into a splat for fast, smooth rendering.
    20 days ago
    undefined
    darhodester21 days ago
    I think it's because they already had proven capture hardware, harvest, and processing workflows.
    But yes, you can easily use iPhones for this now.
    secretsatan20 days ago
    Looks great by the way, i was wondering if there’s a file format for volumetric video captures
    darhodester20 days ago
    Some companies have a proprietary file format for compressed 4D Gaussian splatting. For example: https://www.gracia.ai and https://www.4dv.ai.
    Check this project, for example: https://zju3dv.github.io/freetimegs/
    Unfortunately, these formats are currently closed behind cloud processing so adoption is a rather low.
    Before Gaussian splatting, textured mesh caches would be used for volumetric video (e.g. Alembic geometry).
    itishappy20 days ago
    https://developer.apple.com/av-foundation/
    https://developer.apple.com/documentation/spatial/
    Edit: As I'm digging, this seems to be focused on stereoscopic video as opposed to actual point clouds. It appears applications like cinematic mode use a monocular depth map, and their lidar outputs raw point cloud data.
    numpad020 days ago
    A LIDAR point cloud from a single point of view is a mono-ocular depth map. Unless the LIDAR in question is like, using supernova level gamma rays or neutrino generators for the laser part to get density and albedo volumetric data for its whole distance range.
    You just can't see the back of a thing by knowing the shape of the front side with current technologies.
    itishappy20 days ago
    Right! My terminology may be imprecise here, but I believe there is still an important distinction:
    The depth map stored for image processing is image metadata, meaning it calculates one depth per pixel from a single position in space. Note that it doesn't have the ability to measure that many depth values, so it measures what it can using LIDAR and focus information and estimates the rest.
    On the other hand, a point cloud is not image data. It isn't necessarily taken from a single position, in theory the device could be moved around to capture addition angles, and the result is a sparse point cloud of depth measurements. Also, raw point cloud data doesn't necessarily come tagged with point metadata such as color.
    I also note that these distinctions start to vanish when dealing with video or using more than one capture device.
    numpad019 days ago
    No, LIDAR data are necessarily taken from a single position. They are 3D, but literally single eyed. You can't tell from LIDAR data if you're looking at a half-cut apple or an intact one. This becomes obvious the moment you tried to rotate a LIDAR capture - it's just the skin. You need depth maps from all angles to reconstruct the complete skin.
    So you have to have minimum two for front and back of a dancer. Actually, the seams are kind of dubious so let's say three 120 degrees apart. Well we need ones looking down as well as up for baggy clothing, so more like nine, 30 degrees apart vertically and 120 degrees horizontally, ...
    and ^ this will go far down enough that installing few dozens of identical non-Apple cameras in a monstrous sci-fi cage starts making a lot more sense than an iPhone, for a video.
    secretsatan20 days ago
    Recording pointclouds over time i guess i mean. I’m not going to pretend to understand video compression, but could it be possible to do the following movement aspect in 3d the same as 2d?
    fastasucan21 days ago
    Why would they go for the cheapest option?
    secretsatan20 days ago
    It was more the point that technology is much cheaper. The company i worked for had completely missed it while trying to develop in house solutions.
  - brcmthrowaway21 days ago
    Kinect Azure
- 21 days ago
  undefined
- dostick20 days ago
  Can such plugin be possible for Davinci Resolve, to have merge of scene captured from two iPhones with spatial data, into 3D scene? With M4 that shouldn’t be problem?
  - darhodester20 days ago
    Yes: https://irrealix.com/plugin/gaussian-splatting-davinci-resol...
    (I'm not the author.)
    You can train your own splats using Brush or OpenSplat
- jeffgreco20 days ago
  Great work! I’d love to see a proper BTS or case study.
  - darhodester20 days ago
    I do believe a BTS is being developed.
  - tokymegz20 days ago
    Stay tuned
- c-fe20 days ago
  Hi David, have you looked into alternatives to 3DGS like https://meshsplatting.github.io/ that promise better results and faster training?
  - darhodester20 days ago
    I have. Personally, I'm a big fan of hybrid representations like this. An underlying mesh helps with relighting, deformation, and effective editing operations (a mesh is a sparse node graph for an otherwise unstructured set of data).
    However, surface-based constraints can prevent thin surfaces (hair/fur) from reconstructing as well as vanilla 3DGS. It might also inhibit certain reflections and transparency from being reconstructed as accurately.
- moralestapia20 days ago
  Random question, since I see your username is green.
  How did you find out this was posted here?
  Also, great work!
  - darhodester20 days ago
    My friend and colleague shared a link with me. Pretty cool to see this trending here. I'm very passionate about Gaussian splatting and developing tools for creatives.
    And thank you!
- npkk220 days ago
  I've been mesmerized by the visusals of Gaussian splatting for a while now, congratulations for your great work!
  Do you have some benchmarks about what is the geometric precision of these reproductions?
  - darhodester20 days ago
    Thank you!
    Geometric analysis for Gaussian splatting is a bit like comparing apples and oranges. Gaussian splats are not really discrete geometry, and their power lies in overlapping semi-transparent blobs. In other words, their benefit is as a radiance field and not as a surface representation.
    However, assuming good camera alignment and real world scale enforced at the capture and alignment steps, the splats should match real world units quite closely (mm to cm accuracy). See: https://www.xgrids.com/intl?page=geomatics.
- tamat20 days ago
  nice work.
  I can see that relighting is still a work in progress, as the virtual spot lights tends to look flat and fake. I understand that you are just making brighter splats that fall inside the spotlight cone and darker the ones behind lots of splats.
  Do you know if there are plans for gaussian splats to capture unlit albedo, roughness and metalness? So we can relight in a more realistic manner?
  Also, environment radiosity doesnt seem to translate to the splats, am I right?
  Thanks
  - darhodester20 days ago
    Thank you!
    There are many ways to relight Gaussian splats. However, the highest quality results are currently coming from raytracing/path tracing render engines (such as Octane and VRay), with 2D diffusion models in second place. Relighting with GSOPs nodes does not yield as high quality, but can be baked into the model and exported elsewhere. This is the only approach that stores the relit information in the original splat scene.
    That said, you are correct that in order to relight more accurately, we need material properties encoded in the splats as well. I believe this will come sooner than later with inverse rendering and material decomposition, or technology like Beeble Switchlight (https://beeble.ai). This data can ultimately be predicted from multiple views and trained into the splats.
    "Also, environment radiosity doesnt seem to translate to the splats, am I right?"
    Splats do not have their own radiosity in that sense, but if you have a virtual environment, its radiosity can be translated to the splats.
  - darhodester20 days ago
    This may interest you: https://www.linkedin.com/posts/radiancefields_in-case-you-we...
  - Syzygies20 days ago
    Back in 2001 I was the math consultant for "A Beautiful Mind". One spends a lot of time waiting on a film set. Eventually one wonders why.
    The majority of wait time was the cinematographer lighting each scene. I imagined a workflow where secondary digital cameras captured 3D information, and all lighting took place in post production. Film productions hemorrhage money by the second; this would be a massive cost saving.
    I described this idea to a venture capitalist friend, who concluded one already needed to be a player to pull this off. I mentioned this to an acquaintance at Pixar (a logical player) and they went silent.
    Still, we don't shoot movies this way. Not there yet...
- mmaaz20 days ago
  Really cool work!
- huflungdung20 days ago
  [dead]
- delaminator20 days ago
  [flagged]
- chrisjj20 days ago
  [flagged]
  - dagmx20 days ago
    Is it possible you didn’t comprehend which parts were 3D?
    Or if you did, perhaps a critique is better rather than just a low effort diss.
    chrisjj20 days ago
    I viewed on a flat monitor, so perhaps I missed some 4D and 5D too.
    /i
  - darhodester20 days ago
    That's hurtful.
  - 20 days ago
    undefined
- GrowingSideways20 days ago
  Take the money and never admit to selling this shit. Why would you ever willingly associate your name with this?
  - darhodester20 days ago
    Read the room. Plenty of people are interested in the aesthetics and the technology.
    GrowingSideways20 days ago
    Just because people want to give you money doesn't mean you toss your dignity out the window.
Foreignborn20 days ago
I want to shoutout Nial Ashley (aka Llainwire) for doing this in 2023 as a solo act and doing the visuals himself as well - https://www.youtube.com/watch?v=M1ZXg5wVoUU
A shame that kid was slept on. Allegedly (according to discord) he abandoned this because so many artists reached out to have him do this style of mv, instead of wanting to collaborate on music.
- Hendrikto20 days ago
  > so many artists reached out to have him do this style of mv, instead of wanting to collaborate on music
  Well yes, the visuals are awesome, while the music… isn’t.
  - Foreignborn20 days ago
    I love HN because everyone is so different outside of the core purpose of the site. Sometimes people reference art, or a book or something, that I'd never would think to exist.
    Llainwire was my top artist listens throughout 2023, so it’s always funny to bump into reactions that feel totally different from my world/my peers.
- prodigycorp19 days ago
  This is sooo sick. I’m a total hip hop unc and haven’t caught up for a decade and a half now and I think the music is great as well. Pairs perfectly with the visuals. Hope this guy makes it, a creative one of one.
  His stuff is already on repeat. Thanks for the recc, love this site.
- killjoywashere20 days ago
  You're saying Nial used guassian splatting for his video? Or the style of camerawork, staging, and costuming is similar?
  Put another way, is this a scientific comparison or an artistic comparison?
  - mlrtime20 days ago
    It sounds like to me he [artist] was disappointed that more people were interested in his video editing than his musical efforts.
nodra21 days ago
Never did I think I would ever see anything close to related to A$AP on HN. I love this place.
- keiferski21 days ago
  Hah, for the past day, I've been trying to somehow submit the Helicopter music video / album as a whole to HN. Glad someone figured out the angle was Gaussian.
- PlatoIsADisease20 days ago
  I run a programming company and one of my sales people was surprised to see I liked soundcloud rap. I was like:
  What did you expect?
  >Classical music?
  Nah I like hype, helps when things are slow.
  - m4ck_20 days ago
    Prokofiev's Alexander Nevsky goes hard if you do want something in the classical world though.
    gopher_space20 days ago
    Doctor Octagon’s “moose bumps” iirc.
- wahnfrieden21 days ago
  And nearly a Carti post at the top of HN
  - rw_panic0_021 days ago
    I'm taking the opportunity to FWAEH in here
  - HaZeust20 days ago
    r/playboicarti is one of my favorite places to go to just turn my brain off and see shitposts that have a certain reminiscence to me, almost a "high school class when the teacher didn't show up" vibe.
    wahnfrieden20 days ago
    95% of its posters are in high school or lower, and are in class during daytime hours, so that's a part of why it makes you feel like that
    HaZeust20 days ago
    Indeed. It's a good vibe when you want to turn your brain off sometimes though, same reason why Beavis and Butthead succeeds and is re-aired.
  - erratic_chargi20 days ago
    Bro the day I see Carti on HN is the day I'm leaving this site, some things shouldn't mix
  - 47thpresident20 days ago
    One day we’ll see a an Osamason or Xaviersobased post on HN
    blonddd20 days ago
    No fucking way have I just seen Osamason and Xav mentioned on HN
  - portly20 days ago
    What do you mean?
    wahnfrieden20 days ago
    Helicopter had a Carti feature that was pulled but leaked, and a promo photoshoot with the two of them for it.
- stickfigure21 days ago
  Is he wearing... hair curlers?
  - nodra21 days ago
    That's what one does when they want some fiyah curls.
- elestor20 days ago
  I know right? I'm into both things tech and hip hop, didn't expect them to collide
- callbacked20 days ago
  yeah that had me do a double take lol
- joshcsimmons20 days ago
  Why is that “cool” or desirable?
  - amazingman20 days ago
    Because expertise, love, and care cut across all human endeavor, and noticing those things across domains can be a life affirming kind of shared experience.
    shikshake20 days ago
    Perfect comment, but it’s very funny to me that you even needed to say it. Some folks on here talk like moon people who have never met humans before.
    joshcsimmons17 days ago
    Thanks I’m neurodivergent how could you tell?
    shikshake17 days ago
    So am I, but I don’t use it as an excuse to put down other’s interests.
    HaZeust20 days ago
    Favorited. This will be a timeless comment for me, and will remind some perspective to appreciate things I might not be otherwise familiar with, and thereby care about.
  - nodra20 days ago
    Desirable because it’s a rare culture + tooling combo. I’m into both and HN is one of the few places I would see them come together. So yeah, “cool”
    throwadobe20 days ago
    [flagged]
    dang20 days ago
    "Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
    https://news.ycombinator.com/newsguidelines.html
    itishappy20 days ago
    Right, that's how culture works. There's no universal definition.
    yeeetz20 days ago
    don't be a square
    throwadobe20 days ago
    I'm not, I promise
    nodra20 days ago
    What do you mean by that?
    throwadobe20 days ago
    [flagged]
    zapzupnz20 days ago
    Not everything has to be. Sometimes, an artist's style or a particular track just hits a particular vibe one may be after or need in a particular moment.
    I'm not a fan of this music either but I could imagine hearing it while I'm studying or coding.
    Don't trash something just because it's not your vibe. Not everything has to be Mozart.
    throwadobe20 days ago
    I mean, it's not like I trashed it or compared it to Mozart—I even made sure to include "interesting, stimulating, or tonally remarkable" in an attempt to preempt that latter pushback.
    But even if I did, why can't I? It's fine to call some music shit. Just like you can call my opinion shit.
    Policing dissenting opinions and saying everything is equally worthy of praise are two sides of the same coin sliding in the vending machine that sells us the sad state of affairs we live in today.
    lifeformed20 days ago
    You can say whatever you want, but pretentious sneering is annoying, don't be surprised if people push back.
    jmye20 days ago
    You absolutely trashed it in your first sneering, shitty swipe about “culture”. You don’t get to make comments like that and then whine about “policing” like a four year-old caught in the cookie jar.
  - saaaaaam20 days ago
    Why isn't it?
pleurotus21 days ago
Super cool to read but can someone eli5 what Gaussian splatting is (and/or radiance fields?) specifically to how the article talks about it finally being "mature enough"? What's changed that this is now possible?
- meindnoch20 days ago
  1. Create a point cloud from a scene (either via lidar, or via photogrammetry from multiple images)
  2. Replace each point of the point cloud with a fuzzy ellipsoid, that has a bunch of parameters for its position + size + orientation + view-dependent color (via spherical harmonics up to some low order)
  3. If you render these ellipsoids using a differentiable renderer, then you can subtract the resulting image from the ground truth (i.e. your original photos), and calculate the partial derivatives of the error with respect to each of the millions of ellipsoid parameters that you fed into the renderer.
  4. Now you can run gradient descent using the differentiable renderer, which makes your fuzzy ellipsoids converge to something closely reproducing the ground truth images (from multiple angles).
  5. Since the ellipsoids started at the 3D point cloud's positions, the 3D structure of the scene will likely be preserved during gradient descent, thus the resulting scene will support novel camera angles with plausible-looking results.
  - klondike_klive20 days ago
    You... you must have been quite some 5 year old.
    efskap20 days ago
    ELI5 has meant friendly simplified explanations (not responses aimed at literal five-year-olds) since forever, at least on the subreddit where the concept originated.
    Now, perhaps referring to differentiability isn't layperson-accessible, but this is HN after all. I found it to be the perfect degree of simplification personally.
    SchemaLoad20 days ago
    Some things would be literally impossible to properly explain to a 5 year old.
    zapzupnz20 days ago
    If one actually tried to explain to a five year old, they can use things like analogy, simile, metaphor, and other forms of rhetoric. This was just a straight-up technical explanation.
    np_tedious20 days ago
    Lol. Def not for 5 year olds but it's about exactly what I needed
    How about this:
    Take a lot of pictures of a scene from different angles, do some crazy math, and then you can later pretend to zoom and pan the camera around however you want
    KeplerBoy20 days ago
    sure, but does that explanation really help anyone. Imo it might scare people off actually diving into things, the math isn't too crazy.
    20 days ago
    undefined
    zapzupnz18 days ago
    Anybody sufficiently interested would press further, not back away.
    efreak16 days ago
    Saying math (even using it in a dismissive tldr) is immensely helpful. Specifically, I've never encountered these terms before:
    - point cloud - fuzzy ellipsoid - view-dependent color - spherical harmonics - low order - differentiable renderer (what makes it differentiable? A renderer creates images, right?) - subtract the resulting image from the ground truth (good to know this means your original photos, but how do you subtract images from images?) - millions of ellipsoid parameters (the explanation previously mentioned 4 parameters by name. Where are the millions coming from?) - gradient descent (I've heard of this in AI, but usually ignore it because I haven't gotten deep enough into it to need to understand what it means) - 3D point cloud's positions (are all point clouds 3d? The point cloud mentioned earlier wasn't. Or was it? Is this the same point cloud?)
    In other words, you've explained this at far too high a level for me. Given that the request was for ELI5, I expected an explanation that I could actually follow, without knowing any specific terminology. Do disregard specifics and call it math. Don't just call it math and skip past it entirely: call it math and explain what you're actually doing with the math, rather than trying to explain the math you're doing; same for all the other words. If a technical term is only needed once in a conversation, then don't use it.
    Given that I actually do know what photogrammetry is at a basic level, I can make a best-effort translation here, but it's purely from 100% guessing rather than actually understanding:
    1. Create a 3d scan of a real-life scene or object. It uses radar (intentionally incorrect term, more familiar) or multiple photographs at different angles to see the 3 dimensional shape.
    2. For some reason, break up the stapes into smaller shapes.
    This is where my understanding goes to nearly 0:
    3-5: somehow, looking at the difference between a rendering of your 3d scene and a picture of the actual scene allows you to correct the errors in the 3d scene to make it more realistic. Using complex math works better and having the computer do it is less effort than manually correcting the models in your 3d scene.
  - alok-g20 days ago
    Thanks.
    How hard is it to handle cases where the starting positions of ellipsoids in 3D is not correct (being too off). How common is such a scenario with the state of the art? E.g., if having only a stereoscopic image pair, the correspondences are often not accurate.
    Thanks.
  - make319 days ago
    I assume that the differentiable renderer is only given its position and viewing angle at any one time (in order to be able to generalize to new viewing angles)?
    Is it a fully connected NN?
    meindnoch19 days ago
    No. There are no neural networks here. The renderer is just a function that takes a bunch of ellipsoid parameters and outputs a bunch of pixels. You render the scene, then subtract the ground truth pixels from the result, and sum the squared differences to get the total error. Then you ask the question "how would the error change if the X position of ellipsoid #1 was changed slightly?" (then repeat for all ellipsoid parameters, not just the X position, and all ellipsoids, not just ellipsoid #1). In other words, compute the partial derivative of the error with respect to each ellipsoid parameter. This gives you a gradient, that you can use to adjust the ellipsoids to decrease the error (i.e. get closer to the ground truth image).
  - renewiltord20 days ago
    Great explanation/simplification. Top quality contribution.
  - cpt_sobel20 days ago
    And what about the "mature enough" part? How has it changed / progressed recently?
    corysama20 days ago
    The field is advancing rapidly. New research papers are being published daily for a few years now. The best news feed I've found on the topic is
    https://radiancefields.com/
    https://x.com/RadianceFields alt: https://xcancel.com/RadianceFields
  - pleurotus20 days ago
    Thanks for the explanation!
  - chrisjj20 days ago
    Or: Matrix bullet time with more viewpoints and less quality.
  - 20 days ago
    undefined
- tel21 days ago
  Gaussian splatting is a way to record 3-dimensional video. You capture a scene from many angles simultaneously and then combine all of those into a single representation. Ideally, that representation is good enough that you can then, post-production, simulate camera angles you didn't originally record.
  For example, the camera orbits around the performers in this music video are difficult to imagine in real space. Even if you could pull it off using robotic motion control arms, it would require that the entire choreography is fixed in place before filming. This video clearly takes advantage of being able to direct whatever camera motion the artist wanted in the 3d virtual space of the final composed scene.
  To do this, the representation needs to estimate the radiance field, i.e. the amount and color of light visible at every point in your 3d volume, viewed from every angle. It's not possible to do this at high resolution by breaking that space up into voxels, those scale badly, O(n^3). You could attempt to guess at some mesh geometry and paint textures on to it compatible with the camera views, but that's difficult to automate.
  Gaussian splatting estimates these radiance fields by assuming that the radiance is build from millions of fuzzy, colored balls positioned, stretched, and rotated in space. These are the Gaussian splats.
  Once you have that representation, constructing a novel camera angle is as simple as positioning and angling your virtual camera and then recording the colors and positions of all the splats that are visible.
  It turns out that this approach is pretty amenable to techniques similar to modern deep learning. You basically train the positions/shapes/rotations of the splats via gradient descent. It's mostly been explored in research labs but lately production-oriented tools have been built for popular 3d motion graphics tools like Houdini, making it more available.
  - pleurotus20 days ago
    Thanks for the explanation! It makes a lot of sense that voxels would scale as badly as they do, especially if you want to increase resolution. Am I right in assuming that the reason this scales a lot better is because the Gaussian splats, once there's enough "resolution" of them, can provide the estimates for how light works reasonably well at most distances? What I'm getting at is, if I can see Gaussian splats vs voxels similarly to pixels vs vector graphics in images?
    tel20 days ago
    I think, yes, with greater splat density—and, critically, more and better inputs to train on, others have stated that these performances were captured with 56 RealSense D455fs—then splats will more accurately estimate light at more angles and distances. I think it's likely that during capture they had to make some choices about lighting and bake those in, so you might still run into issues matching lighting to your shots, but still.
    https://www.realsenseai.com/products/real-sense-depth-camera...
    That said, I don't think splats:voxels as pixels:vector graphics. Maybe a closer analogy would be pixels:vectors is the same as voxels:3d mesh modeling. You might imagine a sophisticated animated character being created and then animated using motion capture techniques.
    But notice where these things fall apart, too. SVG shines when it's not just estimating the true form, but literally is it (fonts, simplified graphics made from simple strokes). If you try to estimate a photo using SVG it tends to get messy. Similar problems arise when reconstructing a 3d mesh from real-world data.
    I agree that splats are a bit like pixels, though. They're samples of color and light in 3d (2d) space. They represent the source more faithfully when they're more densely sampled.
    The difference is that a splat is sampled irregularly, just where it's needed within the scene. That makes it more efficient at representing most useful 3d scenes (i.e., ones where there are a few subjects and objects in mostly empty space). It just uses data where that data has an impact.
  - MITSardine20 days ago
    Are meshes not used instead of gaussian splats only due to robustness reasons? I.e., if there were a piece of software that could reliably turn a colored point cloud into a textured mesh, would that be preferable?
    corysama20 days ago
    Photogrammetry has been around for a long time now. It uses pretty much the same inputs to create meshes from a collection of images of a scene.
    It works well for what it does. But, it's mostly only effective for opaque, diffuse, solid surfaces. It can't handle transparency, reflection or "fuzz". Capturing material response is possible, but requires expensive setups.
    A scene like this poodle https://superspl.at/view?id=6d4b84d3 or this bee https://superspl.at/view?id=cf6ac78e would be pretty much impossible with photogrammetry and very difficult with manual, traditional, polygon workflows. Those are not videos. Spin them around.
    dahart20 days ago
    It’s not only for robustness. Splats are volumetric and don’t have topology constraints, and both of those things are desirable. The volume capability is sometimes used for volume effects like fog and clouds, but it also gives splats a very graceful way to handle high frequency geometry - higher frequency detail than the capture resolution - that mesh photogrammetry can’t handle (hair, fur, grass, foliage, cloth, etc.). It depends entirely on the resolution of the capture, of course. I’m not saying meshes can’t be used to model hair or other fine details, they can obviously, but in practice you will never get a decent mesh out of, say, iPhone headshots, while splats will work and capture hair pretty well. There are hair-specific capture methods that are decent, but no general mesh capture methods that’ll do hair and foliage and helicopters and buildings.
    BTW I believe there is software that can turn point clouds into textured meshes reliably; multiple techniques even, depending on what your goals are.
    baxuz20 days ago
    Not everything can be represented by textured meshes used in traditional photogrammetry (think Google Street View)
    This includes sparse areas like fences, vegetation and the likes, but more importantly any material properties like reflections, specularity, opacity, etc.
    Here's a few great examples: https://superspl.at/view?id=cf6ac78e
    https://superspl.at/view?id=c67edb74
  - cubefox20 days ago
    > Gaussian splatting is a way to record 3-dimensional video.
    I would say it's a 3D photo, not a 3D video. But there are already extensions to dynamic scenes with movement.
    poly2it20 days ago
    See 4D splatting.
    tiborsaas20 days ago
    Brain dances!
- dmarcos21 days ago
  It’s a point cloud where each point is a semitransparent blob that can have a view dependent color: color changes depending on direction you look at them. Allowing to capture reflections, iridescence…
  You generate the point clouds from multiple images of a scene or an object and some machine learning magic
- KerrickStaley20 days ago
  This 2-minute video is a great intro to the topic https://www.youtube.com/watch?v=HVv_IQKlafQ
  I think this tech has become "production-ready" recently due to a combination of research progress (the seminal paper was published in 2023 https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/) and improvements to differentiable programming libraries (e.g. PyTorch) and GPU hardware.
- djeastm21 days ago
  For the ELI5, Gaussian splatting represents the scene as millions of tiny, blurry colored blobs in 3D space and renders by quickly "splatting" them onto the screen, making it much faster than computing an image by querying a neural net model like radiance fields.
  I'm not up on how things have changed recently
- krackers20 days ago
  https://aras-p.info/blog/2023/09/05/Gaussian-Splatting-is-pr... and for a visual demo of the result https://antimatter15.com/splat/
- ravedave520 days ago
  This is a REALLY good video explaining it. https://www.youtube.com/watch?v=eekCQQYwlgA
- rkuykendall-com21 days ago
  I found this VFX breakdown of the recent Superman movie to have a great explanation of what it is and what it makes possible: https://youtu.be/eyAVWH61R8E?t=232
  tl;dr eli5: Instead of capturing spots of color as they would appear to a camera, they capture spots of color and where they exist in the world. By combining multiple cameras doing this, you can make a 3D works from footage that you can then zoom a virtual camera round.
  - michaelrubloff20 days ago
    I also spoke to the vfx team from Superman on how they achieved the reconstructions! (I’m also the author for the Helicopter article here).
    https://radiancefields.com/gaussian-splatting-in-superman
  - 21 days ago
    undefined
tokymegz20 days ago
Hello! I’m Chris Rutledge, the post EP / cg supervisor at Grin Machine. Happy to answer any questions. Glad people are enjoying this video, was so fun to get to play with this technique and help break it into some mainstream production
- origamiarmy20 days ago
  Awesome work, incredibly well done! What was the process like for setting the direction on use of these techniques with Rakim? Were you basically just trusted to make something great or did they have a lot of opinions on the technicalities?
  - tokymegz19 days ago
    I didn’t interface much with Rocky outside of the shoot, our director Dan was talking to him regularly and he certainly had options and great ideas but mostly left it up to us. All of the creative came from Dan and bouncing ideas off us / trying things and seeing what was possible with this tech. By the end of the process it was awesome to get Dan also into blender helping set up camera moves himself, in addition to finessing the edit and animatic to help point us in the right direction.
- darhodester20 days ago
  Grin Machine knocked this out of the park!
  Great job, Chris and crew!
noman-land21 days ago
Really amazing video. Unfortunately this article is like 60% over my head. Regardless, I actually love reading jargon-filled statements like this that are totally normal to the initiated but are completely inscrutable to outsiders.
```
    "That data was then brought into Houdini, where the post production team used CG Nomads GSOPs for manipulation and sequencing, and OTOY’s OctaneRender for final rendering. Thanks to this combination, the production team was also able to relight the splats."
```
- darhodester21 days ago
  Hi, I'm one of the creators of GSOPs for SideFX Houdini.
  The gist is that Gaussian splats can replicate reality quite effectively with many 3D ellipsoids (stored as a type of point cloud). Houdini is software that excels at manipulating vast numbers of points, and renderers (such as Octane) can now leverage this type of data to integrate with traditional computer graphics primitives, lights, and techniques.
  - suzzer9921 days ago
    Can you put "Gaussing splats" in some kind of real world metaphor so I can understand what it means? Either that or explain why "Gaussian" and why "splat".
    I am vaguely aware of stuff like Gaussian blur on Photoshop. But I never really knew what it does.
    darhodester21 days ago
    Sure!
    Gaussian splatting is a bit like photogrammetry. That is, you can record video or take photos of an object or environment from many angles and reproduce it in 3D. Gaussians have the capability to "fade" their opacity based on a Gaussian distribution. This allows them to blend together in a seamless fashion.
    The splatting process is achieved by using gradient descent from each camera/image pair to optimize these ellipsoids (Gaussians) such that the reproduce the original inputs as closely as possible. Given enough imagery and sufficient camera alignment, performed using Structure from Motion, you can faithfully reproduce the entire space.
    Read more here: https://towardsdatascience.com/a-comprehensive-overview-of-g....
    sowbug20 days ago
    I think this means that you could produce more versions of this music video from other points of view without having to shoot the video again. For example, the drone-like effects could take a different path through the scene. Or you could move people/objects around and still get the lighting right.
    Given where this technology is today, you could imagine 5-10 years from now people will watch live sports on TV, but with their own individual virtual drone that lets them view the field from almost any point.
    joshvm20 days ago
    > I am vaguely aware of stuff like Gaussian blur on Photoshop. But I never really knew what it does.
    Blurring is a convolution or filter operation. You take a small patch of image (5x5 pixels) and you convolve it with another fixed matrix, called a kernel. Convolution says multiply element-wise and sum. You replace the center pixel with the result.
    https://en.wikipedia.org/wiki/Box_blur is the simplest kernel - all ones, and divide by the kernel size. Every pixel becomes the average of itself and its neighbors, which looks blurry. Gaussian blur is calculated in an identical way, but the matrix elements follow the "height" of a 2D Gaussian with some amplitude. It results in a bit more smoothing as farther pixels have less influence. Bigger the kernel, more blurrier the result.There are a lot of these basic operations:
    https://en.wikipedia.org/wiki/Kernel_(image_processing)
    If you see "Gaussian", it implies the distribution is used somewhere in the process, but splatting and image kernels are very different operations.
    For what it's worth I don't think the Wikipedia article on Gaussian Blur is particularly accessible.
    two_handfuls20 days ago
    > explain why "Gaussian" and why "splat".
    Happily. Gaussian splats are a technique for 3D images, related to point clouds. They do the same job (take a 3D capture of reality and generate pictures later from any point of view "close enough" to the original).
    The key idea is that instead of a bunch of points, it stores a bunch of semi-transparent blobs - or "splats". The transparency increases quickly with distance, following a normal distribution- also known as the "Gaussian distribution."
    Hence, "Gaussian splats".
    suzzer9919 days ago
    Somehow this hit right in the sweet spot at my level of knowledge. Thanks!
    shwaj21 days ago
    How can you expect someone to tailor a custom explanation, when they don’t know your level of mathematical understanding, or even your level of curiosity. You don’t know what a Gaussian blur does; do you know what a Gaussian is? How deeply do you want to understand?
    If you’re curious start with the Wikipedia article and use an LLM to help you understand the parts that don’t make sense. Or just ask the LLM to provide a summary at the desired level of detail.
    F7F7F721 days ago
    There's a Corridor Digital video being shared that explains it perfectly. With very little math.
    https://youtube.com/watch?v=cetf0qTZ04Y
    homebessguy20 days ago
    Amazing video, thanks for sharing this.
    suzzer9920 days ago
    > How can you expect someone to tailor a custom explanation, when they don’t know your level of mathematical understanding, or even your level of curiosity.
    The other two replies did a pretty good job!
  - 21 days ago
    undefined
- michaelrubloff20 days ago
  My bad! I am the author. Gaussian splatting allows you to take a series of normal 2D images or a video and reconstruct very lifelike 3D from it. It’s a type of radiance field, like NeRFs or voxel based methods like Plenoxels!
- pants221 days ago
  Corridor has done some great stuff with Gaussian Splats, I recommend this video for a primer!
  https://youtube.com/watch?v=cetf0qTZ04Y
- appplication20 days ago
  Reminds me of Kurtwood Smith’s piping sales pitch in The Patriot
GuB-4220 days ago
> many viewers focused on the chaos, the motion, and the unmistakable early MTV energy of the piece
It certainly moves around a lot!
It certainly looks like the tech and art style here are indissociable. Not only the use of Gaussian Splats made such extreme camera movement possible, one can be argued that it made them necessary.
Pause the video and notice the blurriness and general lack of details. But the frantic motion doesn't let the viewer focus on this details, most of them hidden by a copious amount of motion blur anyways.
To me it is typical of demos, both as in the "demoscene" and "tech demo" sense, where the art style is driven by the technology, insisting on what it enables, while at the same time working around its shortcomings. I don't consider it a bad thing of course, it leads to a lot of creativity and interesting art styles.
- whizzter20 days ago
  Kinda, it all depends on platforms and eras. As a demoscener I see more of techdemo, indiedev and "game-mods" vibes in much of it (especially certain kinds of jankiness feel of visuals in some parts that demosceners doing more high end in general try to avoid but is more meme-worthy in game-mods/indiedev circles). Demos often aim for a bit more clean visuals (and palettes), blurriness is often tuned in another way.
  Sadly, much of the demoscene is in a bit of a navel-grazing retro computing phase, many active ones today are "returners" from the C64 and Amiga eras whilst PC sceners of the 90s either dropped off for money, games or kids.
  It's also the sheer work-effort, demoscene in the 90s and early 00s could focus on rendering while visual art pipelines didn't matter as much, as graphics cards got better it was obvious that the scene was falling behind the cutting edge games (both in asset due to workload and hacks required for graphics cards to render realistically).
  The introduction and early popularization of SDF rendering turned the scene a bit more relevant again, but it's also been masking a certain lack of good artists since programmers could create nice renderings without needing assets.
  However, to match something like this video in creativity would require a lot of asset workload (and non-trival rendering), and that combo is not really that common today sadly.
  Funnily enough, I was actually discussing just Gaussian Splatting as a solution for more "asset heavy" demos about a year ago with another scener friend, but sadly there's a tad of a stigma culturally as NN/"AI" methods has been fairly controversial within the scene, aside from programmers there are both visual and music artists, and among those camps it's not really a popular thing.
  It's still mostly a method though, and SDF rendering + GS could in the end be a saviour in disguise for the scene to go beyond just rendering and bring back a bit more story-telling to the scene.
  - skrebbel20 days ago
    > but sadly there's a tad of a stigma culturally as NN/"AI" methods has been fairly controversial within the scene
    The scene has always been a ridiculously conservative bunch. Back when 3dfx was new, using 3d acceleration was similarly controversial. The Pouet comments were scary similar to those today. All we need is a few demos that actually use these technologies with great results (instead of for laziness/slop), and the majority opinion will shift as it always has.
    GuB-4220 days ago
    I consider the early years of 3d acceleration to be pretty bad days for the demoscene on the "high end" side.
    Problem was that fixed pipelines were seriously limiting. Most of the cool effects of software rendering couldn't be done, and the lack of direct access to the hardware meant that you couldn't do many of the hardware tricks the demoscene is known for. It doesn't mean people couldn't be creative, but it was mostly limited to doing interesting 3D geometry and textures. Things started changing with the advent of shaders.
    About AI, I think thing the demoscene is rather welcoming of "AI" as long as you use it creatively instead of doing what everyone else does (slop). In the topic of Gaussian splatting, look at the Revision 2025 invitation [1], there is a pretty cool scene featuring it, people loved that part.
    [1] https://www.pouet.net/prod.php?which=103537
rjh2921 days ago
To be honest it looks like it was rendered in an old version of Unreal Engine. That may be an intentional choice - I wonder how realistic guassian splatting can look? Can you redo lights, shadows, remove or move parts of the scene, while preserving the original fidelity and realism?
The way TV/movie production is going (record 100s of hours of footage from multiple angles and edit it all in post) I wonder if this is the end state. Gaussian splatting for the humans and green screens for the rest?
- darhodester21 days ago
  The aesthetic here is at least partially an intentional choice to lean into the artifacts produced by Gaussian splatting, particularly dynamic (4DGS) splatting. There is temporal inconsistency when capturing performances like this, which are exacerbated by relighting.
  That said, the technology is rapidly advancing and this type of volumetric capture is definitely sticking around.
  The quality can also be really good, especially for static environments: https://www.linkedin.com/posts/christoph-schindelar-79515351....
- clint20 days ago
  Several of ASAP's video have a lo-fi retro vibe, or specific effects such as simulating stuff like a mpeg a/v corruption, check out A$AP Mob - Yamborghini High (https://www.youtube.com/watch?v=tt7gP_IW-1w)
- F7F7F721 days ago
  Knowing what I know about the artist in this video this was probably more about the novelty of the technology and the creative freedom it offers rather than it is budget.
- TeMPOraL20 days ago
  For me it felt more like higher detail version of Teardown, the voxel-based 3d demolition game. Sure it's splats and not voxels, but the camera and the lighting give this strong voxel game vibe.
- moi238821 days ago
  Yes, they talk about this in the article and that’s exactly what they did.
  - rjh2920 days ago
    It wasn't clear to me how much this was intentional vs. being the limits of the technology at the moment.
    cubefox20 days ago
    I guess the technology still has some quality limitations, otherwise we would already see it in mainstream movies, e.g. to simulate smooth camera motions beyond what is achievable with video stabilization. It's much more difficult to achieve 4K quality that holds up on a movie theater screen, without visible artifacts, than to do an artistic music video.
    jofzar20 days ago
    https://youtu.be/eyAVWH61R8E?t=3m53s
    I would say Superman's quality didn't suffer for it.
    I would say cost is probably the most expensive part it's also just like "why bother", it's not CG, it's not "2d filming" so it's just niche, like the scenarios you would actually need this are very low.
    cubefox20 days ago
    That's interesting. 192 cameras is certainly expensive. Though they are doing 4DGS, with movement, so they have to capture every frame from different angles at the same time. I assume 3DGS for static environments (locations) would be a lot easier in terms of hardware. E.g. a single drone could collect photos for an hour and then they could create arbitrary simulated camera movements that couldn't be filmed conventionally. But again, the quality would have to be high in most cases. The nature of the Superman scene (some sort of hologram) is more forgiving, as it is inherently fake-looking, which helps excuse artifacts slipping through.
- WD-4220 days ago
  I wonder if you are thinking Source engine? I was getting serious skibidi toilet vibes during several parts of this video.
- michaelrubloff20 days ago
  We will be able to have imax level 3D technically today if you feed it the correct data
squidsoup20 days ago
Tangential, but I've been exploring gaussian splatting as a photographic/artistic medium for a while, and love the expressionistic quality of the model output when deprived of data.
https://bayardrandel.com/gaussographs/
- ablation19 days ago
  A very cool aesthetic and application of technology, thank you for sharing.
  - squidsoup19 days ago
    thank you!
- kroaton20 days ago
  Loving this; great work! Do you talk about the process anywhere in more depth?
  - squidsoup20 days ago
    Thanks! I'm using the KIRI Engine in Blender to render splats from my photos (https://github.com/Kiri-Innovation/3dgs-render-blender-addon) and then process the image as I would my photography in Lightroom. There are lots of different photogrammetry tools for generating plys (the point cloud) like PolyCam (https://poly.cam).
- darhodester20 days ago
  Cool aesthetic!
- HugoDz20 days ago
  Very cool!
daveofiveo21 days ago
Direct link to the music video: https://www.youtube.com/watch?v=g1-46Nu3HxQ
- dang21 days ago
  Good idea - we'll put that link in the toptext as well. Thanks!
roughly21 days ago
Be sure to watch the video itself* - it’s really a great piece of work. The energy is frenetic and it’s got this beautiful balance of surrealism from the effects and groundedness from the human performances.
* (Mute it if you don’t like the music, just like the rest of us will if you complain about the music)
- filoleg21 days ago
  Similarly, the music video for Taylor Swif[0] (another track by A$AP Rocky) is just as surrealistic and weird in the best way possible, but with an eastern european flavor of it (which is obviously intentional and makes sense, given the filming location and being very on-the-nose with the theme).
  0. https://youtu.be/5URefVYaJrA
  - prmoustache20 days ago
    I can see how this kind of videos can attract the tiktok addicts with less than 3 seconds of attention time.
    I wonder what will be the state of cinema/series/video clips in 30 years? Will singers/rappers give up sentences completely and just mention names of emojis? Will we have to use screens at 576hz to be able to watch acclerated videos without seeing a constant blur?
    I guess most kids from today would fall asleep before the end of the generic of Twin Peaks or the opening scene of Fargo.
    nkotov20 days ago
    I still rewatch this video once or twice a year. I think it's a classic in my opinion.
  - roughly20 days ago
    Holy shit that’s great. I need to check a few more of his videos.
    owlninja20 days ago
    A$AP Rocky has always put out interesting videos, here are a few others that are pretty cool:
    Yamborghini high: https://www.youtube.com/watch?v=tt7gP_IW-1w L$D: https://www.youtube.com/watch?v=Gx4JEBwVlXo
- superjan20 days ago
  Watch the video to the very end: the final splat is not a gaussian one.
diffuse_l20 days ago
Too bad, but I managed to watch about 30 seconds of the video before getting motion sickness.
Seems like a really cool technology, though.
I wonder if anyone else got the same response, or it's just me.
- darhodester20 days ago
  My wife said the same thing, but it gets better after the intro.
- lwhi20 days ago
  I loved the video. Didn't get the motion sickness myself.
- andy9920 days ago
  Yes, just clicked on the video, instantly nauseous. I get motion sick generally.
- mikkupikku20 days ago
  Same. It's very cool but left me badly motion sick.
periodjet20 days ago
The end result is really interesting. As others have pointed out, it looks sort of like it was rendered by an early 2000s game engine. There’s a cohesiveness to the art direction that you just can’t get from green screens and the like. In service of some of the worst music made by human brains, but still really cool tech.
bininunez20 days ago
Hello! I'm Ben Nunez, CEO at Evercoast. Our software was used to capture and reconstruct the 4D Gaussian splats in this A$AP Rocky video.
The music video is a mix of creative and technical genius from several different teams, and it's ultimately a byproduct of building tooling to capture reality once and reuse it downstream.
There’s a lot more to explore here. Once you have grounded 4D human motion and appearance tied to a stable world coordinate system, it becomes a missing primitive for things like world models, simulation, and embodied AI, where synthetic or purely parametric humans tend to break down.
nelkazzu20 days ago
It’s interesting to see Gaussian splatting show up in a mainstream music video this quickly. A year ago it was mostly a research demo, and now artists are using it as part of their visual toolkit. What I find most notable is how well splats handle chaotic motion like helicopter shots — it’s one of the few 3D reconstruction methods that doesn’t completely fall apart with fast movement. Feels like we’re going to see a lot more of this in creative work before it shows up in anything “serious”.
iamleppert20 days ago
The texture of Gaussian Splatting always looks off to me. It looks like the entire scene has been textured or has a bad, uniform film grain filter to me. Everything looks a little off in an unpleasing way -- things that should be sharp are aren't, and things that should be blurry are not. It's uncanny valley and not in a good way. I don't get what all the rage is about it and it always looks like really poor B-roll to me.
rubzah20 days ago
Oh wow, somehow I was not aware of how capable this technology has become, looks like a major game changer, across many fields.
In the near term, it could be very useful for sports replays. The UFC has this thing where they stitch together sequences of images from cameras all around the ring, to capture a few seconds of '360 degree' video of important moments. It looks horrible, this would be a huge improvement.
jtolmar21 days ago
Dang, it's been cool watching gaussian splats go from tech demo to real workflow.
- darhodester21 days ago
  For sure!
21 days ago
undefined
Centigonal19 days ago
A$AP Mob has some really great music videos. They're usually not the first to adopt a new technology, but they love to push the envelope and popularize fringe techniques.
The Yamborghini High music video from 2016 did some really cool stuff with datamoshing and hue shifting: https://www.youtube.com/watch?v=tt7gP_IW-1w
Findecanor20 days ago
I can't really respect the artist though, after the assault on a random bystander in Stockholm in 2019 — for which he was convicted. He got off too easy.
I would have refused to work on this.
hbarka20 days ago
> Viewers assume the imagery is AI-generated.
Watching this Helicopter music video made me recall a scene in Money for Nothing by Dire Straits, which was famously the first music video to air on MTV Europe (when the MTV phenomenon launched the 80s). It used 3D animation for the human characters and was considered groundbreaking animation in that time. The irony is we knew it was computer generated but now human generated is indistinguishable from AI.
millitzer20 days ago
You could also pull a Michel Gondry and do it with practical effects. https://www.youtube.com/watch?v=s5FyfQDO5g0&list=RDs5FyfQDO5...
e920 days ago
I was wondering how these kind of scenes were done in the movie Enter The Void back in 2009. Maybe they used different technique but looks very similar: https://vimeo.com/60239415
IndySun19 days ago
Where does this tech originate, creatively? I'm especially thinking about a 1990s or 2000s Rolling Stones pop video that seemed to 'mess with time' but (by and large) in 2d. Had a 'drunk' look to it.
karol20 days ago
The rap seems rather derivative, lyrics contain mandatory token "pussy".
- mikkupikku20 days ago
  All art is derivative. I don't care for this music, but being derivative isn't a real criticism.
- defrost20 days ago
  It's a post about the visuals not the libretto.
  If you're looking for bars sans pussy, read on: https://www.youtube.com/watch?v=yKifJ4Q5ph0
MuffinFlavored21 days ago
How did Rhianna look him in the eyes and say "yes babe, good album, release it, this is what the people wanted after 7 years, it is pleasing to listen to and enjoyable"?
- larsmaxfield21 days ago
  I prefer when artists make music they intrinsically want to make — not what others want them to make.
  - b00ty4breakfast21 days ago
    the real question is how much of the art is their own and how much is outside expectations and their reactions to it.
    And it's not always giving in to those voices, sometimes it's going in the opposite direction specifically to subvert those voices and expectations even if that ends up going against your initial instincts as an artist.
    With someone like A$AP Rocky, there is a lot of money on the line wrt the record execs but even small indie artists playing to only a hundred people a night have to contend with audience expectation and how that can exert an influence on their creativity.
- perfmode21 days ago
  It seems the numerous leaks and trials took their toll.
  I don’t disagree with you—I felt “Tailor Swif,” “DMB,” and “Both Eyes Closed” were all stronger than the tracks that made it onto this album.
  But sometimes you’ve gotta ship the project in the state it’s in and move on with your life.
  Maybe now he can move forward and start working on something new. And perhaps that project will be stronger.
- delbronski21 days ago
  Im sure it was more like, “hey babe, can I get a few millions to go in the studio and experiment/make some art?” And then she was like, “yeah go for it! Make some weird shit.”
  If I was in his position I’d probably be doing the same. Why bother with another top hit that pleases the masses.
- weakfish21 days ago
  Because it was awesome. But also, leaks probably.
Footprint052120 days ago
They really said it’s capturing everything when A$AP Rocky’s Gaussian splatted mouth in that video be looking worse than AI generated video lol
badcryptobitch21 days ago
Both of my worlds are colliding with this article. I love reading about how deeply technical products/artifacts get used in art.
drdirk21 days ago
Can somebody explain to me what was actually scanned? Only the actors doing movements like push ups, or whole scenes / rooms?
GrowingSideways20 days ago
The splatting seems to be video only but I could be wrong.
It's only a matter of time until the #1 hit figures out how to make this work
narrator21 days ago
This reminds me about how Soulja Boy just used a cracked copy of Fruity Loops and a cheap microphone and recorded all his songs that made him millions.[1] Edit: Ok this was a big team of VFX producers who did this. Still, prices are coming down dramatically in general, but yeah that idea is a bit of an underfit to this case.
[1] https://www.youtube.com/watch?v=f1rjhVe59ek
- enneff21 days ago
  You might consider why this article which has nothing to do with AI as you know it (except for the machine learning aspects of Gaussian splatting), and was produced by a huge team of vfx professionals, has made you think about AI democratising culture (despite the fact that music videos and films have been cheap to make for decades). Don’t just look for opportunities to discuss your favourite talking points.
  - narrator21 days ago
    Fair point actually. touché.
- comradesmith21 days ago
  I really don’t see the connection. A$AP isn’t a noob
  - 21 days ago
    undefined
- jojobas20 days ago
  The whole rap/hip-hop scene got jump-started by 1977 NYC blackout electronics store lootings.
profsummergig20 days ago
So sad that nobody thought it important to ELI5 whatever on earth "gaussian splatting" means, and how it's different than regular splatting (if there's such a thing), or regular video. To me the video looks like the figures have slightly rounder edges, that's all.
- jofzar20 days ago
  Say you have a photo, but you want to be able to explore it in 3d so you put it into fortnite but when you move you can't see behind objects because the photo never "saw" behind object.
  So you decide to take lots and lots of photos at every single angle possible, but you need a way to link these all together, so you decide that each Centrepoint of the image is a "gaussian". These splat everywhere.
  Now you have taken all of these photos and you can now explore the image in fortnite because you took thousands of images of every possible view!
  But what if you didn't want to just look at the frozen image in a landscape in fortnite, instead you wanted to use a man dancing in your new upcoming YouTube video called helicopter.
  If you isolate this person (let's say taking all these photos on a green screen) you now have a 3d like recording model, you can reshoot and "scene" on-top of something else (like a 3d diarama like in your video!)
- 20 days ago
  undefined
- tomaytotomato20 days ago
  Same, I found this video explainer here:
  https://www.youtube.com/watch?v=Tnij_xHEnXc
  Whenever I see Gaussian, I think of the Gauss gun from Half Life 2
erratic_chargi20 days ago
First Rocky is a redditor now he on HN, this rollout is weird.
moribvndvs21 days ago
In another setting, it looks like ass, but lo-fi, glitchy shit is perfectly compatible with hip-hop aesthetic. Good track though.
- henshao21 days ago
  I think in 2026 it's hard to make a video look this "bad" without it being a clear aesthetic choice, so not sure you could find this video in another setting.
- ungreased067521 days ago
  The technology is impressive, but the end result… Weapons-grade brainrot.
  I’m curious what other artists end up making with it.
  - SamBam20 days ago
    I really disagree with the label brainrot. Brainrot is low-quality garbage with no artistic merit, and very little thought behind its creation, which does nothing but make you briefly pause while scrolling, before scrolling away with no lasting impression being done to your mind (besides increased boredom and inability to focus).
    This is clearly an artistic statement, whether you like the art or not. A ton of thought and time was put into it. And people will likely be thinking and discussing this video for some time to come.
    ungreased067520 days ago
    I think I have a different definition of brainrot than you. I think of attention grabbing visuals that have no meaning, just something to keep your eyes glued to the screen.
    A good example are those Subway Surfers split screen videos, where someone is babbling about nothing in one frame, but the visuals in the other keep people watching.
    Another example is AI-narrated “news” on YouTube. Nobody would normally listen to an AI voice read AI slop, but if there are some extreme video clips quickly switching every few seconds, people don’t immediately click away.
    Brain rot shreds the attention span and uses all kinds of psychological tricks to keep people engaged. In the Helicopter video, every second is packed with visual information, not to contribute to the narrative but to capture attention. The backgrounds are full of details. The camera never stops moving. The subjects depicted are even attention grabbing: police lights, dancing people, guns, car crashes, flamethrowers! Hey, does that guy have pink curlers in his hair?
    It’s not that I don’t like it (I kinda do), but a media diet of that kind of content is bad for the brain.
  - F7F7F721 days ago
    The article said it was by design...
jeffsipko20 days ago
Hi all! If you watched the video and thought you'd love to make stuff like that, but then you looked at the cost of a volumetric capture and nope'ed out, we should talk!
I've been developing a solution to make the cost of 4D capture an order of magnitude cheaper by using a small number of off-the-shelf cameras. Here's the proof-of-concept demo using 4x GoPros: https://youtube.com/shorts/Y56l0FlLlAg (yes, lots of room to improve quality). You can also see the interactive version (with XR support) at https://gaussplay.lovable.app
lawgimenez20 days ago
A$AP Rocky’s music videos has been always great.
weinzierl21 days ago
"The team also used Blender heavily for layout and previs, converting splat sequences into lightweight proxy caches for scene planning."
jimbo80820 days ago
A$ap Rocky's music videos have some really good examples of how AI can be used creatively and not just to generate slop. My favorite is Taylor Swif, it's a super fun video to watch.
https://www.youtube.com/watch?v=5URefVYaJrA
yieldcrv21 days ago
so basically despite the higher resource requirements like 10TB of data for 30 minutes of footage, the compositing is so much faster and more flexible and those resources can be deleted or moved to long term storage in the cloud very quickly and the project can move on
fascinating
I wouldn't have normally read this and watched the video, but my Claude sessions were already executing a plan
the tl;dr is that all the actors were scanned in a 3D point cloud system and then "NeRF"'d which means to extrapolate any missing data about their transposed 3D model
this was then more easily placed into the video than trying to compose and place 2D actors layer by layer
- darhodester21 days ago
  Gaussian splatting is not NeRF (neural radiance field), but it is a type of radiance field, and supports novel view synthesis. The difference is in an explicit point cloud representation (Gaussian splatting), versus a process that needs to be inferred by a neural network.
  - IshKebab20 days ago
    It's not a type of radiance field.
    michaelrubloff20 days ago
    It’s literally the name of gaussian splatting. 3D Gaussian Splatting for Real Time Radiance Fields
    https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
    IshKebab20 days ago
    Hmm if gaussian splatting is radiance field rendering then so is any 3D rendering, and what's the point of using the name? Though having looked up the name it seems like it isn't well defined enough to mean much anyway tbh.
- andybak21 days ago
  > and then "NeRF"'d which means to extrapolate any missing data about their transposed 3D model
  Not sure if it's you or the original article but that's a slightly misleading summary of NeRFs.
  - yieldcrv21 days ago
    I'm all for the better summary
firecall20 days ago
It would be super cool if these articles or websites actually explained anywhere what volumetric capture and radiance fields actually are!
renewiltord20 days ago
Wow, back to music videos. It’s been years. This is a great one.
hackable_sand20 days ago
Good shit
aidenn020 days ago
Maybe I'm getting old, but I can't watch for more than about 5 seconds without getting a headache.
nedjdkdkdk20 days ago
[dead]
darig20 days ago
[dead]
KevinMS20 days ago
aren't music videos supposed to have music?
- hahahahhaah20 days ago
  it does, it just doesn't have video
sneak21 days ago
> One recurring reaction to the video has been confusion. Viewers assume the imagery is AI-generated. According to Evercoast, that couldn’t be further from the truth. Every stunt, every swing, every fall was physically performed and captured in real space. What makes it feel synthetic is the freedom volumetric capture affords.
No, it’s simply the framerate.
londons_explore21 days ago
Pretty sure most of this could be filmed with a camera drone and preprogrammed flight path...
Did the Gaussian splatting actually make it any cheaper? Especially considering that it needed 50+ fixed camera angles to splat properly, and extensive post-processing work both computationally and human labour, a camera drone just seems easier.
- hamburglar21 days ago
  > Pretty sure most of this could be filmed with a camera drone and preprogrammed flight path
  This is a “Dropbox is just ftp and rsync” level comment. There’s a shot in there where Rocky is sitting on top of the spinning blades of a helicopter and the camera smoothly transitions from flying around the room to solidly rotating along with the blades, so it’s fixed relative to rocky. Not only would programming a camera drone to follow this path be extremely difficult (and wouldn’t look as good), but just setting up the stunt would be cost prohibitive.
  This is just one example of the hundreds you could come up with.
  - mlyle20 days ago
    Drones and 2d compositing could do a lot. They would excel in some areas used in the video, require far more resources than this technique in others, and be completely infeasible on a few.
    They would look much better in a very "familiar" way. They would have much less of the glitch and dynamic aesthetic that makes this so novel.
- darhodester21 days ago
  A drone path would not allow for such seamless transitions, never mind the planning required to nail all that choreography, effects, etc.
  This approach is 100% flexible, and I'm sure at least part of the magic came from the process of play and experimentation in post.
- nebezb21 days ago
  If it was achievable, cheaper, and of equal quality then it would have been done that way. Surely it would’ve been done that way a long time ago too. Drone paths have been around a lot longer than this technology.
  There’s no proof of your claim and this video is proof of the opposite.
- larsmaxfield21 days ago
  Flying a camera drone with such proximity and acceleration would be a safety nightmare.
- ex-aws-dude21 days ago
  I think you’re missing the point
  Volumetric capture like this allows you to decide on the camera angles in post-production
- ThouYS21 days ago
  it gives you flexibility, options
- echelon21 days ago
  It's fucking cool. That's why.
  This tech is moving along at breakneck pace and now we're all talking about it. A drone video wouldn't have done that.
- F7F7F721 days ago
  This might be the first time I'm stumbling on Dunning Kruger on HN, no offense.
  - gopher_space20 days ago
    Check any article dealing with education or labor.
    levroly20 days ago
    Im heck