The iPhone 15 Pro’s Depth Maps (tech.marksblogg.com)

348 points by marklit 2 days ago | 19 comments

Uncorrelated2 days ago
Other commenters here are correct that the LIDAR is too low-resolution to be used as the primary source for the depth maps. In fact, iPhones use four-ish methods, that I know of, to capture depth data, depending on the model and camera used. Traditionally these depth maps were only captured for Portrait photos, but apparently recent iPhones capture them for standard photos as well.
1. The original method uses two cameras on the back, taking a picture from both simultaneously and using parallax to construct a depth map, similar to human vision. This was introduced on the iPhone 7 Plus, the first iPhone with two rear cameras (a 1x main camera and 2x telephoto camera.) Since the depth map depends on comparing the two images, it will naturally be limited to the field of view of the narrower lens.
2. A second method was later used on iPhone XR, which has only a single rear camera, using focus pixels on the sensor to roughly gauge depth. The raw result is low-res and imprecise, so it's refined using machine learning. See: https://www.lux.camera/iphone-xr-a-deep-dive-into-depth/
3. An extension of this method was used on an iPhone SE that didn't even have focus pixels, producing depth maps purely based on machine learning. As you would expect, such depth maps have the least correlation to reality, and the system could be fooled by taking a picture of a picture. See: https://www.lux.camera/iphone-se-the-one-eyed-king/
4. The fourth method is used for selfies on iPhones with FaceID; it uses the TrueDepth camera's 3D scanning to produce a depth map. You can see this with the selfie in the article; it has a noticeably fuzzier and low-res look.
You can also see some other auxiliary images in the article, which use white to indicate the human subject, glasses, hair, and skin. Apple calls these portrait effects mattes and they are produced using machine learning.
I made an app that used the depth maps and portrait effects mattes from Portraits for some creative filters. It was pretty fun, but it's no longer available. There are a lot of novel artistic possibilities for depth maps.
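Method 1 above comes down to triangulation: a point's depth is inversely proportional to its disparity, the horizontal shift between the two images. A minimal sketch, assuming an idealized, rectified pinhole stereo pair; the function name is hypothetical and the focal length and baseline are made-up illustrative numbers, not iPhone specifications.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth (metres) from stereo disparity (pixels).

    Assumes a rectified pinhole stereo pair: Z = f * B / d.
    """
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: treat the point as infinitely far
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers for illustration only (not real iPhone parameters).
FOCAL_PX = 3000.0   # focal length expressed in pixels
BASELINE_M = 0.01   # 1 cm between the two lenses

for d in (60.0, 30.0, 15.0):
    z = depth_from_disparity(d, FOCAL_PX, BASELINE_M)
    print(f"disparity {d:5.1f} px -> depth {z:.2f} m")
```

Because depth scales with 1/disparity, halving the disparity doubles the estimated distance, which is one reason a short baseline like a phone's gives coarse depth for far-away objects.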
- heliographe2 days ago
  > but apparently recent iPhones capture them for standard photos as well.
  Yes, they will capture them from the main photo mode if there’s a subject (human or pet) in the scene.
  > I made an app that used the depth maps and portrait effects mattes from Portraits for some creative filters. It was pretty fun, but it's no longer available
  What was your app called? Is there any video of it available anywhere? Would be curious to see it!
  I also made a little tool, Matte Viewer, as part of my photo tool series - but it’s just for viewing/exporting them, no effects bundled:
  https://apps.apple.com/us/app/matte-viewer/id6476831058
  - lxgr2 days ago
    > Yes, they will capture them from the main photo mode if there’s a subject (human or pet) in the scene.
    One of the example pictures on TFA is a plant. Given that, are you sure iOS is still only taking depth maps for photos that get the "portrait" icon in the gallery? (Or have they maybe expanded the types of possible portrait subjects?)
    heliographe2 days ago
    It will capture the depth map and generate the semantic mattes (except in some edge cases) no matter the subject if you explicitly set the camera in Portrait mode, which is how I would guess the plant photo from the article was captured.
    My previous comment was about the default Photo mode.
    If you have a recent iPhone (iPhone 15 or above iirc) try it yourself - taking a photo of a regular object in the standard Photo mode won’t yield a depth map, but one of a person or pet will. Any photo taken from the Portrait mode will yield a depth map.
    You can find out more about this feature by googling “iPhone auto portrait mode”.
    Apple’s documentation is less helpful with the terminology; they call it “Apply the portrait effect to photos taken in Photo mode”
    https://support.apple.com/guide/iphone/edit-portrait-mode-ph...
    wodenokoto2 days ago
    Seems crazy to run an object recognition algorithm in order to decide if depths should be recorded.
    I’d have thought that would be heavier than just recording the depths.
    brooksta day ago
    Probably a pretty light classifier on the NPU. Doesn’t even have to care about what particular object it is, just if it matches training data for “capture depth map”.
    tougha day ago
    there was recently a 64 gates NN implementation in C shared on HN that was interesting for stuff like this
    lxgr2 days ago
    > [...] if you explicitly set the camera in Portrait mode, which is how I would guess the plant photo from the article was captured.
    Ah, that makes sense, thank you!
- snowdrop2 days ago
  For method 3 that article is 5 years old, see: https://github.com/apple/ml-depth-pro?tab=readme-ov-file
- oxym0ron2 days ago
  One thing worth noting: LIDAR is primarily optimized for fast AF and low-light focusing rather than generating full-res depth maps.
- tgma2 days ago
  Can method 4 be used by a security application to do liveness detection?
  - Someone2 days ago
    I think so. https://developer.apple.com/documentation/avfoundation/captu..., https://developer.apple.com/documentation/avfoundation/captu... show how to get a depth map.
  - jjcob2 days ago
    That's what FaceID does.
    tgma2 days ago
    Obviously -- that is not my question though. I was curious if that data is exposed via API or within the image for front camera as well, so a third party app can do it.
    tougha day ago
    most kyc apps have you record live video, i'm assuming they can then infer those depth maps from the video source regardless of your phone capabilities?
    tgma20 hours ago
    I bet most of them are just dumb and video based as they work on both Android and iOS.
caseyohara2 days ago
Cool article. I assume these depth maps are used for the depth of field background blurring / faux bokeh in "Portrait" mode photos. I always thought it was interesting you can change the focal point and control the depth of field via the "aperture" after a photo is taken, though I really don't like the look of the fake bokeh. It always looks like a bad photoshop.
I think there might be a few typos of the file format?
- 14 instances of "HEIC"
- 3 instances of "HIEC"
- mcdeltat17 hours ago
  On the point of fake bokeh, as a photographer I can't stand it. It looks horribly unnatural and nothing like bokeh from a good lens. Honestly astounding people think it looks good. If you want a pretty portrait, just buy/borrow a cheap DSLR and the resulting image will be 100x better.
- marklit2 days ago
  Fixed those. Cheers for pointing them out.
- dheera2 days ago
  I think the reason it looks fake is because they actually have the math wrong about how optics and apertures work, and they make some (really bad) approximations but from a product standpoint can please 80% of people.
  I could probably make a better camera app with the correct aperture math, I wonder if people would pay for it or if mobile phone users just wouldn't be able to tell the difference and don't care.
  - lcrs2 days ago
    There are a few projects now that simulate defocus properly to match what bigger (non-phone camera) lenses do - I hope to get back to working on it this summer but you can see some examples here: https://x.com/dearlensform
    Those methods come from the world of non-realtime CG rendering though - running truly accurate simulations with the aberrations changing across the field on phone hardware at any decent speed is pretty challenging...
  - dylan6042 days ago
    most people just want to see blurry shit in the background and think it makes it professional. if you really want to see it fall down, put things in the foreground and set the focal point somewhere in the middle. it'll still get the background blurry, but it gets the foreground all wrong. i'm guessing the market willing to pay for "better" faked shallow depth of field would be pretty small.
    vanviegena day ago
    > and think it makes it professional.
    That's a bit cynical. Blurring the background can make the foreground object stand out more, objectively (?) improving the photo in some cases.
    skhr068019 hours ago
    I get what he means. The gold standard for professional “bokeh” portraits, an 85mm f/1.4 prime, is typically a $1000-2000 lens.
    dylan604a day ago
    [flagged]
    dheera2 days ago
    Yeah that's why I didn't write the app already. I feel like the people who want "better faked depth" usually just end up buying a real camera.
    dylan6042 days ago
    Lytro had dedicated cameras and inferior resolution so they failed to gain enough traction to stay viable. You might have a better chance being that it's still on the same device, but the paid for app would be a push.
    However, you could just make the app connect to localhost and hoover up the user's data to monetize and then offer the app for free. That would be much less annoying than showing an ad at launch or after every 5 images taken. Or some other scammy app dev method of making freemium apps successful. Ooh, offer loot boxes!!!
    tene80i2 days ago
    Sample of one, but I’m interested. I used to use a real camera and now very rarely do. But I also often find the iPhone blurring very fake and I’ve never understood why. I assumed it was just impossible to do any better, given the resources they throw at the problem. If you could demonstrate the difference, maybe there would be a market, even if just for specific use cases like headshots or something.
  - semidror2 days ago
    Would it be possible to point out more details about where Apple got the math wrong and which inaccurate approximations they use? I'm genuinely curious and want to learn more about it.
    dheera2 days ago
    It's not that they deliberately made a math error, it's that it's a very crude algorithm that basically just blurs everything that's not within what's deemed as the subject with some triangular, Gaussian, or other computationally simple kernel.
    What real optics does:
    - The blur kernel is a function of the shape of the aperture, which is typically circular at wide aperture and hexagonal at smaller aperture. Not gaussian, not triangular, and the kernel being a function of the depth map itself, it does not parallelize efficiently
    - The blurring is a function of the distance to the focal point, and is typically closer to a hyperbola; most phone camera apps just use a constant blur and don't even account for this
    - Lens aberrations, which are often thought of as defects, but if you generate something too perfect it looks fake
    - Diffraction effects happen at sharp points of the mechanical aperture which create starbursts around highlights
    - When out-of-focus highlights get blown out, they blow out more than just the center area, they also blow out some of the blurred area. If you clip and then blur, your blurred areas will be less-than-blown-out which also looks fake
    Probably a bunch more things I'm not thinking of but you get the idea
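Two of the points in the list above can be made concrete. A minimal sketch, assuming the standard thin-lens circle-of-confusion formula and using a 1-D box blur as a stand-in for a real aperture-shaped kernel. All function names and sample values are illustrative, and this is not a claim about what any phone's camera app actually does.

```python
def circle_of_confusion_mm(subject_mm, focus_mm, focal_mm, f_number):
    """Blur-disc diameter on the sensor for a thin lens focused at focus_mm.

    c = A * f * |S2 - S1| / (S2 * (S1 - f)), with aperture diameter A = f / N.
    Behind the focus plane this grows like (1 - S1/S2): a hyperbola in S2.
    """
    aperture_mm = focal_mm / f_number
    return (aperture_mm * focal_mm * abs(subject_mm - focus_mm)
            / (subject_mm * (focus_mm - focal_mm)))


def box_blur(signal, radius=1):
    """Flat 1-D blur; the window shrinks at the edges instead of padding."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def clip(signal, white=1.0):
    """Clamp linear values to the display's white point."""
    return [min(v, white) for v in signal]


# Blur size versus distance for an 85mm f/1.4 lens focused at 2 m.
for metres in (2, 3, 5, 10, 100):
    c = circle_of_confusion_mm(metres * 1000, 2000, 85, 1.4)
    print(f"{metres:4d} m -> {c:.2f} mm")

# One very bright highlight (8x display white) on a dark background.
hdr_row = [0.1, 0.1, 8.0, 0.1, 0.1]
print("clip then blur:", box_blur(clip(hdr_row)))   # highlight turns grey
print("blur then clip:", clip(box_blur(hdr_row)))   # whole blurred disc stays white
```

Blurring the unclipped linear values first and clipping afterwards is what keeps the entire out-of-focus highlight blown out, as described in the last bullet.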
    jjcob2 days ago
    The iPhone camera app does a lot of those things. The blur is definitely not a Gaussian blur, you can clearly see a circular aperture.
    The blurring is also a function of the distance, it's not constant.
    And blowouts are pretty convincing too. The HDR sources probably help a lot with that. They are not just clipped then blurred.
    Have you ever looked at an iPhone portrait mode photo? For some subjects they are pretty good! The bokeh is beautiful.
    The most significant issue with iPhone portrait mode pictures is the boundaries, which look bad. Frizzy hair always ends up as a blurry mess.
    qingcharles2 days ago
    The Adobe one has a pretty decent ML model for picking out those stray hairs and keeping them in focus. They actually have two models, a lower quality one that keeps things on device and a cloud one that is more advanced.
    qingcharles2 days ago
    Any ideas what the Adobe algorithm does? It certainly has a bunch of options for things like the aperture shape.
    xeonmc2 days ago
    re: parallelization, could a crude 3Dfft-based postprocessing achieve a slightly improved result relative to the current splat-ish approach while still being a fast-running approximation?
    i.e. train a very small ML model on various camera parameters vs resulting reciprocal space transfer function.
    semidror2 days ago
    Thanks!
  - vanviegena day ago
    I'm pretty happy with the results my Pixel produces (apart from occasional depth map errors). Is Google doing a better job than Apple with the blurring, or am I just blissfully ignorant? :-)
  - willseth2 days ago
    If it's all done in post anyway, then it might be a lot simpler to skip building a whole camera app and just give people a way to apply more accurate bokeh to existing photos. I would pay for that.
andrewmcwatters2 days ago
There’s Reality Composer for iOS which has a LIDAR-enabled specific feature allowing you to capture objects. I was bummed to find out that on non-LIDAR equipped Apple devices it does not in fact fall back to photogrammetry.
Just in case you were doing 3d modeling work or photogrammetry and wanted to know, like I was.
- H3X_K1TT3N2 days ago
  I've had the most success doing 3d scanning with Heges. The LiDAR works pretty well for large objects (like cars), but you can also use the Face ID depth camera to capture smaller objects.
  I did end up getting the Creality Ferret SE (via TikTok for like $100) for scanning small objects, and it's amazing.
  - tecleandor2 days ago
    Oh! $100 is a great price. I always see it at around $300-350 and I haven't bought it...
    H3X_K1TT3N2 days ago
    I take it back; I double checked and it was more like $180. Still worth it IMO.
  - klaussilveira2 days ago
    Does it scan hard surfaces pretty well, or does it mangle the shapes? Think car parts.
    H3X_K1TT3Na day ago
    I did a test scan on a small object (1:12 scale doll), and it seems to capture fine details really well, but I can't say for sure that it would be suitable for things like car parts; you would have to try it and see I suppose.
- WalterGR2 days ago
  Polycam does fall back.
  I’ve also heard good things about Canvas (requires LiDAR) and Scaniverse (LiDAR optional.)
  - zevon2 days ago
    I've had pretty good success with https://3dscannerapp.com - it's mostly intended for people with access to iDevices with LiDAR and an Apple Silicon Mac and in this combination can work completely offline by capturing via the iDevice and doing the processing on the Mac (using the system API for photogrammetry). AFAIK, there are also options for using just photos without LiDAR data and for cloud processing but I've never tried those.
  - andrewmcwatters2 days ago
    I’d really like to use Polycam, but it’s unclear what features are free and what’s paid.
    I’d be fine with paying for it, but it’s clear that they want to employ basic dark patterns and false advertising.
heliographe2 days ago
Yes, those depth maps + semantic maps are pretty fun to look at - and if you load them into a program like TouchDesigner (or Blender or Cinema 4D or whatever else you want) you can make some cool little depth effects with your photos. Or you can use them for photographic processing (which is what Apple uses them for, ultimately).
As another commenter pointed out, they used to be captured only in Portrait mode, but on recent iPhones they get captured automatically pretty much whenever a subject (human or pet) is detected in the scene.
I make photography apps & tools (https://heliographe.net), and one of the tools I built, Matte Viewer, is specifically for viewing & exporting them: https://apps.apple.com/us/app/matte-viewer/id6476831058
onlygoose2 days ago
LIDAR itself has much, much lower resolution than the depth maps shown. They have to be synthesized from combined LIDAR and regular camera data.
- mackman2 days ago
  Yeah I thought LIDAR was used for actual focus and depth map was then computed from the multi-camera parallax.
kccqzy2 days ago
I might be missing something here but the article spends quite a bit discussing the HDR gain map. Why is this relevant to the depth maps? Can you skip the HDR gain map related processing but retain the depth maps?
FWIW I personally hate the display of HDR on iPhones (they make the screen brightness higher than the maximum user-specified brightness) and in my own pictures I try to strip HDR gain maps. I still remember the time when HDR meant taking three photos and then stitching them together while removing all underexposed and overexposed parts; the resulting image doesn't carry any information about its HDR-ness.
- jasongill2 days ago
  I thought the same about the article and assumed I had just missed something - it seemed to have a nice overview of the depth maps but then covered mostly the gain maps and some different file formats. Good article, just a bit of a meandering thread
- twoodfina day ago
  Note you can turn off the display-enhanced HDR in Photos settings.
  - kccqzya day ago
    That only does it in the Photos app. What about online in WebViews? What about third party apps like Instagram? The only surefire way of turning it off everywhere is low power mode.
praveen9920a day ago
I am waiting for the day when all phone hardware defaults to Gaussian splatting to take 3D images without expensive sensors. It may be computationally expensive, but it's probably cheaper than adding expensive sensors and more weight.
kawsper2 days ago
Aha! I wonder if Apple uses this for their “create sticker” feature, where you press a subject on an image and can extract it to a sticker, or copy it to another image.
- lxgr2 days ago
  That must be the ML-only approach, since it works even on photos not taken on an iPhone.
arialdomartini2 days ago
I just wonder if depth maps can be used to generate stereograms or SIRDS. I remember playing with stereogram generation starting from very similar grey-scale images.
- kridsdale32 days ago
  They do. The UI to do this is apparently only included in the VisionOS version of the Photos app. But you can convert any photo in your album to "Spatial Format" as long as it has a Depth Map, or is high enough resolution for the ML approximation to be good enough.
  It also reads EXIF to "scale" the image's physical dimensions to match the field of view of the original capture, so wide-angle photos are physically much larger in VR-Space than telephoto.
  In my opinion, this button and feature alone justifies the $4000 I spent on the device. Seeing photos I took with my Nikon D7 in 2007, in full 3D and correct scale, triggers nostalgia and memories I've forgotten I had for many years. It was quite emotional.
  Apple is dropping the ball on not making this the primary selling-point of Vision Pro. It's incredible.
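The field-of-view scaling described above follows from basic lens geometry. A minimal sketch, assuming the 35mm-equivalent focal length is available (EXIF has a `FocalLengthIn35mmFilm` tag for this) and that the photo is shown on a flat virtual panel at a fixed distance; how visionOS actually implements it is not documented here, and the function names are hypothetical.

```python
import math

FULL_FRAME_WIDTH_MM = 36.0  # frame width that "35mm-equivalent" focal lengths refer to


def horizontal_fov_deg(focal_35mm_equiv):
    """Horizontal field of view for a given 35mm-equivalent focal length."""
    return math.degrees(2 * math.atan(FULL_FRAME_WIDTH_MM / (2 * focal_35mm_equiv)))


def display_width_m(focal_35mm_equiv, viewing_distance_m):
    """Panel width that subtends the capture's field of view at that distance."""
    half_fov = math.radians(horizontal_fov_deg(focal_35mm_equiv)) / 2
    return 2 * viewing_distance_m * math.tan(half_fov)


# Illustrative focal lengths: ultra-wide, main, and telephoto equivalents.
for focal in (13, 24, 77):
    fov = horizontal_fov_deg(focal)
    width = display_width_m(focal, 2.0)
    print(f"{focal:3d} mm -> FOV {fov:5.1f} deg, {width:.2f} m wide at 2 m")
```

The panel width works out to distance × 36 / focal length, so a wide-angle shot ends up several times larger than a telephoto one at the same viewing distance.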
ziofill2 days ago
Every time I glance at the title my brain reads “death maps”
itsgrimetime2 days ago
site does something really strange on iOS chrome - when I scroll down on the page the font size swaps larger, when I scroll up it swaps back smaller. Really disorienting
Anyways, never heard of oiiotool before! Super cool
cloud_herder2 days ago
Off the topic at hand but this site is elegantly simple... I wonder what static site generator he uses?
- jonathonlui2 days ago
  At the bottom of page:
  > Copyright © 2014 - 2025 Mark Litwintschik. This site's template is based off a template by Giulio Fidente [1]
  The theme is for the Pelican [2] static site generator.
  [1] https://github.com/gfidente/pelican-svbhack
  [2] https://getpelican.com/
layer82 days ago
You can make autostereograms from those.
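A minimal sketch of how that could work, assuming the depth map has been normalized to rows of floats in [0, 1] with 1 meaning nearest. This is the simplest "copy from the left" variant of a random-dot autostereogram; more careful algorithms also handle hidden surfaces. The function name and parameters are hypothetical.

```python
import random


def autostereogram(depth, pattern_width=8, max_shift=3, seed=0):
    """Turn a depth map into rows of 0/1 dots (a random-dot autostereogram).

    Each pixel copies the pixel `sep` columns to its left, where `sep`
    shrinks for nearer points, so near surfaces repeat at a shorter period.
    """
    rng = random.Random(seed)
    image = []
    for row in depth:
        line = []
        for x, z in enumerate(row):
            sep = pattern_width - round(z * max_shift)
            if x < sep:
                line.append(rng.randint(0, 1))  # seed the strip with random dots
            else:
                line.append(line[x - sep])      # repeat with a depth-dependent period
        image.append(line)
    return image


# Toy depth map: a raised rectangle floating over a flat background.
WIDTH, HEIGHT = 60, 12
depth_map = [
    [1.0 if 20 <= x < 40 and 3 <= y < 9 else 0.0 for x in range(WIDTH)]
    for y in range(HEIGHT)
]

for line in autostereogram(depth_map):
    print("".join("#" if dot else "." for dot in line))
```

A real depth map would first need to be resized and rescaled to this range, and the pattern width chosen to suit the viewing distance.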
just-working2 days ago
Cool article. I read the title as 'Death Maps' at first though.
- heraldgeezer2 days ago
  Me too! I wanted a world map of where iphone 15 users died :(
  - kridsdale32 days ago
    That could be approximated pretty well just combining income data and age data.
    bigyabai2 days ago
    Or by drawing a red circle around the United States labelled "~95%"
    astrangea day ago
    There's a country called China with a lot more people than the US.
pzo2 days ago
TrueDepth from FaceID has gotten significantly worse since the iPhone 13 - it's very bumpy and noisy - we had to do significant denoising and filtering to make it useful again for 3D scanning.
Lidar is a letdown. First, I would have expected Lidar to trickle down to non-Pro devices. Come on, Apple: FaceID was introduced in the iPhone X and the next year it was in all iPhone models. Lidar was introduced in the iPhone 12 Pro and still only Pro devices have it. As a 3rd party dev, it makes me reluctant to make any app using it if it limits my user base by 50%.
I'm also disappointed they didn't improve FaceID or Lidar in the last ~5 years (Truedepth still only 30fps, no camera format to mix 30fps depth + 120fps rgb, still big latency, Lidar still low resolution, no improvement to field of view)
yieldcrv2 days ago
Christ, that liquid cooled system is totally overkill for what he does. I'm so glad I don't bother with this stuff anymore, all to run his preferred operating system in virtualization because Windows uses his aging Nvidia card better
Chimera
The old gpu is an aberration and odd place to skimp. If he upgraded to a newer nvidia gpu it would have linux driver support and he could ditch windows entirely
And if he wasn’t married to arcgis he could just get a mac studio
- os2warpman2 days ago
  I've used, and this is not an exaggeration, practically every single open-source and commercially-licensed GIS application ever released on Linux, Windows, and macOS. Certainly every single one listed on Wikipedia.
  ArcGIS is the best general purpose solution.
  If doing mission planning including flight guidance for aerial operations, FalconView is the least worst.
  Telling an experienced GIS user to drop ArcGIS is like telling a somehow-still-living B.B. King to stop using Gibson guitars.
  And by "better supported" he probably meant "isn't a huge pain in the ass that breaks every other kernel upgrade due to dkms or the package manager shitting the bed" when it comes to Nvidia drivers. Doesn't matter if it's a 1080ti or a 5090.
- eurekin2 days ago
  I was generally flabbergasted by that "My Workstation" section existence in the article at all...
  Is it in any way relevant?
  - tarwina day ago
    "flabbergasted" ? That's quite a strong reaction. It's somewhat normal for nerdy mc-nerdfaces, which the writer definitely is (in all the good ways), to tell people about their hardware. Or at least it used to be? Seemed pretty geek-norm to me even if it was jarring.
    yieldcrv17 hours ago
    > Or at least it used to be?
    I chose not to write it earlier but my candid thoughts were that he seemed too old to still be doing this. It's just younger enthusiasts and professional gamers that do this, and the younger enthusiasts eventually get enough money for a mac and choose that.
    The PSU is completely overkill, as if he was going to get GPU's. But then he has this completely outdated and old, power inefficient GPU in it instead, which is nothing to brag about and doubly warrants an explanation if the rig is to be explained at all. All while newer GPU's from the same company solve all of his driver and OS problems.
    Flabbergasted is the right emotion.
  - naikrovek2 days ago
    I imagine it is there to stop people asking about it. But who knows.
curtisszmaniaa day ago
[dead]
wahnfrieden2 days ago
anyone combining these with photos for feeding to gpt4o to get more accurate outputs (like for calorie counting as a typical example)?
- cenamus2 days ago
  Calorie counting is never gonna be accurate. Just how would you know what's hiding inside a stew or curry? How much oil or meat? How much dressing is on the salad? There's a reason people do calorie counting with raw ingredients (or pre-bought stuff) and not by weighing and measuring plates of food.
  - wahnfrieden2 days ago
    I know that (and I'm not building a calorie counter). The question is about whether 4o can read photos better with depth maps or derived measurements provided alongside the original image and the example was chosen as it's inaccurate but could perhaps be improved with depth map data (even if not to the point of "accurate")
    duskwuff2 days ago
    The answer to that question is "probably not".
    First: the image recognition model is unlikely to have seen very many depth maps. Seeing one alongside a photo probably won't help it recognize the image any better.
    Second: even if the model knew what to do with a depth map, there's no reason to suspect that it'd help in this application. The lack of accuracy in a image-to-calorie-count app doesn't come from problems which a depth map can answer like "is this plate sitting on the table or raised above it"; they come from problems which can't be answered visually like "is this a glass of whole milk or non-fat" or "are these vegetables glossy because they're damp or because they're covered in butter".
    wahnfrieden2 days ago
    Depth map can be used to measure dimensions of objects, not only relative positions
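A minimal sketch of that measurement, assuming a pinhole camera model and a depth map that holds metric depth (many depth maps are only relative, in which case this does not apply directly). The function names, the field of view, and the pixel values are hypothetical illustration values.

```python
import math


def focal_length_px(image_width_px, horizontal_fov_deg):
    """Focal length in pixels for a pinhole camera with the given field of view."""
    return (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)


def object_width_m(pixel_extent, depth_m, focal_px):
    """Real-world width of something spanning `pixel_extent` pixels at `depth_m`."""
    return pixel_extent * depth_m / focal_px


# Hypothetical example: a plate spanning 1000 px in a 4000 px wide photo,
# half a metre from the camera, with an assumed 70 degree horizontal FOV.
f_px = focal_length_px(4000, 70.0)
print(f"estimated plate width: {object_width_m(1000, 0.5, f_px):.3f} m")
```

The estimate is only as good as the depth value and the assumed field of view, and it says nothing about what is inside the object being measured.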
  - criddell2 days ago
    It's probably accurate enough for most people most of the time.
    The labels on your food are +/-20%. If you are analyzing all of your meals via camera, it's probably not too far off that over a week.
    cenamus2 days ago
    I find that hard to believe, those calorie measurements are pretty good.
    And 20% per week is the difference between maintaining your weight and gaining 1.6kg of pure fat a month. (Or losing that much)
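The 1.6kg figure is roughly what a back-of-the-envelope calculation gives, assuming a 2,000 kcal/day intake and the common rule of thumb of about 7,700 kcal per kg of body fat. Both numbers are assumptions added here, not taken from the comment.

```python
KCAL_PER_KG_FAT = 7700  # common rule-of-thumb energy content of 1 kg of body fat


def monthly_fat_change_kg(daily_kcal, error_fraction, days=30):
    """Fat gained or lost if intake is consistently off by `error_fraction`."""
    return daily_kcal * error_fraction * days / KCAL_PER_KG_FAT


print(round(monthly_fat_change_kg(2000, 0.20), 2))  # 1.56
```

This assumes the error always points the same way; label errors that are randomly high or low would partly cancel out over a month.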
  - globular-toasta day ago
    The only reason one might care about an accurate calorie counter is for longevity. But the vast majority of people who count calories are doing it for weight control. They don't need an accurate calorie count.
    All you need to know is whether your weight is trending how you want it to trend and, if not, adjust calories appropriately. I feel like a clever enough model could take pictures of food, your heart rate and your daily weigh in and suggest ways to achieve your weight goal, like "have less fries" etc.
1oooqooq2 days ago
> *describes a top of the line system
> I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro
this is like hearing someone buying yet another automatic super car.
- mmmlinux2 days ago
  Yeah, section seemed like some weird brag.
  - washadjeffmad2 days ago
    It's common practice in the sciences to include details about any equipment used in your lab notes.
    BobbyTables22 days ago
    True, but a physicist wouldn’t normally document the shoes they were wearing.
    The details he documented don’t seem relevant for what he did and nothing performed would seem to stress even a low end ancient system.
    Definitely felt like a brag to me.
    Only thing that makes me think otherwise is he also documented the line counts of the scripts. Seems more like a bizarre obsession with minutiae… (would have been more meaningful to document the git commit/branch for a GitHub project instead of the line count!!)
    miladyincontrol2 days ago
    Agreed, if he really felt obligated to mention all he needed to say was desktop is a 9950X w 1080 ti and 98GB of ram. At most maybe adding that he used WSL on windows 11 but even that seems minimally relevant and just dressing.
    globular-toasta day ago
    He has a water-cooled rig. Dude probably likes computers. I've always got time for people who want to talk about things they like (of course it helps if I like them too). Go to Tik Tok if you want ultra-optimised chunks of junk entertainment.
  - throitallaway2 days ago
    He's running an Nvidia 1080 GPU, a non-XD AMD processor, and Windows 11. None of that is a brag.
    BobbyTables22 days ago
    My system with a non-accelerated iGPU and 16GB DDR4 RAM would differ…