Gaussian Splatting and Photogrammetry with 360/spherical imagery

  • Published Nov 24, 2024

Comments • 84

  • @ihspan6892 • 5 months ago • +1

    Immediately recognized Porto Venere! Absolutely unique place.

  • @djorkez • 1 year ago • +6

    Nice model of Portovenere 😉

  • @LaunchedPix • 1 year ago • +2

    Excellent video. I just stumbled onto your channel and see I have a lot to catch up on and learn (from you) in the 3D modeling and visualization space. Lots more to watch and then play with. You're reigniting my curious spirit to expand beyond photogrammetry into NeRF and Gaussian Splatting. Thanks for spending your time sharing this content and making and sharing all of your videos! 👏👏👏

  • @AndreasMake • 1 year ago • +1

    Love this place, Porto Venere. Just south of La Spezia, Italy. Beautiful place. Nice 3D GS.

  • @mankit.mp4 • 1 year ago • +3

    Wonderful work Matthew, thanks for showing us the possibilities and differences of photogrammetry and Gaussian splatting. I'm a product designer working with a lot of craft-making communities in stunning locations, where I'm dabbling in these kinds of documentation. Having studied your Insta360 workflow and a few other people's, I wonder whether using a full-frame camera with an ultra-wide or fisheye lens for 4K video, and extracting stills for the software processing, could provide better resolution and a faster workflow at the same time?

    • @MatthewBrennan • 1 year ago • +1

      Yes - a full frame camera with a good lens will work better! Coming from a photogrammetry workflow typically, I almost always use an A7Rii with a 12mm lens for architectural capture. I used stills instead of video, but I've been experimenting with both lately - it's always a balance of speed (in terms of capture but also processing) and quality.

    • @pixxelpusher • 1 year ago • +2

      @@MatthewBrennan Can you specify 180 degrees instead of 360 in the workflow? I have a Meike 6.5mm fisheye lens, which I imagine would be the same as only using one hemisphere of the Insta360, but higher resolution as it's using the full sensor of the Sony.

    • @MatthewBrennan • 1 year ago • +2

      @@pixxelpusher It depends on what type of scene you're trying to capture: a fisheye lens would not work well for something like a sculpture, because the object of interest would only occupy a very small amount of "real estate" on the sensor (so to speak) - but it should work very well for rapidly capturing a large urban scene like a piazza, which you should be able to do in far fewer photos than with something like a 20mm-35mm "wide" lens.

    • @mankit.mp4 • 1 year ago • +2

      @@MatthewBrennan Oh wow, a 12mm. Didn't expect you'd go for something that wide, since there might be distortion, but I suppose it depends on how it's corrected in post, and obviously the wider it is the less likely you'd miss something, which is super important. Yes, the optimal quality vs. speed workflow is something I'm constantly trying to reach! Will stay tuned for more of your content, keep it up!

  • @bluestarbursts • 1 year ago • +2

    Amazing stuff! I'm just getting into Gaussian splatting, and your videos are really insightful about just how capable it can be. I want to work on a project using splatting, but the constraints I'm working with are very limiting. What do you think is the absolute minimum number of low-res photos that could reproduce an 8m by 8m space?

    • @MatthewBrennan • 1 year ago • +1

      It depends how many occlusions there are (for example, a very cluttered space with many objects) vs. if it's an empty space (i.e. a gallery with flat art on the walls).

  • @identiticrisis • 1 year ago • +2

    Is there a way to use a crude surface extraction technique to exclude those errant splats? I know they contribute to the detail in the reconstructed images, but they cause serious issues everywhere else. I imagine that removing them will be much more of a benefit given Gaussian splatting is intended to produce interactive spaces.
    Something like a low poly model as a "bounding box" to tidy up the point cloud? Clearly the point cloud itself is at fault, but I wouldn't know how to improve the outcome in this case.
    It seems like a combination of these techniques you've been showcasing would be very powerful indeed.

    • @MatthewBrennan • 1 year ago • +3

      The Unity project I've been using (by Aras-P, available here: github.com/aras-p/UnityGaussianSplatting ) was just updated with an editing tool to delete the "floating" splats...

    • @natelawrence • 1 year ago • +1

      @@MatthewBrennan Thanks for the heads up.
      I've been waiting for someone to implement cropping for these scenes.

  • @groupb-live • 3 months ago

    Hi Matthew. Great video. I wonder how the overview of the village was made. I assume it's a different capture, right? I mean, it's not created from the same walking-around capture with some NeRF calculation or something, right? Thanks!

    • @MatthewBrennan • 3 months ago • +1

      Thank you! The capture of Porto Venere is a combination of handheld mirrorless camera capture and UAV imagery.

  • @dreadthedrums • 10 months ago

    Wow - amazing work, and thanks for the detailed run through your workflow. Looks like you have spent a fair bit of time with both Metashape and COLMAP; have you worked out if it is possible to georeference a splat? I imagine this would ensure the rotation and scale are preserved. I use a Mavic with RTK for photogrammetry here in Australia, and can get a regular point cloud to within 20mm accuracy with good ground control points. If you could do that with a splat, it would be an absolute game changer.

    • @MatthewBrennan • 10 months ago

      I don't see why you couldn't georef a splatted cloud - especially if you've got GCPs with scale bars. The applications currently are primarily visual (i.e. generating video).

    • @dreadthedrums • 10 months ago

      @@MatthewBrennan Thanks for the response. Any idea on a workflow that might work, as COLMAP doesn't support georeferencing to coordinate systems, if I understand correctly?

    • @MatthewBrennan • 10 months ago • +1

      Metashape allows georeferencing, however all of that (including cloud/model orientation and transform) seems to be stripped during the Gaussian splat training process. It shouldn't be hard to integrate, but it'd require coding skills beyond what I possess! :)
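
      As a rough illustration of what that integration might look like, here is a minimal sketch that re-applies a known similarity transform (for example, one solved from GCPs in Metashape) to the splat centers in a trained PLY. The plyfile package, the file names, and the transform are assumptions for illustration, not part of the workflow described above, and a complete solution would also need to rotate each splat's orientation quaternion and rescale its covariance:

        import numpy as np
        from plyfile import PlyData

        T = np.eye(4)  # placeholder: 4x4 similarity transform from a GCP solve

        ply = PlyData.read("point_cloud.ply")  # trained splat file (hypothetical name)
        v = ply["vertex"].data                 # structured array of splat attributes
        xyz = np.column_stack([v["x"], v["y"], v["z"]])
        xyz = (np.column_stack([xyz, np.ones(len(xyz))]) @ T.T)[:, :3]
        v["x"], v["y"], v["z"] = xyz[:, 0], xyz[:, 1], xyz[:, 2]
        ply.write("point_cloud_georef.ply")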

    • @dreadthedrums • 10 months ago

      @@MatthewBrennan Interesting. I'll have a look at the Metashape-to-Gaussian workflow for now. I assume there is a way to export the poses and sparse cloud in a format that can be trained. Interesting that the Gaussian training strips the info, when every individual splat has a location relative to some coordinate system.

    • @MatthewBrennan • 10 months ago

      @@dreadthedrums here's the Agisoft export script (exports in "COLMAP" format that train.py expects): github.com/agisoft-llc/metashape-scripts/blob/master/src/export_for_gaussian_splatting.py
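
      Once that script has produced a COLMAP-format folder, training can be launched directly on it. A hedged sketch (the dataset path is hypothetical; train.py and its flags are from the graphdeco-inria/gaussian-splatting repository):

        import subprocess

        dataset = "exports/portovenere"  # hypothetical export folder containing
        # images/ plus sparse/0/{cameras,images,points3D}.bin

        subprocess.run(
            ["python", "train.py", "-s", dataset, "--iterations", "30000"],
            check=True,
        )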

  • @shark3D • 1 year ago • +1

    All I know is this looks like the location I had to model for Fast & Furious 8 (Iceland dock), but it's probably just a similarity.

  • @bolloxim1 • 1 year ago

    Very awesome, I've been looking at large-area renderers. Question: are there any issues capturing large areas with drones where shadows might 'move' over the course of time? Any thoughts on capturing moving objects? I've been reading about 4D Gaussian splatting, which uses time as well. Could you capture, for example, a savannah landscape like Table Mountain, but also capture the motion?

    • @MatthewBrennan • 1 year ago • +1

      Shadows aren't too much of an issue as long as it's not drastic (i.e. trying to combine photos from 10am with ones from 5pm). Moving objects won't work from the perspective of photogrammetry, but it's not a problem if there are moving objects (i.e. cars, people, etc) in a broader scene. The so-called "4D" gaussian splatting is a bit misleading, because those captures were done with multi-camera rigs (10-20 cameras at least) in a controlled environment (such as a studio).
      I've combined photo sets of buildings taken years apart - as long as the key features don't change, it's quite possible to combine photos taken at different times of day and in different seasons. You may have to use manual control points to "force" some alignment.
      It's possible to capture some apparent "motion" in NeRF or GS scenes (like cars or people moving, or reflections changing) - but this is all based on the input imagery.

  • @吕康杰 • 11 months ago • +1

    Great video. Thank you.
    I tried NerfStudio to convert 360 pictures, but there was always a black line in pictures 0, 4, 5, 6, and 7 when 8 images per equirectangular image were used.

    • @MatthewBrennan • 11 months ago

      Strange, I’ve never seen that - I’ll try it again and see if I can reproduce it.

    • @panonesia • 9 months ago

      @@MatthewBrennan Can you share how you make planar projections from equirectangular images? I tried NerfStudio but the results were not good; COLMAP only found 4% of the poses, sadly.

    • @MatthewBrennan • 9 months ago

      @@panonesia try extracting 14 frames instead of 8, for more overlap. I haven't used anything other than NerfStudio, so I can't make any suggestion there unfortunately - of course it could also be an issue with COLMAP settings - try changing features detected, etc... I stopped using colmap and only use Agisoft Metashape for alignment now.
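
      For anyone rolling their own extraction, here is a rough sketch of the underlying equirectangular-to-planar projection. This illustrates the general technique, not NerfStudio's exact code; OpenCV and NumPy are assumed, and the function name and defaults are made up for the example:

        import cv2
        import numpy as np

        def equirect_to_planar(equi, yaw_deg, pitch_deg, fov_deg=90.0, size=1000):
            """Sample one pinhole view out of an equirectangular frame."""
            h_eq, w_eq = equi.shape[:2]
            f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)  # focal length (px)
            # Viewing rays through each output pixel, centered on the optical axis
            u, v = np.meshgrid(np.arange(size), np.arange(size))
            rays = np.stack([u - size / 2, v - size / 2,
                             np.full((size, size), f)], axis=-1)
            # Rotate the rays by yaw (around vertical) and pitch
            yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
            Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                           [0, 1, 0],
                           [-np.sin(yaw), 0, np.cos(yaw)]])
            Rx = np.array([[1, 0, 0],
                           [0, np.cos(pitch), -np.sin(pitch)],
                           [0, np.sin(pitch), np.cos(pitch)]])
            rays = rays @ (Ry @ Rx).T
            # Rays -> longitude/latitude -> source pixel coordinates
            lon = np.arctan2(rays[..., 0], rays[..., 2])
            lat = np.arcsin(rays[..., 1] / np.linalg.norm(rays, axis=-1))
            map_x = ((lon / np.pi + 1) * 0.5 * w_eq).astype(np.float32)
            map_y = ((lat / (np.pi / 2) + 1) * 0.5 * h_eq).astype(np.float32)
            return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR)

        # e.g. one ring of 8 views: equirect_to_planar(img, yaw, 0) for yaw in 0, 45, ... 315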

    • @panonesia • 9 months ago

      @@MatthewBrennan Ah... so Metashape has good features for planar images? Any special settings for alignment? Generic preselection using source, estimated, or sequential?

    • @MatthewBrennan • 9 months ago

      @@panonesia Metashape is an industry-standard photogrammetry software. If you're working with video frames you can use sequential; otherwise I leave it on source (which will use GPS if you have drone or GPS EXIF data).
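
      In script form, those alignment options correspond to matchPhotos arguments. A minimal sketch assuming the Metashape Python API (1.6+); the photo paths are hypothetical:

        import Metashape

        doc = Metashape.Document()
        chunk = doc.addChunk()
        chunk.addPhotos(["frames/img_0001.jpg", "frames/img_0002.jpg"])  # etc.
        chunk.matchPhotos(
            generic_preselection=True,
            reference_preselection=True,  # source mode uses GPS/EXIF when present
            reference_preselection_mode=Metashape.ReferencePreselectionSource,
        )
        chunk.alignCameras()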

  • @marinomazor.adventures • 1 year ago

    Nice work 🤟

  • @HeadPack • 7 months ago

    Very informative video. You are showing a textured model from photogrammetry in the end. How does one create that?

    • @MatthewBrennan • 7 months ago • +1

      You need photogrammetry software. In this case I used Agisoft Metashape, but there are free/open-source options, such as COLMAP and VisualSFM.

    • @HeadPack • 7 months ago

      @@MatthewBrennan Thank you very much for that information. Much appreciated.

  • @topvirtualtourscom • 1 year ago • +1

    Great video. It looks like 360 video is almost unusable for Gaussian splatting because the result is such low quality. Could I use a full-frame Sony a7III camera with a 7.5mm fisheye lens for videos and photos for Gaussian splatting, or should it be a 12mm lens? What is the widest lens that can still give a good result, for both videos and photos?

    • @MatthewBrennan • 1 year ago

      I think it really depends on the resolution of the video - in this case I was using an Insta360 One 5.7K camera, which produces pretty grainy video - the new 1" sensor Insta360 appears to take much better quality video. A full frame camera with a wide-angle lens, either shooting video or stills, will definitely produce better results (as you can see from the statue scan at the end of this video)!

    • @topvirtualtourscom • 1 year ago • +1

      @@MatthewBrennan Thanks for the quick answer. I am actually using the 8K Insta360 Pro 2, but I still don't think it will be good for Gaussian splatting. Do you think I could use a 7.5mm fisheye lens, or is it better to use a 12mm lens? And I think the village in the video is Portovenere.

    • @MatthewBrennan • 1 year ago

      @topvirtualtourscom3619 Fisheye lenses typically aren't great for photogrammetry because of the amount of distortion - and as with spherical imagery, you're putting a lot of data onto the sensor. I have some fisheye and wide-angle datasets that I'm planning to process in the next week, and I'll post the results and a comparison.
      My opinion is that the 12mm lens would work better than the 7.5mm, while still capturing a very wide field of view.
      Also - you're right! It's Porto Venere! Good eye. How can I contact you?

  • @RiccaDiego • 8 months ago

    Hi! Amazing information! I think you can help me with something.
    I have point clouds from a Leica BLK360 scanner. Do you know if it is possible to turn these point clouds into Gaussian splats?
    Thanks a lot!

    • @MatthewBrennan • 8 months ago

      No, probably not, because Gaussian splatting is based on image data, not LiDAR data. You could use a photogrammetry program to align your LiDAR datasets to imagery, though.

  • @vassilisseferidis • 1 year ago

    Great video Matthew. Thank you.
    I am following the same workflow with an Insta360 Pro, which supports a higher (8K) resolution. The result is good only if you follow the same path as the original recording camera, but it fails if you try to wander off. Is there a way to improve the quality, in your opinion?

    • @MatthewBrennan • 1 year ago • +1

      Unfortunately I think the only way (at the moment - this technology is still new and no doubt will advance quickly) is to use higher resolution, low-distortion images. For example, an 8k 360 is still only giving you 8x1k images once you split them apart using NeRFstudio, whereas a frame camera will give you a 7000px x 4000px image, you'll just have to take more photos.
      The gaussian splatting method fails if you move from the camera path because those are the only locations that splats have been "trained" from - in other words, when viewed from a different angle, there's technically no data about what color the splat should be (because there was no photo of it).

    • @MatthewBrennan • 1 year ago • +1

      I think another big impact is the number of points in the initial COLMAP alignment - look up some strategies for increasing points in the sparse cloud, since those are what the splatting starts from. Fewer points = fewer splats (for example, the "Bike" scene from the GS paper had 6 million splats!).
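
      One such strategy is simply raising COLMAP's per-image feature budget before matching. A hedged sketch using standard COLMAP CLI options (paths are hypothetical):

        import subprocess

        subprocess.run(
            ["colmap", "feature_extractor",
             "--database_path", "scene/database.db",
             "--image_path", "scene/images",
             "--SiftExtraction.max_num_features", "16384"],  # default is 8192
            check=True,
        )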

    • @foolishonboards • 8 months ago

      @@MatthewBrennan That's what I'm also wondering after watching this video and reading the comments. Wouldn't there be any way to generate more points from those 8x1k images?

    • @MatthewBrennan • 8 months ago

      @@foolishonboards Yes - I've found that using Agisoft Metashape and simply upping the tie/key point limits works!
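
      In the Metashape Python API those correspond to the keypoint_limit and tiepoint_limit arguments of matchPhotos. A sketch with illustrative values (paths hypothetical; in Metashape, a limit of 0 removes the cap):

        import Metashape

        chunk = Metashape.Document().addChunk()
        chunk.addPhotos(["frames/img_0001.jpg"])  # etc.
        chunk.matchPhotos(keypoint_limit=80000, tiepoint_limit=0)  # 0 = no cap
        chunk.alignCameras()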

    • @foolishonboards • 7 months ago

      @@MatthewBrennan Thanks for the info. Did you try using an AI tool to bump up the resolution of those 8x1k images before feeding them to the photogrammetry process?

  • @blackhatultra • 1 year ago

    Is it possible to use it in a regular 3D package - to light it, work on the model, and render? As of now I don't see any mesh, so how is this technique usable?

    • @MatthewBrennan • 1 year ago • +1

      Right now you can edit the splats, but the point cloud cannot be re-lit, the lighting is baked in. As for its use: for the moment I think it’s a solution in search of a problem. I can see this being immediately useful in virtual production though.

    • @blackhatultra • 1 year ago

      @@MatthewBrennan Is it possible to apply a z-defocus?

  • @samueljames1511 • 1 year ago

    With an Insta360 Pro, would it be better to use the images from the 6 fisheye lenses, undistorting them and then putting them into COLMAP, or would it be better to stitch them, use NerfStudio, and then use COLMAP?

    • @MatthewBrennan • 1 year ago • +1

      You will have to use some method of splitting the equirectangular (360) images into a series of "flat" frames that can be understood by the GS scripts (the GS method on github likes COLMAP format/structure, unfortunately). The fastest/easiest route to this that I've found is to use NerfStudio to automatically split the 360 images into either 8 or 14 (depending on the amount of overlap you want) frames. Then you can align these in COLMAP or in metashape (and export in colmap format).
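
      As a concrete sketch of that first step, this is roughly how the Nerfstudio preprocessing is invoked (flags are Nerfstudio's ns-process-data options as I understand them; the input and output paths are hypothetical):

        import subprocess

        subprocess.run(
            ["ns-process-data", "video",
             "--data", "capture_360.mp4",
             "--output-dir", "processed/",
             "--camera-type", "equirectangular",
             "--images-per-equirect", "14"],  # 8 or 14, as noted above
            check=True,
        )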

  • @Zanaga2 • 9 months ago

    Is it possible to combine multiple cameras to get better quality while keeping the speed of the 360 capture workflow?

    • @MatthewBrennan • 9 months ago • +1

      If you had multiple 360 cameras you could mount them vertically on a pole and move that through the scene, yes (getting numerous heights would be key).
      Another solution would be multiple mirrorless cameras mounted at 3 heights, so something like 12-15 cameras total, firing simultaneously as you moved through an environment.

    • @Zanaga2 • 9 months ago

      @MatthewBrennan oh, I forgot to specify, but I was thinking about regular cameras (non-360). My bad.
      But thanks for answering, I might try it one day for a long shot with no cuts.

  • @smcclure3545 • 1 year ago • +1

    How well would this work with 360 video walking through a building with multiple rooms and hallways?

    • @smcclure3545 • 1 year ago

      Particularly if there's already an underlying point cloud generated by a previous lidar scan?

    • @MatthewBrennan • 1 year ago • +1

      In this case I would use the 360 imagery to texture the lidar scan. I'm not sure Gaussian Splatting would add any value there.

    • @smcclure3545 • 1 year ago

      @MatthewBrennan Thanks for the reply. Underlying the question is the issue of reducing the time it takes both to process repeat scans and to produce updated imagery and/or geometry for building maintenance and operations. 3D surface models from reality capture are more accurate, but seem more process-intensive and less relatable to an average user.
      I wonder, would you be willing to have a meeting with me? I'm conducting research as an innovator in this space for my job, and it would be helpful to "project a trendline" for where the tech is headed.

    • @MatthewBrennan • 1 year ago • +1

      @@smcclure3545 sure- feel free to send me an email.

    • @smcclure3545 • 1 year ago

      @@MatthewBrennan done, thanks 😊

  • @PierreJeanLievaux • 1 year ago • +1

    I know it: it is Porto Venere, La Spezia.

  • @natelawrence • 1 year ago

    How many training iterations are you using when generating your 3D Gaussian Splatting scenes?
    Also, how much VRAM does the GPU you're calculating them with have?

    • @MatthewBrennan • 1 year ago • +1

      30,000 iterations, A100 with 40GB

    • @natelawrence • 1 year ago • +2

      @@MatthewBrennan Hmm. Thanks for replying. That is definitely a bit disheartening.
      The cloudy results here can't be blamed on lack of computing resources or (as a result) not enough training cycles.
      I wonder to what extent using consecutive neighbor matching for video frames during feature matching and scene reconstruction is to blame.
      I fully understand how much time that would save during image comparisons. I guess it's not as clear to me to what extent non-consecutive input images are/aren't compared after the camera poses are estimated via matching consecutive images.
      In other words, after each image is compared with the one directly following it (and therefore has been compared to both the one that precedes it and follows it) and the point cloud and camera poses are calculated, does the typical Structure from Motion bundle adjuster then look at the camera poses and say, 'Based on how I've reconstructed the scene so far, these frames are looking at the same area of the point cloud, so even though they don't directly neighbor each other temporally, I'm going to compare their features to each other to increase the accuracy of the reconstruction.'?

    • @percheronyt • 1 year ago

      @@MatthewBrennan jessuusss christ

  • @crestz1 • 1 year ago

    Hi, what's the application you're using to view the Gaussian splats?

    • @MatthewBrennan • 1 year ago

      These are all visualized/rendered in Unity

  • @gaussiansplatsss • 7 months ago

    What is your drone?

  • @infectioussneeze9099 • 7 months ago

    What are your computer specs?

    • @MatthewBrennan • 7 months ago

      The 3DGS was trained on a cloud workstation instance with an A100 GPU. The video rendering and photogrammetry were done on a desktop with a Ryzen 9 3900X, a 4070 Ti, and 32GB RAM.

  • @rafalfaro • 1 year ago • +1

    This is cool but the low framerate made me sick at the beginning.

  • @nekosan01 • 9 months ago

    How accurate is it compared to Epic's RealityCapture? You talk as if old photogrammetry doesn't exist and this is something new and good, but how is it good? It requires a high-end video card, and the result is no better than old photogrammetry, which runs just fine on an old PC.

    • @MatthewBrennan • 8 months ago

      Not sure I understand what you're asking. NeRF/3DGS are completely different from photogrammetry, the only similarity is that they use the initial camera pose estimation. NeRF/3DGS at the moment (afaik) don't have a quantifiable accuracy and shouldn't be used for anything beyond visualization.

  • @RikMaxSpeed • 1 year ago

    The town is Porto Venere, Italy. Beautiful! 🤩 en.wikipedia.org/wiki/Porto_Venere