Humphrey Shi
United States
Joined Jun 8, 2011
Mostly research and educational content (talks, papers, presentations) from my lab or guests, in computer vision, machine learning, AI systems and applications, etc.
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
github.com/SHI-Labs/OLA-VLM
OLA-VLM introduces a new approach to distilling vision knowledge into the hidden representations of LLMs, utilizing target visual representations to advance visual perception in multimodal LLMs.
66 views
Videos
StreamingSVD - A StreamingT2V Method for Long Image to Video Generation
1.1K views, 4 months ago
github.com/Picsart-AI-Research/StreamingT2V
StreamingT2V: High-Resolution Long Image-to-Video Generation
120 views, 6 months ago
github.com/Picsart-AI-Research/StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
2.4K views, 9 months ago
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Code: github.com/Picsart-AI-Research/StreamingT2V
Paper: arxiv.org/abs/2403.14773
Project: streamingt2v.github.io
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
647 views, 1 year ago
VCoder enhances object-level perception skills in Multimodal LLMs, using perception modalities as auxiliary control inputs. We demonstrate the efficacy of using segmentation maps and depth maps as control inputs to improve MLLMs at counting and ordering objects. Paper & Code: github.com/SHI-Labs/VCoder
Internet Vision: Impact of Internet on Computer Vision, by Tom Huang, 2008
74 views, 1 year ago
Slides of an informal talk by Tom Huang in 2008. Talk title: Internet Vision: Impact of Internet on Computer Vision
Demo for the Matting Anything Model
2.4K views, 1 year ago
Matting Anything: The Matting Anything Model (MAM) is an efficient and versatile framework for estimating the alpha matte of any instance in an image, guided by flexible and interactive visual or linguistic user prompts. Paper & Code: github.com/SHI-Labs/Matting-Anything
Invited Talk: From Pixels to Regions: Towards Universal Image Segmentation
801 views, 2 years ago
Oral Paper: Using Pure Pollen Species When Training a CNN to Segment Pollen Mixtures.
128 views, 2 years ago
Oral Paper: Pseudo-label Generation for Agricultural Robotics Applications
183 views, 2 years ago
Oral Paper: High-Resolution UAV Image Generation for Sorghum Panicle Detection
175 views, 2 years ago
Oral Paper: 3D Point Cloud Instance Segmentation of Lettuce Based on PartNet
283 views, 2 years ago
Oral Paper: Unsupervised Domain Adaptation & SR on Drone Img for Auto Dry Herbage Biomass Estimation
265 views, 2 years ago
Invited Talk: Deep Learning Weed Detection under an Integrated Weed Management Context
424 views, 2 years ago
Invited Talk: Land Use Land Cover Classification in the Amhara Region, Northwest Ethiopia Using CNNs
541 views, 2 years ago
Invited Talk: Intelligent Crop Management via Deep Reinforcement Learning and Crop Simulations
388 views, 2 years ago
Agriculture-Vision Prize Challenge 2022: CropHarvest Track Winning Solution
231 views, 2 years ago
Agriculture-Vision Prize Challenge 2022: Agriculture-Vision Track Winning Solution
403 views, 2 years ago
Invited Talk 1: Transcending Space Through Immersive Telecommunications (Zhengyou Zhang @Tencent)
566 views, 3 years ago
FVC Human-Centric Video Matting Challenge: Team ZTE
152 views, 3 years ago
FVC Human-Centric Video Coding Challenge: ByteDance Team
190 views, 3 years ago
FVC Human-Centric Video Coding Challenge: Team DWH-PKU
135 views, 3 years ago
Invited Talk 5: Image Captioning with Knowledge and Style (Lexing Xie @ ANU)
158 views, 3 years ago
Invited Talk 4: Cross-Platform ML for Video Conf with MediaPipe (Chuo-Ling Chang & Tingbo Hou @ Google)
338 views, 3 years ago
FVC Human-Centric Video Matting Challenge: Team Alibaba-Vision
468 views, 3 years ago
Invited Talk 7: Video Object Segmentation for Video Conferencing (Sergi Caelles @ Google Research)
191 views, 3 years ago
Invited Talk 6: Attention in AI Tasks (Catherine Zhao @ UMN)
250 views, 3 years ago
Invited Talk 2: Future of Communication (Ira Kemelmacher-Shlizerman @ University of Washington)
396 views, 3 years ago
Invited Talk 3: Face-VID2VID: Neural Talking Head Synthesis For Video Conf (Ming-Yu Liu @ Nvidia)
2.2K views, 3 years ago
Oh my God this is awesome! Can't wait to watch it evolve
Congratulations on your research! Amazing work.
Looks so cool! I wish I could use it, but I think my 11GB of VRAM won't be enough haha ❤
Phenomenal, thank you!
Phenomenal
provide code
Install guide?
Impressive!
A 2-minute video is sometimes not reasonable
Following closely, but I don't know how to deploy it.
Can I use this with RTX4060ti?😊
Can it do image-to-video and video-to-video tasks?
Great work. I have installed your script in Auto, but I cannot find where to activate it. Am I missing a step?
Hi, could you elaborate on the issue you ran into? Maybe open an issue in our repo and I'll take a closer look.
Please how can I get the software?
Is the code implementation available?
Superb ...
Great Channel
Could you please explain in detail how to put all the labels together (at 1:30) and how to avoid overlap? I don't understand this part. If allowed, could I have your multi-class ground truth?
Paper in openaccess.thecvf.com/content_CVPRW_2020/papers/w5/Bollis_Weakly_Supervised_Learning_Guided_by_Activation_Mapping_Applied_to_a_CVPRW_2020_paper.pdf
nice video bro