Quan Kong

Staff Research Scientist

Woven by Toyota, Inc.

Email: quan.kong [at] woven-planet (dot) global

Research Topics

Vision-Language Models
Video Understanding
Multi-Modal Perception
Self-Supervised Learning

I am a staff research scientist at Woven by Toyota, Inc., working on computer vision. My research focuses on machine learning and its application to computer vision, large language models, and multi-modal perception. Before joining Woven by Toyota, I was a senior researcher at Hitachi, Ltd. R&D Japan, working on large-scale surveillance video analysis systems, and a visiting researcher in the Department of DBI at ATR, working on home automation. I completed my Ph.D. and M.S. at Osaka University, advised by Takuya Maekawa, Norihisa Komoda and Yasuyuki Matsushita, and my undergraduate degrees at Xi'an Jiao Tong University.

News

2025.02: Three papers accepted by CVPR 2025.
2025.01: One paper accepted by WACV 2025.
2024.10: Our proposal on "Multi-Modal Foundation Model for Urban Spatial-Temporal Understanding" was accepted by the NEDO GENIAC Project.
2024.09: WTS dataset will be presented at ECCV 2024.
2024.08: Two papers accepted by ECCV 2024.
2024.01: The WTS (Woven Traffic Safety) dataset has been released, with a competition held jointly with the 8th AI City Challenge@CVPR 2024!
2024.01: One paper accepted by WACV 2024.
2023.10: One paper accepted by ICCV 2023.
2023.06: One paper accepted by CVPR 2023.
2023.02: One paper accepted by AAAI 2023.
2022.07: My previous work on an automatic baggage screening system at Hitachi was honored with the Field Innovation Award from JSAI.
2022.03: Joined Woven Planet Holdings (now Woven by Toyota, Inc.) to work on Toyota Woven City, located near Mt. Fuji.
2021.04: The MMAct Challenge will be held at CVPR 2021 in conjunction with the ActivityNet workshop.
2020.12: Our team (VAS) achieved the top-1 result on the TRECVID 2020 DSDI task.
2020.12: One paper accepted by NeurIPS 2020.
2019.10: MMAct, a large-scale multi-modal video action understanding dataset, has been released!

Projects

Multi-Modal Large Language Models for Industry Video Understanding & Agent Applications
Human-Centered Perception for City
Human Video Action Understanding for VCA & VSaaS
Large Scale Surveillance Video Analysis System
Patent Drawing Retrieval System
Image Classification/Segmentation for Automatic X-ray Baggage Screening System
Real World Context Recognition and Its Application for Supporting Interaction in Smart Environment (Ph.D. Thesis)

Professional Service

Chief research scientist for the NEDO GENIAC Project
IEEE TIP, TPAMI, CVPR, ICCV, ECCV, WACV, AAAI, ICLR, ICMR, IJCAI PC member & Reviewer
IPSJ JIP (Journal of Information Processing) Editorial Board Member
IPSJ SIGUBI Committee Member

Publications