Material Type | Thesis/Dissertation |
---|---|
Title/Author | Robust Video Object Tracking via Camera Self-Calibration. |
Personal Author | Tang, Zheng. |
Corporate Author | University of Washington. Electrical and Computer Engineering. |
Publication | [S.l.]: University of Washington, 2019. |
Publication | Ann Arbor: ProQuest Dissertations & Theses, 2019. |
Physical Description | 135 p. |
Source Record | Dissertations Abstracts International 81-03B. |
ISBN | 9781088311066 |
Dissertation Note | Thesis (Ph.D.)--University of Washington, 2019. |
General Note | Source: Dissertations Abstracts International, Volume: 81-03, Section: B. Advisor: Hwang, Jenq-Neng. |
Restrictions on Use | This item must not be sold to any third party vendors. This item must not be added to any third party search indexes. |
Abstract | In this dissertation, a framework for 3D scene reconstruction based on robust video object tracking assisted by camera self-calibration is proposed, which includes several algorithmic components. (1) An algorithm for joint camera self-calibration and automatic radial distortion correction based on the tracking of walking persons is designed to convert multiple object tracking into 3D space. (2) An adaptive model that learns online the relatively long-term appearance change of each target is proposed for robust 3D tracking. (3) We also develop an iterative two-step evolutionary optimization scheme to estimate the 3D pose of each human target, which can jointly compute the camera trajectory for a moving camera as well. (4) With 3D tracking results and human pose information from multiple views, we propose multi-view 3D scene reconstruction based on data association with visual and semantic attributes.
Camera calibration and radial distortion correction are crucial prerequisites for 3D scene understanding. Many existing works rely on the Manhattan-world assumption to estimate camera parameters automatically; however, they may perform poorly when the scene lacks man-made structure. As walking humans are common objects in video analytics, they have also been used for camera calibration, but the main challenges include noise reduction for the estimation of vanishing points, the relaxation of assumptions on unknown camera parameters, and radial distortion correction. We propose a novel framework for camera self-calibration and automatic radial distortion correction. Our approach starts with a multi-kernel-based adaptive segmentation and tracking scheme that dynamically controls the decision thresholds of background subtraction and shadow removal around the adaptive kernel regions based on the preliminary tracking results. With the head/foot points collected from the tracking and segmentation results, mean shift clustering and Laplace linear regression are introduced for the estimation of the vertical vanishing point and the horizon line, respectively. The estimation of distribution algorithm (EDA), an evolutionary optimization scheme, is then utilized to optimize the camera parameters and distortion coefficients, so that all the unknowns in camera projection can be fine-tuned simultaneously. Experiments on three public benchmarks and our own captured dataset demonstrate the robustness of the proposed method. The superiority of this algorithm is also verified by its capability to reliably convert 2D object tracking into 3D space.
Multiple object tracking has been a challenging field, mainly due to noisy detection sets and identity switches caused by occlusion and similar appearance among nearby targets. Previous works rely on appearance models built from individual or several selected frames for the comparison of features, but they cannot encode long-term appearance change caused by pose, viewing angle, and lighting conditions. We propose an adaptive model that learns online the relatively long-term appearance change of each target. The proposed model is compatible with any features of fixed dimension, or their combinations, whose learning rates are dynamically controlled by adaptive update and spatial weighting schemes. To handle occlusion and nearby objects sharing similar appearance, we also design cross-matching and re-identification schemes based on the proposed adaptive appearance models. Additionally, 3D geometry information is effectively incorporated into our formulation for data association. The proposed method outperforms all state-of-the-art methods on the MOTChallenge 3D benchmark and achieves real-time computation with only a standard desktop CPU. It has also shown superior performance over the state of the art on the 2D benchmark of MOTChallenge.
For more comprehensive 3D scene reconstruction, we develop a monocular 3D human pose estimation algorithm based on a two-step EDA that can simultaneously estimate the camera motion for a moving camera. We first derive reliable 2D joint points through deep-learning-based 2D pose estimation and feature tracking. If the camera is moving, the initial camera poses can be estimated from visual odometry, where feature points extracted on human bodies are removed by segmentation masks dilated from the 2D skeletons. The 3D joint points and camera parameters are then iteratively optimized through a two-step evolutionary algorithm. The cost function for human pose optimization consists of loss terms defined by spatial and temporal constancy, the "flatness" of human bodies, and joint angle constraints, while the optimization of camera movement is based on minimizing the reprojection error of the skeleton joint points. Extensive experiments conducted on various video data verify the robustness of the proposed method.
The final goal of our work is to fully understand and reconstruct the 3D scene, i.e., to recover the trajectory and action of each object. The above methods can be extended to a system with a camera array of overlapping views. We propose a novel video scene reconstruction framework to collaboratively track multiple human objects and estimate their 3D poses across multiple camera views. First, tracklets are extracted from each single view following the tracking-by-detection paradigm. We propose an effective integration of visual and semantic object attributes, including appearance models, geometry information, and poses/actions, to associate tracklets across different views. Based on the optimum viewing perspectives derived from tracking, we generate the 3D skeleton of each object. The estimated body joint points are fed back to the tracking stage to enhance tracklet association. Experiments on a multi-view tracking benchmark validate the effectiveness of our approach. (Illustrative sketches of selected components described above follow this record.) |
Subject | Artificial intelligence. Computer engineering. Electrical engineering. |
Language | English |
Link | The full text of this item is provided by KERIS (Korea Education and Research Information Service). |
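
The abstract above credits an estimation of distribution algorithm (EDA) with fine-tuning all camera unknowns simultaneously. Below is a minimal sketch of a Gaussian EDA search loop of that general kind, assuming an external cost function (e.g., disagreement between an assumed pedestrian height and the back-projected head/foot points); the parameter names, initial values, and toy objective are illustrative, not the dissertation's actual calibration formulation.

```python
import numpy as np

def eda_optimize(cost_fn, init_mean, init_std, n_iter=50, pop=200, elite_frac=0.2):
    """Gaussian EDA: sample candidate parameter vectors, keep the best-scoring
    fraction, refit the sampling distribution to the elites, and repeat."""
    mean = np.asarray(init_mean, dtype=float)
    std = np.asarray(init_std, dtype=float)
    for _ in range(n_iter):
        samples = np.random.normal(mean, std, size=(pop, mean.size))
        costs = np.array([cost_fn(s) for s in samples])
        elites = samples[np.argsort(costs)[: max(1, int(pop * elite_frac))]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-9
    return mean  # e.g., estimated focal length, tilt, camera height, distortion

# Usage with a toy quadratic objective standing in for the real calibration cost
# (assumed parameter order: focal length, tilt, camera height, radial distortion):
best = eda_optimize(
    lambda p: float(np.sum((p - np.array([1000.0, 0.3, 3.0, 0.0])) ** 2)),
    init_mean=[800.0, 0.0, 2.0, 0.0],
    init_std=[400.0, 0.5, 2.0, 0.1],
)
```

Because the search relies only on repeated sampling and elite refitting, every unknown in the projection model can be adjusted jointly without an analytic gradient, which is consistent with the abstract's claim that all unknowns are fine-tuned simultaneously.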
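The abstract also describes an online appearance model whose learning rates are dynamically controlled. The sketch below shows one plausible realization, assuming a fixed-dimension feature vector, a cosine-similarity template, and an occlusion flag; the modulation heuristic is an assumption for illustration, not the scheme proposed in the dissertation.

```python
import numpy as np

class AdaptiveAppearanceModel:
    """Online template over a fixed-dimension feature with a modulated learning rate."""

    def __init__(self, feat_dim, base_rate=0.1):
        self.template = np.zeros(feat_dim)
        self.base_rate = base_rate
        self.initialized = False

    def similarity(self, feat):
        # Cosine similarity between the stored template and a new observation.
        denom = np.linalg.norm(self.template) * np.linalg.norm(feat) + 1e-12
        return float(np.dot(self.template, feat) / denom)

    def update(self, feat, occluded=False):
        feat = np.asarray(feat, dtype=float)
        if not self.initialized:
            self.template, self.initialized = feat.copy(), True
            return
        # Shrink the learning rate when the observation disagrees with the
        # template (possible occlusion or identity switch); freeze it entirely
        # when the target is flagged as occluded.
        rate = 0.0 if occluded else self.base_rate * max(self.similarity(feat), 0.0)
        self.template = (1.0 - rate) * self.template + rate * feat
```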
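For the moving-camera case, the abstract states that camera motion is optimized by minimizing the reprojection error of skeleton joint points. A minimal sketch of such an objective under a standard pinhole model is given below; the parameterization (intrinsics K, rotation R, translation t) and the mean-squared form are assumptions, and the dissertation's exact cost may differ.

```python
import numpy as np

def skeleton_reprojection_error(joints_3d, joints_2d, K, R, t):
    """Mean squared pixel distance between projected 3D joints and observed 2D joints.

    joints_3d: (N, 3) world-space joint points
    joints_2d: (N, 2) observed image-space joint points
    K: (3, 3) camera intrinsics; R: (3, 3) rotation; t: (3,) translation
    """
    cam_pts = R @ joints_3d.T + t.reshape(3, 1)   # world -> camera coordinates
    proj = K @ cam_pts                            # camera -> homogeneous image coords
    proj = (proj[:2] / proj[2]).T                 # perspective divide -> (N, 2) pixels
    return float(np.mean(np.sum((proj - joints_2d) ** 2, axis=1)))
```

An optimizer such as the EDA loop sketched earlier could then search over (R, t) per frame to drive this error down, which matches the iterative two-step alternation between human pose and camera motion described in the abstract.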
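Finally, cross-view tracklet association is described as integrating appearance, geometry, and pose/action attributes. The sketch below illustrates one way such a combined affinity could be assembled and solved as a bipartite assignment with SciPy's Hungarian solver; the attribute weights, gating threshold, and 5 m geometry normalization are hypothetical choices, not the dissertation's.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_tracklets(feats_a, feats_b, ground_pts_a, ground_pts_b,
                        actions_a, actions_b, w=(0.5, 0.3, 0.2), gate=0.4):
    """Match tracklets between two camera views using a weighted mix of
    appearance distance, ground-plane distance, and action agreement."""
    n, m = len(feats_a), len(feats_b)
    cost = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            app = 1.0 - np.dot(feats_a[i], feats_b[j]) / (
                np.linalg.norm(feats_a[i]) * np.linalg.norm(feats_b[j]) + 1e-12)
            geo = np.linalg.norm(ground_pts_a[i] - ground_pts_b[j])  # ground-plane metres
            sem = 0.0 if actions_a[i] == actions_b[j] else 1.0       # pose/action agreement
            cost[i, j] = w[0] * app + w[1] * min(geo / 5.0, 1.0) + w[2] * sem
    rows, cols = linear_sum_assignment(cost)          # Hungarian assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
```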