MARC View
LDR00000nam u2200205 4500
001000000434380
00520200226150232
008200131s2019 ||||||||||||||||| ||eng d
020 ▼a 9781085567572
035 ▼a (MiAaPQ)AAI13811723
040 ▼a MiAaPQ ▼c MiAaPQ ▼d 247004
0820 ▼a 001
1001 ▼a Owens, Jason.
24510 ▼a Visual Perception for Robotic Spatial Understanding.
260 ▼a [S.l.]: ▼b University of Pennsylvania, ▼c 2019.
260 1 ▼a Ann Arbor: ▼b ProQuest Dissertations & Theses, ▼c 2019.
300 ▼a 237 p.
500 ▼a Source: Dissertations Abstracts International, Volume: 81-02, Section: B.
500 ▼a Advisor: Daniilidis, Kostas.
5021 ▼a Thesis (Ph.D.)--University of Pennsylvania, 2019.
506 ▼a This item must not be sold to any third-party vendors.
520 ▼a Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. Instead, we must devise algorithmic methods for taking raw sensor data and converting it into something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels on a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a pose-detection mechanism that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to three different modalities. Second, we present our approach to visual odometry and mapping, which exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3D over-segmentation technique that uses the models and ego-motion produced in the previous step to generate temporally consistent segmentations under camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D-based object pose recognition using a novel network architecture we call PartNet.
590 ▼a School code: 0175.
650 4 ▼a Computer science.
650 4 ▼a Robotics.
650 4 ▼a Artificial intelligence.
690 ▼a 0984
690 ▼a 0771
690 ▼a 0800
71020 ▼a University of Pennsylvania. ▼b Computer and Information Science.
7730 ▼t Dissertations Abstracts International ▼g 81-02B.
790 ▼a 0175
791 ▼a Ph.D.
792 ▼a 2019
793 ▼a English
85640 ▼u http://www.riss.kr/pdu/ddodLink.do?id=T15490712 ▼n KERIS ▼z The full text of this material is provided by the Korea Education and Research Information Service (KERIS).
980 ▼a 202002 ▼f 2020
990 ▼a ***1816162
991 ▼a E-BOOK