Daegu Haany University Hyangsan Library


Sample Complexity Bounds for the Linear Quadratic Regulator


Detailed Information
Material type: Dissertation (thesis)
Title/Author: Sample Complexity Bounds for the Linear Quadratic Regulator.
Personal author: Tu, Stephen L.
Corporate author: University of California, Berkeley. Electrical Engineering & Computer Sciences.
Publication: [S.l.]: University of California, Berkeley, 2019.
Publication: Ann Arbor: ProQuest Dissertations & Theses, 2019.
Physical description: 148 p.
Source record: Dissertations Abstracts International 81-04B.
ISBN: 9781085793919
Thesis note: Thesis (D.Eng.)--University of California, Berkeley, 2019.
General note: Source: Dissertations Abstracts International, Volume: 81-04, Section: B.
Advisor: Recht, Benjamin.
Restrictions on use: This item must not be sold to any third-party vendors.
Abstract: Reinforcement learning (RL) has demonstrated impressive performance in domains such as video games, Go, robotic locomotion, and manipulation tasks. As we turn towards RL to power autonomous systems in the physical world, a natural question to ask is: how do we ensure that the behavior observed in the laboratory reflects the behavior that occurs when systems are deployed in the real world? How much data do we need to collect in order to learn how to control a system with a high degree of confidence?

This thesis takes a step towards answering these questions by establishing the Linear Quadratic Regulator (LQR) as a baseline for comparing RL algorithms. LQR is a fundamental problem in optimal control theory for which the exact solution is efficiently computable given perfect knowledge of the underlying dynamics. This makes LQR well suited as a baseline for studying the sample complexity of RL algorithms that learn how to control a system from observing repeated interactions with it.

The first part of this thesis focuses on model-based algorithms, which estimate a model of the underlying system and then build a controller based on the estimated dynamics. We show that the classic certainty-equivalence controller, which discards the confidence intervals surrounding the estimated dynamics, is efficient in regimes of low uncertainty. For regimes of moderate uncertainty, we propose a new model-based algorithm based on robust optimization and show that it is also sample efficient.

The second part studies model-free algorithms, which instead learn intermediate representations or directly search for the parameters of the optimal controller. We first look at the classical least-squares policy iteration algorithm and establish an upper bound on its sample complexity. We then use tools from asymptotic statistics to characterize the asymptotic behavior of both the certainty-equivalence controller and the popular policy gradient method on a particular family of LQR instances, which allows us to compare the bounds directly. This comparison reveals that the sample complexity of the model-free policy gradient method is worse than that of the model-based certainty-equivalence controller by factors polynomial in the state/input dimension and the horizon length. Our experiments corroborate this finding and show that model-based algorithms are more sample efficient than model-free algorithms for LQR.
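The certainty-equivalence pipeline described in the abstract — fit the dynamics from trajectory data, then solve the LQR problem as if the estimate were exact — can be illustrated in a few lines. The sketch below is not taken from the thesis; the double-integrator system, noise level, and rollout length are illustrative assumptions, and the LQR solution uses plain Riccati value iteration rather than any particular algorithm from the dissertation.

```python
# Minimal certainty-equivalence sketch: least-squares system identification
# followed by LQR synthesis on the estimated dynamics. All numbers below
# (system, noise scale, rollout length) are illustrative assumptions.
import numpy as np

def solve_lqr(A, B, Q, R, iters=500):
    """Infinite-horizon discrete-time LQR via Riccati value iteration.
    Returns the gain K such that u = -K x is optimal."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def estimate_dynamics(X, U, Xnext):
    """Least-squares estimate of (A, B) from rollout data x_{t+1} = A x_t + B u_t + w_t."""
    Z = np.hstack([X, U])                               # (T, n+m) regressors
    Theta, *_ = np.linalg.lstsq(Z, Xnext, rcond=None)   # (n+m, n)
    n = X.shape[1]
    return Theta[:n].T, Theta[n:].T                     # A_hat, B_hat

# True (unknown to the learner) system: a noisy double integrator.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

# Collect one rollout driven by exploratory random inputs.
T, x = 500, np.zeros(2)
X, U, Xn = [], [], []
for _ in range(T):
    u = rng.normal(size=1)
    xn = A @ x + B @ u + 0.01 * rng.normal(size=2)      # process noise
    X.append(x); U.append(u); Xn.append(xn)
    x = xn

# Certainty equivalence: plug the estimate into the LQR solver directly,
# discarding any confidence interval around (A_hat, B_hat).
A_hat, B_hat = estimate_dynamics(np.array(X), np.array(U), np.array(Xn))
K_ce = solve_lqr(A_hat, B_hat, Q, R)
```

With enough excited data, the estimated gain `K_ce` stabilizes the true system (the spectral radius of `A - B @ K_ce` drops below one); how much data "enough" is, as a function of dimension and noise, is exactly the sample-complexity question the thesis studies.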
General subjects: Electrical engineering; Computer science.
Language: English
Link: URL — the full text of this item is provided by the Korea Education and Research Information Service (KERIS).
