Daegu Haany University Hyangsan Library

Reinforcement Learning with High-Level Task Specifications

Detailed Information
Material Type: Thesis (Ph.D. dissertation)
Title/Author: Reinforcement Learning with High-Level Task Specifications.
Personal Author: Wen, Min.
Corporate Author: University of Pennsylvania. Electrical and Systems Engineering.
Publication: [S.l.]: University of Pennsylvania, 2019.
Publication: Ann Arbor: ProQuest Dissertations & Theses, 2019.
Physical Description: 172 p.
Source Record: Dissertations Abstracts International 81-03B.
ISBN: 9781088366899
Dissertation Note: Thesis (Ph.D.)--University of Pennsylvania, 2019.
General Note: Source: Dissertations Abstracts International, Volume: 81-03, Section: B.
Advisor: Topcu, Ufuk
Restrictions on Use: This item must not be sold to any third party vendors.
Abstract: Reinforcement learning (RL) has been widely used, for example, in robotics, recommendation systems, and financial services. Existing RL algorithms typically optimize reward-based surrogates rather than the task performance itself. Therefore, they suffer from several shortcomings in providing guarantees for the task performance of the learned policies: an optimal policy for a surrogate objective may not have optimal task performance; a reward function that helps achieve satisfactory task performance in one environment may not transfer well to another environment; and RL algorithms tackle nonlinear and nonconvex optimization problems and may, in general, not be able to find globally optimal policies. The goal of this dissertation is to develop RL algorithms that explicitly account for formal high-level task specifications and equip the learned policies with provable guarantees for the satisfaction of these specifications. The resulting RL and inverse RL algorithms utilize multiple representations of task specifications, including conventional reward functions, expert demonstrations, temporal logic formulas, and trajectory-based constraint functions, as well as their combinations. These algorithms offer several promising capabilities. First, they automatically generate a memory transition system, which is critical for tasks that cannot be implemented by memoryless policies. Second, the formal specifications can act as reliable performance criteria for the learned policies, regardless of the quality of the designed reward functions and of variations in the underlying environments. Third, the algorithms enable online RL that never violates critical task and safety requirements, even during exploration.
General Subjects: Artificial intelligence.
Computer science.
Language: English
Link (URL): The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
