About the Book:
"This book by Prof. Masashi Sugiyama covers the range of reinforcement learning algorithms
from a fresh, modern perspective. With a focus on the statistical properties of
estimating parameters for reinforcement learning, the book relates a number of
different approaches across the gamut of learning scenarios…. It is a
contemporary and welcome addition to the rapidly growing machine learning
literature. Both beginner students and experienced researchers will find it to
be an important source for understanding the latest reinforcement learning
techniques." – Daniel D. Lee, GRASP Laboratory, School of Engineering and
Applied Science, University of Pennsylvania.
Reinforcement
learning (RL) is a mathematical framework for developing computer agents that can
learn optimal behavior by relating generic reward signals to their past
actions. With numerous successful applications in business intelligence, plant
control, and gaming, the RL framework is ideal for decision making in unknown
environments with large amounts of data.
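As a rough illustration of this reward-driven loop (a minimal sketch for this description, not code from the book), the following Python fragment shows a tabular Q-learning agent interacting with a hypothetical environment object env exposing reset(), step(), and actions(); all of these names and the parameter values are assumptions made for illustration only.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning sketch: learn action values from reward signals alone.

        Assumes a hypothetical env with reset() -> state, actions(state) -> list,
        and step(action) -> (next_state, reward, done).
        """
        q = defaultdict(float)  # maps (state, action) pairs to estimated returns
        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
                if random.random() < epsilon:
                    action = random.choice(env.actions(state))
                else:
                    action = max(env.actions(state), key=lambda a: q[(state, a)])
                next_state, reward, done = env.step(action)
                # Relate the observed reward to the action just taken.
                best_next = 0.0 if done else max(
                    q[(next_state, a)] for a in env.actions(next_state))
                q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
                state = next_state
        return q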
Providing an
up-to-date and accessible introduction to the field, Statistical
Reinforcement Learning: Modern Machine Learning Approaches presents
fundamental concepts and practical algorithms of statistical reinforcement
learning from the modern machine learning viewpoint. It covers various
RL approaches, including model-based and model-free methods, policy
iteration, and policy search.
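To make these families concrete, here is a minimal, self-contained sketch of classic dynamic-programming policy iteration on a fully known MDP; this is an illustrative assumption of this description rather than the book's own (model-free, function-approximation-based) algorithms, and the data structures P and R are hypothetical. Policy search methods would instead adjust the parameters of a policy directly.

    def policy_iteration(P, R, gamma=0.95, tol=1e-8):
        """Tabular policy iteration on a known MDP (illustrative sketch).

        P[s][a] is a dict mapping next states to transition probabilities,
        R[s][a] is the expected immediate reward for action a in state s.
        """
        states = list(P.keys())
        policy = {s: next(iter(P[s])) for s in states}  # arbitrary initial policy
        while True:
            # Policy evaluation: iterate Bellman backups until V approximates V^pi.
            V = {s: 0.0 for s in states}
            while True:
                delta = 0.0
                for s in states:
                    a = policy[s]
                    v = R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                    delta = max(delta, abs(v - V[s]))
                    V[s] = v
                if delta < tol:
                    break
            # Policy improvement: act greedily with respect to the evaluated V.
            stable = True
            for s in states:
                best = max(P[s], key=lambda a: R[s][a]
                           + gamma * sum(p * V[s2] for s2, p in P[s][a].items()))
                if best != policy[s]:
                    policy[s] = best
                    stable = False
            if stable:
                return policy, V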
The book covers
approaches recently introduced in the data mining and machine learning fields to
provide a systematic bridge between the RL and data mining/machine learning
communities. It presents state-of-the-art results, including dimensionality
reduction in RL and risk-sensitive RL. Numerous illustrative examples are
included to help readers understand the intuition and usefulness of
reinforcement learning techniques.
Contents:
I. Introduction
1. Introduction to Reinforcement Learning
II. Model-Free Policy Iteration
2. Policy Iteration with Value Function
Approximation
3. Basis Design for Value Function Approximation
4. Sample Reuse in Policy Iteration
5. Active Learning in Policy Iteration
6. Robust Policy Iteration
III. Model-Free Policy Search
7. Direct Policy Search by Gradient Ascent
8. Direct Policy Search by
Expectation-Maximization
9. Policy-Prior Search
IV. Model-Based Reinforcement Learning
10. Transition Model Estimation
11. Dimensionality Reduction for Transition Model Estimation
About the Author:
Masashi Sugiyama was born in Osaka, Japan, in 1974. He received
Bachelor, Master, and Doctor of Engineering degrees in Computer Science from
Tokyo Institute of Technology, Japan, in 1997, 1999, and 2001, respectively. In
2001, he was appointed Assistant Professor at the same institute, and he was
promoted to Associate Professor in 2003. He moved to the University of Tokyo as
Professor in 2014.
He received an Alexander von Humboldt
Foundation Research Fellowship and conducted research at the Fraunhofer Institute in Berlin,
Germany, from 2003 to 2004. In 2006, he received a European Commission
Erasmus Mundus Scholarship and conducted research at the University of Edinburgh,
Scotland. He received the Faculty Award from IBM in 2007 for his contribution
to machine learning under non-stationarity, the Nagao Special Researcher Award
from the Information Processing Society of Japan in 2011, and the Young
Scientists’ Prize from the Commendation for Science and Technology by the
Minister of Education, Culture, Sports, Science and Technology for his
contribution to the density-ratio paradigm of machine learning.
His research interests include theories and
algorithms of machine learning and data mining, and a wide range of
applications such as signal processing, image processing, and robot control. He
published Density
Ratio Estimation in Machine Learning (Cambridge University
Press, 2012) and Machine Learning in Non-Stationary Environments: Introduction to
Covariate Shift Adaptation (MIT Press, 2012).
The author thanks his collaborators, Hirotaka
Hachiya, Sethu Vijayakumar, Jan Peters, Jun Morimoto, Zhao Tingting, Ning Xie,
Voot Tangkaratt, Tetsuro Morimura, and Norikazu Sugimoto, for exciting and
creative discussions. He acknowledges support from MEXT KAKENHI 17700142,
18300057, 20680007, 23120004, 23300069, 25700022, and 26280054, the Okawa Foundation,
the EU Erasmus Mundus Fellowship, AOARD, SCAT, the JST PRESTO program, and the
FIRST program.