Safe and Robust Sequential Decision-Making

Seminar Date(s)
Seminar Location
Atkinson Hall, Room 4004
Seminar Speaker
Mohammad Ghavamzadeh - Facebook and Inria
Abstract

In many practical problems, from online advertising to healthcare and computational finance, it is extremely important to have guarantees on the performance and other characteristics of the policy generated by our algorithms. Such guarantees reduce the risk of deploying the policy and help us convince the product (hospital, investment) managers that it is not going to harm their business. In the first part of the talk, we provide an overview of our work on learning safe and risk-sensitive policies in sequential decision-making problems. The notion of safety studied here is “safety w.r.t. a baseline”, i.e., a policy is considered safe if it is guaranteed to perform at least as well as a baseline. We look at the problem of safety w.r.t. a baseline from three different angles: off-policy evaluation and counterfactual inference; robust control and the simulation-to-real problem; and conservative exploration in online learning.

The second part of the talk is about a method for controlling non-linear dynamical systems from high-dimensional observations (e.g., raw pixel images) that is robust to noise in the system dynamics. Our method is a principled way of combining variational auto-encoders with locally-optimal controllers. It uses a deep generative model from the family of variational auto-encoders that learns the predictive conditional density of the future observation given the current one, while introducing a low-dimensional embedding space for control. We introduce specific structure in the generative graphical model so that the dynamics in the embedding space are constrained to be locally linear. We also propose a principled variational approximation of the embedding posterior that is more robust to noise.
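As a rough illustration of “safety w.r.t. a baseline” from the first part of the abstract, the sketch below accepts a candidate policy only if a lower confidence bound on its estimated return is at least the baseline's value. This is not the speaker's method: the function names, the normal-approximation bound, and the sample returns are all hypothetical, and the return estimates are assumed to be available already (for example, from off-policy evaluation).

```python
import math
import statistics


def lower_confidence_bound(returns, z=1.645):
    """One-sided lower bound on the mean return.

    Assumption: a normal approximation with critical value z
    (1.645 corresponds to roughly 95% one-sided confidence).
    """
    n = len(returns)
    mean = statistics.fmean(returns)
    stderr = statistics.stdev(returns) / math.sqrt(n)
    return mean - z * stderr


def is_safe_wrt_baseline(candidate_returns, baseline_value, z=1.645):
    """Hypothetical safety test: deploy the candidate only if its
    lower confidence bound is at least the baseline's value."""
    return lower_confidence_bound(candidate_returns, z) >= baseline_value


# Made-up return estimates for a candidate policy.
candidate_returns = [1.0, 1.2, 0.9, 1.1, 1.0]
print(is_safe_wrt_baseline(candidate_returns, baseline_value=0.9))
print(is_safe_wrt_baseline(candidate_returns, baseline_value=1.0))
```

With few samples the bound is loose, so a candidate whose average return exceeds the baseline can still be rejected. That is the conservative behaviour this notion of safety calls for.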

Seminar Speaker Bio
Mohammad Ghavamzadeh received a Ph.D. degree in Computer Science from the University of Massachusetts Amherst in 2005. From 2005 to 2008, he was a postdoctoral fellow at the University of Alberta. He has been a permanent researcher at INRIA in France since November 2008. He was promoted to first-class researcher in 2010, was the recipient of the "INRIA award for scientific excellence" in 2011, and obtained his Habilitation in 2014. Since 2013, he has been a senior researcher, first at Adobe Research (2013 to May 2017), then at DeepMind (June 2017 to October 2018), and now at Facebook AI Research (FAIR). He has been an area chair and a senior program committee member at NIPS, ICML, IJCAI, and AAAI. He has been on the editorial board of Machine Learning Journal (MLJ) and has been a reviewer for JMLR, MLJ, JAIR, Journal of Operations Research, IEEE TAC, and Automatica. He has published over 70 refereed papers in major machine learning, AI, and control journals and conferences, and has organized several tutorials and workshops at NIPS, ICML, and AAAI. His research is in the areas of machine learning, artificial intelligence, control, and learning theory. In particular, he investigates the principles of scalable decision-making and devises, analyzes, and implements algorithms for sequential decision-making under uncertainty and reinforcement learning.
Seminar Contact
Tara Javidi
<tjavidi@ucsd.edu>