The 3rd Asian Workshop on Reinforcement Learning (AWRL'18)
Nov. 14, Beijing Jiaotong University, Beijing, China
Meeting Room #5, B1, International Conference Center of BJTU

Invited Speakers | Program | Organization | Sponsors | About AWRL

(Registration: attendees should register through ACML.)

The Asian Workshop on Reinforcement Learning (AWRL) focuses on theoretical foundations, models, algorithms, and practical applications. We intend AWRL to be an exciting event for researchers and practitioners in RL worldwide, and a forum for discussing open problems, future research directions, and application domains of RL.

AWRL 2018 (in conjunction with ACML 2018) will consist of keynote talks, invited paper presentations, and discussion sessions spread over a one-day period.

Invited Speakers (in alphabetical order)

Lucian Busoniu, Technical University of Cluj-Napoca, Romania
Lucian Busoniu received his Ph.D. degree cum laude from the Delft University of Technology, the Netherlands, in 2009. He is a professor with the Department of Automation at the Technical University of Cluj-Napoca, where he leads the group on Robotics and Nonlinear Control. He has previously held research positions in the Netherlands and in France. His research interests include nonlinear optimal control using artificial intelligence techniques, reinforcement learning and approximate dynamic programming, robotics, and multiagent systems. He has more than 70 research publications, including several influential review articles and a book on reinforcement learning.
Shane Shixiang Gu, Google Brain
Shane Gu is a Research Scientist at Google Brain, where he mainly works on problems in deep learning, reinforcement learning, robotics, and probabilistic machine learning. His recent research focuses on sample-efficient RL methods that could scale to solve difficult continuous control problems in the real world, and it has been covered by the Google Research Blog and MIT Technology Review. He completed his PhD in Machine Learning at the University of Cambridge and the Max Planck Institute for Intelligent Systems in Tübingen, where he was co-supervised by Richard E. Turner, Zoubin Ghahramani, and Bernhard Schölkopf. During his PhD, he also collaborated closely with Sergey Levine at UC Berkeley/Google Brain and Timothy Lillicrap at DeepMind. He holds a B.ASc. in Engineering Science from the University of Toronto, where he did his thesis with Geoffrey Hinton on distributed training of neural networks using evolutionary algorithms.
Minlie Huang, Tsinghua University, China
Dr. Minlie Huang is an associate professor and deputy director of the AI Lab in the Department of Computer Science and Technology, Tsinghua University. He was selected for the "Beijing Century Young Elite Program" in 2013 and won the Hanvon Youth Innovation Award in 2018. He received an IJCAI-ECAI 2018 Distinguished Paper Award, the NLPCC 2015 Best Paper Award, and the CCL 2018 Best Demo Award. Two of his papers were voted among the Top 15 NLP papers of 2016 and the Top 10 NLP papers of 2017 by PaperWeekly. His work on the Emotional Chatting Machine was reported by MIT Technology Review, the Guardian, NVIDIA, Cankao Xiaoxi, Xinhua News Agency, etc. He has published 60+ papers at top conferences such as ACL, AAAI, IJCAI, EMNLP, and KDD, and in high-impact journals such as TOIS, Bioinformatics, and JAMIA. He served as an area chair for ACL 2016, EMNLP 2014, EMNLP 2011, and IJCNLP 2017, as a Senior PC member for IJCAI 2017, IJCAI 2018, and AAAI 2019, and as a reviewer for ACL, IJCAI, AAAI, EACL, COLING, EMNLP, and NAACL, as well as for journals such as TOIS, TKDE, TPAMI, etc. As principal investigator, he has established collaborations with industrial companies such as Samsung, Microsoft, HP, Google, Sogou, Tencent, and Alibaba.
Gergely Neu, Pompeu Fabra University, Spain
Gergely Neu is a research assistant professor at the Pompeu Fabra University, Barcelona, Spain. He has previously worked with the SequeL team of INRIA Lille, France and the RLAI group at the University of Alberta, Edmonton, Canada. He obtained his PhD degree in 2013 from the Technical University of Budapest, where his advisors were Andras Gyorgy, Csaba Szepesvari and Laszlo Gyorfi. His main research interests are in machine learning theory, including reinforcement learning and online learning with limited feedback and/or very large action sets.
Matteo Pirotta, Facebook AI Research (FAIR), France
Matteo Pirotta is a research scientist at the Facebook AI Research (FAIR) lab in Paris. Previously, he was a postdoc at Inria in the SequeL team, where he has mainly worked on the exploration-exploitation dilemma in reinforcement learning. He received his PhD in computer science from the Politecnico di Milano (Italy) in 2016. For his doctoral thesis in reinforcement learning, he received the Dimitris N. Chorafas Foundation Award and an honorable mention for the EurAI Distinguished Dissertation Award.

Program

8:50-9:00 Opening
9:00-9:40 Invited talk 1 by Lucian Busoniu
Forward-search optimistic planning with convergence guarantees in continuous-action MDPs
We discuss an optimistic planning method to run a forward search for continuous-action sequences in deterministic MDPs. The method iteratively refines the most promising adaptive-horizon hyperboxes in the space of infinite-horizon action sequences. Under Lipschitz conditions on the dynamics and rewards, we obtain a convergence rate to the optimal solution as the simulation budget increases. Real-time, receding-horizon control results illustrate the method in practice.
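The refinement loop behind such planners can be sketched in a much-simplified form. The toy below is a hypothetical one-dimensional illustration (DOO-style optimistic refinement over constant-action sequences), not the algorithm from the talk; the dynamics, reward, Lipschitz constant L, and budget are all made-up assumptions.

```python
def rollout_value(a, x0=1.0, gamma=0.9, horizon=30):
    """Discounted return of applying the constant action a from state x0."""
    x, total = x0, 0.0
    for t in range(horizon):
        total += (gamma ** t) * (-x * x - 0.1 * a * a)  # quadratic cost
        x = 0.9 * x + a                                 # toy linear dynamics
    return total

def optimistic_plan(lo=-1.0, hi=1.0, budget=100, L=50.0):
    """Iteratively split the action box whose optimistic bound is highest."""
    boxes = [(lo, hi)]
    best_a, best_v = None, float("-inf")
    for _ in range(budget):
        # Optimistic upper bound per box: value at centre + Lipschitz slack.
        def bound(box):
            a = 0.5 * (box[0] + box[1])
            return rollout_value(a) + L * (box[1] - box[0]) / 2
        boxes.sort(key=bound, reverse=True)
        a0, a1 = boxes.pop(0)          # most promising box
        mid = 0.5 * (a0 + a1)
        v = rollout_value(mid)
        if v > best_v:
            best_a, best_v = mid, v
        boxes += [(a0, mid), (mid, a1)]  # refine it into two sub-boxes
    return best_a, best_v
```

As the budget grows, the refined boxes shrink around the best constant action, mirroring (in a crude way) the adaptive-horizon refinement and budget-dependent convergence described in the abstract.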
9:40-10:00 Paper talk 1 by Guangxiang Zhu
Object-Oriented Dynamics Predictor
10:00-10:20 Coffee break
10:20-11:00 Invited talk 2 by Shane Shixiang Gu
Deep Reinforcement Learning Toward Robotics
Deep reinforcement learning (RL) has shown promising results for learning complex sequential decision-making behaviors in various environments, from computer games and the game of Go to simulated humanoids. However, most successes have been exclusively in simulation, and results in real-world applications such as robotics are limited, largely due to the poor sample efficiency of typical deep RL algorithms and other challenges. In this talk, I present essential components for deep reinforcement learning in the wild. First, I will discuss methods that improve the performance and sample efficiency of the core RL algorithms, blurring the boundaries among classic model-based RL, off-policy model-free RL, and on-policy model-free RL. In the latter part, I illustrate other practical challenges for enabling autonomous learning agents in the real world, in particular that current RL formulations require constant human intervention for safety, resets, and reward engineering, and do not scale to learning diverse skills. I present our recent work addressing those challenges and show pathways toward continually learning robots in the real world.
11:00-11:20 Paper talk 2 by Yue Wang
Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting
11:20-12:00 Invited talk 3 by Minlie Huang
Reinforcement Learning in Natural Language Processing and Search
Deep reinforcement learning has received much attention since the success of AlphaGo and AlphaGo Zero. In this talk, the speaker will present his research on how deep reinforcement learning can be applied to natural language processing and search problems, including discovering text structures, removing noisy instances, correcting noisy data labels, and optimizing online, complex, and dynamic search systems. In these works, the speaker will demonstrate how typical NLP/search problems can be formulated as sequential decision problems. What these works share is that, under weak or indirect supervision, reinforcement learning performs well by leveraging two properties: the trial-and-error nature of probabilistic exploration, and reward design that captures prior or domain knowledge. The speaker will also share his experience on how to make RL succeed in solving NLP/search problems.


13:00-13:40 Invited talk 4 by Gergely Neu
A unified view of entropy-regularized Markov decision processes
Entropy regularization, while a standard technique in the online learning toolbox, has only been recently discovered by the reinforcement learning community: In recent years, numerous new reinforcement learning algorithms have been derived using this principle, largely independently of each other. So far, a general framework for these algorithms has remained elusive. In this work, we propose such a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point.
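As a loose illustration of how entropy regularization reshapes the Bellman operator (the talk's framework is average-reward and LP-based, which this sketch does not reproduce), the discounted soft Bellman backup replaces the max over actions with a temperature-scaled log-sum-exp. The two-state, two-action MDP below is made up for the example.

```python
import math

R = {  # R[s][a]: reward for action a in state s (toy values)
    0: [1.0, 0.0],
    1: [0.0, 2.0],
}
P = {  # P[s][a]: next state (deterministic for simplicity)
    0: [0, 1],
    1: [0, 1],
}
GAMMA, TAU = 0.9, 0.5  # discount factor and entropy temperature

def soft_backup(V):
    """V(s) <- tau * log sum_a exp((r(s,a) + gamma * V(s')) / tau)."""
    return [
        TAU * math.log(sum(
            math.exp((R[s][a] + GAMMA * V[P[s][a]]) / TAU)
            for a in range(2)))
        for s in range(2)
    ]

V = [0.0, 0.0]
for _ in range(200):   # iterate the contraction to a near fixed point
    V = soft_backup(V)
```

Because log-sum-exp upper-bounds the max, the soft values sit slightly above the unregularized optimal values, and as TAU goes to 0 the backup recovers the standard Bellman optimality operator.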
13:40-14:00 Paper talk 3 by Yan Zheng
A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents
14:00-14:40 Invited talk 5 by Matteo Pirotta
Exploration Bonus for Regret Minimization in Reinforcement Learning
A sample-efficient RL agent must trade off the exploration needed to collect information about the environment against the exploitation of the experience gathered so far to gain as much reward as possible. A popular strategy for dealing with the exploration-exploitation dilemma (i.e., minimizing regret) is to follow the optimism in the face of uncertainty (OFU) principle.
We present SCAL+, an optimistic algorithm that relies on an exploration bonus to efficiently balance exploration and exploitation in the infinite-horizon undiscounted setting. We show that all the exploration bonuses previously introduced in the RL literature explicitly exploit some form of prior knowledge associated with the specific setting (i.e., discounted or finite-horizon problems). In the infinite-horizon undiscounted case, there is no predefined parameter playing such a role, which makes the design of an exploration bonus very challenging. To overcome this limitation, we make the common assumption that the agent knows an upper bound c on the span of the optimal bias function.
We discuss the connections between the different settings, and we prove that SCAL+ achieves the same theoretical guarantees as standard approaches (e.g., UCRL) at a much lower computational cost.
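The "empirical estimate plus exploration bonus" idea behind OFU appears in its simplest form in a bandit. The sketch below is plain UCB1, not SCAL+: SCAL+ plans optimistically in an undiscounted MDP with a bonus scaled by the span bound c, which this toy omits; the arm means and horizon are illustrative assumptions.

```python
import math
import random

def ucb_bandit(means, horizon=5000, seed=0):
    """Run UCB1 on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    n = [0] * len(means)     # pull counts
    s = [0.0] * len(means)   # reward sums
    for t in range(1, horizon + 1):
        # Optimistic index: empirical mean + exploration bonus that
        # shrinks as an arm is pulled more often.
        idx = [
            float("inf") if n[a] == 0
            else s[a] / n[a] + math.sqrt(2 * math.log(t) / n[a])
            for a in range(len(means))
        ]
        a = max(range(len(means)), key=lambda i: idx[i])
        r = 1.0 if rng.random() < means[a] else 0.0  # Bernoulli reward
        n[a] += 1
        s[a] += r
    return n

counts = ucb_bandit([0.3, 0.7])
```

After enough pulls the bonus of the better arm shrinks while its empirical mean stays high, so the agent concentrates on it without ever fully abandoning the alternative.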
14:40-15:00 Paper talk 4 by Hideaki Kano
Good Arm Identification via Bandit Feedback
15:00-15:20 Paper talk 5 by Siyuan Li
An Optimal Online Method of Selecting Source Policies for Reinforcement Learning
15:20-15:30 Close

Organization Committee

Paul Weng, University of Michigan-Shanghai Jiao Tong University Joint Institute, China
Yang Yu, Nanjing University, China
Zongzhang Zhang, Soochow University, China
Li Zhao, Microsoft Research Asia, China

Sponsors

LAMDA Group, National Key Laboratory for Novel Software Technology, Nanjing University