(Registration: Attendees should register through DAI.)
The Asian Workshop on Reinforcement Learning (AWRL) covers the theoretical foundations, models, algorithms, and practical applications of reinforcement learning (RL). We intend to make this an exciting event for RL researchers and practitioners worldwide, serving as a forum for discussing open problems, future research directions, and application domains of RL.
AWRL 2019 (in conjunction with DAI 2019) will consist of academic and industrial keynote talks, and invited paper presentations, over a one-day period.
Invited Speakers (sorted in alphabetical order)
Yingfeng Chen, NetEase Fuxi AI Lab
Dr. Yingfeng Chen is a researcher at NetEase Fuxi AI Lab. His research interests include reinforcement learning and game AI. He received his doctoral degree in Computer Science from the University of Science and Technology of China. At present, he is working on applying RL algorithms to game AI and solving the critical technical problems involved.
Junqi Jin, Alibaba Group
Dr. Junqi Jin is from the Precision Orientation Technology Department at Alibaba Group, where his main research interests lie in machine learning and mechanism design applied to online advertising and recommendation. Junqi holds a PhD (2016), a Bachelor of Engineering (2011) and a Bachelor of Economics (2011) from Tsinghua University. Junqi has published research papers in IEEE TPAMI, TITS, TNNLS, KDD and CIKM.
Wulong Liu, Huawei Noah's Ark Lab
Wulong Liu is a staff researcher and reinforcement learning tech lead of the Decision Making and Reasoning Lab, Huawei Noah's Ark Lab. Before joining Huawei, he received his Ph.D. degree in electronic engineering from Tsinghua University in 2015 and his bachelor's degree in Microelectronics from Xidian University in 2010. He was also a visiting scholar in the CSE Department of Penn State University from 2013 to 2014. His research interests mainly include deep learning, reinforcement learning, distributed computing, and optimization. His recent research aims at applying reinforcement learning to autonomous driving.
Peng Sun, Tencent AI Lab
Peng Sun received his BS, MS, and PhD from Wuhan University of Technology, Beijing University of Posts and Telecommunications, and Tsinghua University, respectively. He then did postdoctoral research at Cornell University and Rutgers University. He is now a senior researcher at Tencent AI Lab. His current interests include deep reinforcement learning and its applications in video game AI and robotics.
Paul Weng, UM-SJTU Joint Institute
Paul Weng is currently a tenure-track assistant professor at UM-SJTU Joint Institute. Previously, he was a faculty member at SYSU-CMU Joint Institute of Engineering from 2015 to 2017. During 2015, he was a visiting faculty member at Carnegie Mellon University (CMU). Before that, he was an associate professor in computer science at Sorbonne University (Pierre and Marie Curie University, UPMC), Paris. He received his master's degree in 2003 and his Ph.D. in 2006, both at UPMC. Before joining academia, he graduated from ENSAI (French National School in Statistics and Information Analysis) and worked as a financial quantitative analyst in London. His main research work lies in artificial intelligence and machine learning, with a focus on adaptive control (reinforcement learning, Markov decision processes) and multiobjective optimization (compromise programming, fair optimization).
Chongjie Zhang, Institute for Interdisciplinary Information Sciences at Tsinghua University
Chongjie Zhang is an Assistant Professor in the Institute for Interdisciplinary Information Sciences at Tsinghua University. Before joining the faculty, he was a postdoctoral associate in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT. He received his Ph.D. in Computer Science from the University of Massachusetts at Amherst in 2011. His research interests span reinforcement learning, multi-agent systems, and robotics.
Weinan Zhang, Shanghai Jiao Tong University
Weinan Zhang is a tenure-track assistant professor in the Department of Computer Science and the John Hopcroft Center for Computer Science, Shanghai Jiao Tong University. His research interests include deep reinforcement learning, unsupervised learning, and their applications in various big data mining scenarios. Weinan earned his Ph.D. from University College London in 2016 and his B.Eng. from the ACM Class of Shanghai Jiao Tong University in 2011. He was selected as one of the 20 rising stars of the KDD research community in 2016 by Microsoft Research and won the ACM rising star award (Shanghai chapter) and the DAMO young scholar award in 2018. His papers won the best paper honorable mention award at SIGIR 2017 and the best paper award at DLP-KDD 2019.
Li Zhao, Microsoft Research Asia (MSRA)
Li Zhao is currently a Senior Researcher in the Machine Learning Group, Microsoft Research Asia (MSRA). She obtained her PhD degree in Computer Science from Tsinghua University in July 2016, supervised by Professor Xiaoyan Zhu. Her research interests mainly lie in deep learning and reinforcement learning, and their applications in text mining, finance, games, and operations research.
Invited talk 1 by Paul Weng
Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains
In the context of learning deterministic policies in continuous domains, we revisit an approach that was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of deterministic policy gradient (DPG). Previous work has observed its excellent empirical performance, but a theoretical justification is lacking. To fill this gap, we provide a theoretical explanation to motivate this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss in depth the properties of these updates to get a deeper understanding of the overall approach. In addition, we extend it and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate on several classic control problems that it surpasses the state-of-the-art algorithms for learning deterministic policies.
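As a rough illustration of what "exploiting the sign of the advantage" can mean, the sketch below shows a CACLA-style actor step for a linear deterministic policy. It is a minimal sketch, not the speaker's implementation: the linear policy, the function name `cacla_actor_update`, and the use of the TD error as a stand-in for the advantage are all assumptions made for illustration.

```python
def cacla_actor_update(weights, state, action, td_error, lr=0.1):
    """CACLA-style actor step for a linear policy mu(s) = w . s (assumed form).

    Only the sign of the TD error (used here as an advantage estimate) matters:
    if it is positive, the policy output is moved toward the explored action;
    otherwise the weights are returned unchanged.
    """
    if td_error <= 0:
        return list(weights)
    # Current deterministic action for this state.
    mu = sum(w * x for w, x in zip(weights, state))
    # Regression step toward the explored action; the magnitude of the
    # TD error is deliberately ignored.
    return [w + lr * (action - mu) * x for w, x in zip(weights, state)]


if __name__ == "__main__":
    w = [0.0, 0.0]
    s = [1.0, 2.0]
    # The explored action turned out better than expected: move toward it.
    print(cacla_actor_update(w, s, action=1.0, td_error=0.5))
    # The explored action turned out worse than expected: no change.
    print(cacla_actor_update(w, s, action=1.0, td_error=-0.5))
```

By contrast, a DPG-style update would follow the gradient of a learned critic with respect to the action, so the step size would depend on the critic's values rather than only on a sign.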
Invited talk 2 by Li Zhao
Fully Parameterized Quantile Function for Distributional Reinforcement Learning
Distributional Reinforcement Learning (RL) differs from traditional RL in that it estimates the distribution of total returns rather than only their expectation, and it has achieved state-of-the-art performance on Atari Games. The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution. Existing distributional RL algorithms parameterize either the probability side or the return value side of the distribution, leaving the other side uniformly fixed as in C51 and QR-DQN, or randomly sampled as in IQN. In this paper, we propose a fully parameterized quantile function that parameterizes both the probability side and the value side for distributional RL. Our algorithm contains a probability proposal network that generates a discrete set of probabilities and a quantile network that gives the corresponding quantile values. The two networks are jointly trained to better approximate the true distribution. Experiments on 55 Atari Games show that our algorithm significantly outperforms existing distributional RL algorithms and creates a new record for the Atari Learning Environment.
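To make the "probability side" and "value side" concrete, the sketch below treats a return distribution as a step-function quantile function: a list of increasing probabilities from 0 to 1 and one quantile value per interval. It is only an illustration under stated assumptions. The function names are hypothetical, no networks are involved, and each quantile value is simply assumed to represent its whole interval.

```python
def uniform_fractions(n):
    """Fixed, evenly spaced probabilities 0, 1/n, ..., 1 (the 'uniformly fixed' case)."""
    return [i / n for i in range(n + 1)]


def expected_return(fractions, quantile_values):
    """Mean of a step-function quantile approximation.

    fractions:       increasing probabilities tau_0 = 0 < ... < tau_N = 1
    quantile_values: one return value per interval [tau_i, tau_{i+1}]
    """
    if len(fractions) != len(quantile_values) + 1:
        raise ValueError("need exactly one quantile value per interval")
    total = 0.0
    for i, value in enumerate(quantile_values):
        width = fractions[i + 1] - fractions[i]  # probability mass of interval i
        total += width * value
    return total


if __name__ == "__main__":
    values = [0.0, 4.0]
    # Value side learned, probability side fixed and uniform.
    print(expected_return(uniform_fractions(2), values))
    # Both sides free: the same values with different probabilities
    # describe a different distribution.
    print(expected_return([0.0, 0.25, 1.0], values))
```

Letting the probabilities vary as well as the values gives the approximation more freedom to place its steps where the true distribution changes most, which is the intuition behind parameterizing both sides.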
Invited talk 3 by Weinan Zhang
Towards Efficient Many-Agent Reinforcement Learning: Algorithms and Platforms
Multi-agent reinforcement learning (MARL) is attracting increasing attention from both academia and industry. In this talk, I will briefly introduce stochastic games and some fundamental multi-agent reinforcement learning algorithms. Then I will focus on scenarios with a huge number of agents, where traditional methods will probably fail to work, and introduce two preliminary solutions, namely mean-field MARL and factorized Q-learning. Finally, I will present several advanced open-source platforms for MARL research, including MAgent and CityFlow, along with some experimental results and key findings obtained on them.
Invited talk 4 by Junqi Jin
Learning to Advertise for Joint Optimization with Recommendation in E-Commerce Platform
As one of the world's largest e-commerce platforms, Taobao serves hundreds of millions of retail consumers with its online advertising and recommendation (BABA June Quarter 2019 Results). In product feeds, blended advertisements and recommendations are presented to consumers simultaneously, and they are closely related and interact in both the front-end display and the shared, interdependent back-end data. Traditional e-commerce platforms optimize advertising and recommendation independently, and neglecting their interplay usually leads to sub-optimality. To tackle this problem, we propose advertising strategies for joint optimization in two aspects. First, placing a fixed number of ads and recommendations at fixed positions in the front-end display limits the flexibility of product feeds. In this talk, we share our advertising optimization method, which dynamically determines the number and positions of ads for each consumer visit. Second, consumers' behaviors on the advertised results constitute part of the recommendation model's back-end training data and can therefore influence the recommended results. Considering this mechanism, we propose a novel perspective in which advertisers can strategically control the advertising platform to optimize their recommended organic traffic. In both aspects, we introduce our algorithm designs and the corresponding optimization techniques to improve learning speed and stability. Offline evaluations and online deployment results in a real-world industrial environment demonstrate the effectiveness of our approaches.
Invited talk 5 by Peng Sun
RL Research at Tencent AI Lab and RoboticsX with Applications in Video Game AI and Robotics
In this talk we'll discuss the RL research at Tencent AI Lab and RoboticsX, including imitation learning, multi-agent control, distributed RL training, game-theoretic RL, virtual-to-real transfer learning, etc. We'll also cover their applications in video game AI (Honor of Kings, StarCraft II) and robotics (active tracking, robotic arm).
Invited talk 6 by Wulong Liu
Reinforcement Learning: From Virtual Games to Self-driving Cars
Autonomous driving will be revolutionary: it could save millions of lives lost to traffic accidents every year and lead to a multi-trillion dollar industry. But many technical challenges remain before this becomes possible. Among the candidate approaches, we think reinforcement learning offers a potential way to produce more intelligent driving policies that resemble those of human drivers. This talk will introduce the current status of autonomous driving and, based on some previous work, illustrate the remaining challenges in applying reinforcement learning to autonomous driving tasks.
Invited talk 7 by Yingfeng Chen
Landing Reinforcement Learning on Games Field
Recently, reinforcement learning has shown great potential and progressed rapidly in many areas. In this report, we focus on RL applications in the game field, specifically creating better game AI and making game testing intelligent. We will present the solutions and some research results from our attempts on several commercial NetEase games, including hands-on project experience, generating game AI with different styles (or difficulty levels), class balancing, and so on. We will also discuss the problems and challenges encountered in these exploratory trials. In addition, we will briefly introduce the Game AI Design Toolkit we are developing, which makes it possible for nontechnical game AI designers to take advantage of RL techniques in the game field.
Invited talk 8 by Chongjie Zhang
Towards Efficient Reinforcement Learning
Deep reinforcement learning (DRL) has recently shown considerable success in achieving human-level control or decision making in a series of artificial domains. However, DRL is not yet efficient for many real-world problems, requiring vast amounts of experience to learn effective policies. To address this challenge, in this talk I will first present model-free policy reuse approaches that transfer source policies to new tasks with theoretical optimality guarantees, and then discuss object-oriented model-based approaches that enable DRL to generalize to unseen environments.
Chao Yu, Dalian University of Technology, China
Jianye Hao, Tianjin University, China
Yang Yu, Nanjing University, China
LAMDA Group, National Key Laboratory for Novel Software Technology, Nanjing University