I started my PhD in Machine Learning under the Cambridge-Tübingen PhD Fellowship in the fall of 2014, co-supervised by Richard E. Turner and Zoubin Ghahramani at the University of Cambridge and by Bernhard Schölkopf at the Max Planck Institute for Intelligent Systems in Tübingen. I also collaborate closely with Sergey Levine at UC Berkeley/Google Brain and Timothy Lillicrap at DeepMind. I completed my B.A.Sc. in Engineering Science at the University of Toronto, where I did my thesis with Prof. Geoffrey Hinton on distributed training of neural networks using evolutionary algorithms. I also had the great fortune, and a lot of fun, working with Prof. Steve Mann, developing real-time HDR capture for wearable cameras and displays. I previously interned at Google Brain, hosted by Ilya Sutskever and Vincent Vanhoucke. My PhD is funded by NSERC and a Google Focused Research Award on Reliable and Robust Deep Reinforcement Learning. I am a member of Jesus College, Cambridge, and a Lab Scientist at Creative Destruction Lab, one of the leading tech-startup incubators in Canada.

My research centers on machine learning for sequential processing, such as reinforcement learning and sequence prediction. I currently focus on learning-driven approaches to robotics, which have been covered by the Google Research Blog and MIT Technology Review. I also work on deep learning, probabilistic models, and generative models.
Personal Homepage
Keywords: reinforcement learning, deep learning, robotics, approximate inference, Bayesian methods
Nachum, O., Gu, S., Lee, H., Levine, S.
Data-Efficient Hierarchical Reinforcement Learning
Advances in Neural Information Processing Systems 31, pages: 3307-3317, (Editors: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett), Curran Associates, Inc., 32nd Annual Conference on Neural Information Processing Systems, December 2018 (conference)
Tucker, G., Bhupatiraju, S., Gu, S., Turner, R., Ghahramani, Z., Levine, S.
The Mirage of Action-Dependent Baselines in Reinforcement Learning
Proceedings of the 35th International Conference on Machine Learning (ICML), 80, pages: 5022-5031, Proceedings of Machine Learning Research, (Editors: Jennifer Dy and Andreas Krause), PMLR, July 2018 (conference)
Pong*, V., Gu*, S., Dalal, M., Levine, S.
Temporal Difference Models: Model-Free Deep RL for Model-Based Control
6th International Conference on Learning Representations (ICLR), May 2018, *equal contribution (conference)
Eysenbach, B., Gu, S., Ibarz, J., Levine, S.
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
6th International Conference on Learning Representations (ICLR), May 2018 (conference)
Gu, S., Lillicrap, T., Turner, R. E., Ghahramani, Z., Schölkopf, B., Levine, S.
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
Advances in Neural Information Processing Systems 30, pages: 3849-3858, (Editors: I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett), Curran Associates, Inc., 31st Annual Conference on Neural Information Processing Systems, December 2017 (conference)
Jaques, N., Gu, S., Bahdanau, D., Hernández-Lobato, J. M., Turner, R. E., Eck, D.
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-Control
Proceedings of the 34th International Conference on Machine Learning (ICML), 70, pages: 1645-1654, Proceedings of Machine Learning Research, (Editors: Doina Precup and Yee Whye Teh), PMLR, August 2017 (conference)
Gu*, S., Holly*, E., Lillicrap, T., Levine, S.
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates
Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), IEEE, Piscataway, NJ, USA, May 2017, *equal contribution (conference)
Gu, S., Lillicrap, T., Ghahramani, Z., Turner, R. E., Levine, S.
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
5th International Conference on Learning Representations (ICLR), OpenReviews.net, April 2017 (conference)
Jang, E., Gu, S., Poole, B.
Categorical Reparameterization with Gumbel-Softmax
5th International Conference on Learning Representations (ICLR), OpenReviews.net, April 2017 (conference)
Gu, S., Lillicrap, T., Sutskever, I., Levine, S.
Continuous Deep Q-Learning with Model-based Acceleration
Proceedings of the 33rd International Conference on Machine Learning (ICML), 48, pages: 2829-2838, JMLR Workshop and Conference Proceedings, (Editors: Maria-Florina Balcan and Kilian Q. Weinberger), JMLR.org, June 2016 (conference)
Gu, S., Levine, S., Sutskever, I., Mnih, A.
MuProp: Unbiased Backpropagation for Stochastic Neural Networks
4th International Conference on Learning Representations (ICLR), May 2016 (conference)
Gu, S., Ghahramani, Z., Turner, R. E.
Neural Adaptive Sequential Monte Carlo
Advances in Neural Information Processing Systems 28, pages: 2629-2637, (Editors: Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett), 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015 (conference)
Tripuraneni*, N., Gu*, S., Ge, H., Ghahramani, Z.
Particle Gibbs for Infinite Hidden Markov Models
Advances in Neural Information Processing Systems 28, pages: 2395-2403, (Editors: Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett), 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015, *equal contribution (conference)