The recent success of machine learning has been driven by advances in computer systems, and now it is time for a new era in which computer systems design is transformed through machine learning. This talk will focus on two of our recent works: Resource Allocation Optimization with Deep Reinforcement Learning (RL) and Dynamic Neural Networks with Sparsely Gated Mixture of Experts.
The first half of the talk covers our new RL-based techniques for solving combinatorial optimization problems that arise in resource allocation for computational graphs. We show that our approach achieves model parallelism without explicitly profiling either the target hardware or the computational graph. Instead, it optimizes only the reward of interest (e.g., runtime) and finds solutions that outperform traditional white-box baselines.
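To make the black-box idea concrete, here is a minimal sketch of policy-gradient (REINFORCE) placement on a toy problem. It is not the method presented in the talk. The four-op chain graph, the two devices, the `simulate_runtime` cost model, and all hyperparameters are assumptions made up for illustration. The point is that the learner only ever sees a scalar runtime for each sampled placement.

```python
import math
import random

# Hypothetical toy problem: four equal-cost ops in a chain, two devices.
OP_COSTS = [4.0, 4.0, 4.0, 4.0]
EDGES = [(0, 1), (1, 2), (2, 3)]
NUM_DEVICES = 2
COMM_COST = 1.0  # assumed penalty per edge that crosses devices


def simulate_runtime(placement):
    """Stand-in for a measured runtime: busiest device plus transfer costs."""
    load = [0.0] * NUM_DEVICES
    for op, dev in enumerate(placement):
        load[dev] += OP_COSTS[op]
    transfers = sum(1 for a, b in EDGES if placement[a] != placement[b])
    return max(load) + COMM_COST * transfers


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_placement(steps=500, lr=0.05, seed=0):
    """REINFORCE with one independent softmax policy per op."""
    rng = random.Random(seed)
    logits = [[0.0] * NUM_DEVICES for _ in OP_COSTS]
    baseline = None
    best_placement, best_runtime = None, float("inf")
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        placement = [rng.choices(range(NUM_DEVICES), weights=p)[0] for p in probs]
        runtime = simulate_runtime(placement)  # the only feedback the learner gets
        if runtime < best_runtime:
            best_placement, best_runtime = placement, runtime
        reward = -runtime
        # Moving-average baseline to reduce the variance of the gradient estimate.
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        # Gradient of log pi(dev) w.r.t. the logits is onehot(dev) - probs.
        for op, dev in enumerate(placement):
            for d in range(NUM_DEVICES):
                grad = (1.0 if d == dev else 0.0) - probs[op][d]
                logits[op][d] += lr * advantage * grad
    return best_placement, best_runtime


if __name__ == "__main__":
    placement, runtime = train_placement()
    print("best sampled placement:", placement, "runtime:", runtime)
```

In this toy cost model, placing every op on one device costs 16.0, while splitting the chain in the middle costs 9.0. Replacing `simulate_runtime` with a real measurement would leave the training loop unchanged, which is the appeal of optimizing the reward directly.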
The second half of the talk covers our work on dynamic networks with sparse gates, in which each input example is conditionally routed through only part of the model. Dynamic networks let us efficiently train models with much larger capacities. Combined with their implicit regularization properties, these models can reach significantly higher accuracy. Meanwhile, because each example uses only a sparse subset of the architecture, systems performance improves at both training and inference time, as shown by our state-of-the-art results on language modeling and machine translation benchmarks.
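As an illustration of conditional routing, the following sketch implements plain top-k gating over a small set of experts. It is a simplified stand-in rather than the layer described in the talk: the scaling experts, the gate weight matrix, and the helper names are hypothetical, and details such as gating noise or load balancing are left out.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_k_gates(gate_logits, k):
    """Softmax over the k largest logits; every other expert gets no weight."""
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    m = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, gate_weights, experts, k=2):
    """Route x to its top-k experts and mix their outputs by gate weight."""
    gate_logits = [dot(x, w) for w in gate_weights]
    gates = top_k_gates(gate_logits, k)
    output = [0.0] * len(x)  # assumes each expert returns a vector of len(x)
    for i, g in gates.items():
        y = experts[i](x)  # unselected experts are never evaluated
        output = [o + g * yi for o, yi in zip(output, y)]
    return output, gates


def make_scaling_expert(scale):
    """Hypothetical expert that just scales its input."""
    return lambda x: [scale * xi for xi in x]


# Hypothetical setup: four experts, one gate weight vector per expert.
EXPERTS = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
GATE_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

if __name__ == "__main__":
    out, gates = moe_forward([2.0, 1.0], GATE_WEIGHTS, EXPERTS, k=2)
    print("selected experts and weights:", gates)
    print("output:", out)
```

With `k=2`, only two of the four experts run for any given input, so the per-example compute stays fixed as more experts are added. That is the sense in which capacity can grow without a matching growth in cost.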