SoloGen

Machine Learning-related surfings of SoloGen

Jul 10

End to End Learning for Self-Driving Cars and the Distribution Mismatch Problem

PAPER:

I recently came across this interesting paper by the NVIDIA autonomous driving team:

SUMMARY:

They take a supervised learning approach to learn a mapping from the camera image to the steering command. It is essentially a modern (mid-2010s) version of ALVINN from the late 1980s. The function approximator is a convolutional neural network (a normalization layer + 5 convolutional layers + 3 fully connected layers). They train the network on a large amount of data collected from an actual driver's behaviour (about 70 hours of driving, corresponding to roughly 2.5M data samples, though this is not explicitly stated in the paper), plus some data augmentation.
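For concreteness, here is a rough PyTorch sketch of a network with this shape. The layer sizes follow the paper's description approximately, but this is an illustrative reconstruction on my part, not the authors' code:

```python
# Illustrative sketch of a normalization + 5 conv + 3 FC network
# (layer sizes are an approximation of the paper's description).
import torch
import torch.nn as nn

class PilotNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),   # scalar steering command
        )

    def forward(self, x):
        # Hard-coded input normalization to [-1, 1] (the "normalization layer").
        x = x / 127.5 - 1.0
        return self.fc(self.conv(x))

# Example: a batch of 66x200 input frames (the resolution used in the paper).
net = PilotNetSketch()
steering = net(torch.rand(8, 3, 66, 200) * 255.0)
print(steering.shape)  # torch.Size([8, 1])
```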

COMMENTS:

It is exciting to see that an end-to-end neural network can learn to perform relatively well. But there are potential problems. One challenge with such a classical supervised learning approach is the distribution mismatch caused by the dynamical nature of the agent-environment interaction: whenever the agent makes a mistake at a time step, the distribution of future states shifts slightly compared to the distribution induced by the optimal agent (the human driver, in this case). This effect compounds, and the difference between the distributions can grow as the agent keeps interacting with the environment. As a result, as time passes, the agent becomes more likely to find itself in regions of the state space for which it has little training data, and it starts behaving unpredictably, even though it might perform well on the training distribution (this is the distribution mismatch problem in machine learning/statistics).
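To make the compounding effect concrete, here is a small toy simulation (my own illustration, not from the paper): a 1D lane-keeping task in which the expert steers back towards the lane center, while a cloned policy imitates the expert well only on states close to the expert's demonstrations. Small disturbances occasionally push the clone outside that region, where its predictions are uninformed, and failures accumulate with the horizon:

```python
# Toy illustration (not from the paper): why small imitation errors compound.
import numpy as np

rng = np.random.default_rng(0)
T, n_rollouts = 200, 2000
disturbance_std = 0.05   # road/wind disturbances, same for both policies
clone_err_std = 0.10     # small imitation error inside the training region
train_region = 0.3       # |x| range covered by the expert's demonstrations
off_road = 1.0           # |x| beyond which the car has left the road

def expert_action(x):
    return -0.5 * x      # steer back towards the lane center (x = 0)

def clone_action(x):
    if abs(x) <= train_region:
        # Near the demonstrated states the clone imitates the expert closely.
        return expert_action(x) + rng.normal(scale=clone_err_std)
    # Off-distribution: the clone has no training data, predictions are junk.
    return rng.normal(scale=0.5)

def rollout(policy):
    x, failed_at = 0.0, T
    for t in range(T):
        x += policy(x) + rng.normal(scale=disturbance_std)
        if abs(x) > off_road:
            failed_at = t
            break
    return failed_at

for name, policy in [("expert", expert_action), ("clone", clone_action)]:
    fails = np.array([rollout(policy) for _ in range(n_rollouts)])
    for t in (50, 100, 200):
        print(f"{name}: P(off the road by t={t}) = {(fails < t).mean():.3f}")
```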

A solution to this problem is to use DAgger-like algorithms:
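Roughly, DAgger (Dataset Aggregation; Ross, Gordon, and Bagnell, 2011) repeatedly rolls out the current learned policy, asks the expert to label the states the learner itself visits, aggregates those labelled states into the dataset, and retrains. A minimal sketch of the loop, with hypothetical env, expert, and fit interfaces, might look like this:

```python
# Minimal DAgger-style training loop (sketch only; `env`, `expert`, and `fit`
# are hypothetical placeholders, not from the NVIDIA paper or the DAgger code).
def dagger(env, expert, fit, n_iterations=10, horizon=1000):
    # Start from demonstrations collected by the expert itself.
    dataset = collect_rollout(env, expert.act, horizon)
    policy = fit(dataset)

    for _ in range(n_iterations):
        # Roll out the *learner*, so we visit the states it actually reaches...
        states = [s for s, _ in collect_rollout(env, policy, horizon)]
        # ...but label those states with the expert's actions.
        dataset += [(s, expert.act(s)) for s in states]
        policy = fit(dataset)   # retrain on the aggregated dataset
    return policy

def collect_rollout(env, act, horizon):
    s, pairs = env.reset(), []
    for _ in range(horizon):
        a = act(s)
        pairs.append((s, a))
        s, done = env.step(a)
        if done:
            s = env.reset()
    return pairs
```

The original algorithm also mixes the expert's actions into the rollouts with a decaying probability; that detail is omitted here for brevity.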

Aside from the aforementioned work, which analyzes the phenomenon in the imitation learning context, the question of how the agent's state distribution changes has also been studied in the reinforcement learning context by several researchers, including myself. I only refer to two papers here; see their references for further information.


Dec 11

Large deviations for the local fluctuations of random walks and new insights into the “randomness” of Pi http://arxiv.org/abs/1004.3713v2


Dec 10

Active Learning Halfspaces under Margin Assumptions http://arxiv.org/abs/1112.1556v1


Predictors for time series with energy decay on higher frequencies http://arxiv.org/abs/1112.1478v1


Dec 7

Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing http://arxiv.org/abs/1112.0708v1


Dec 6

On the question of effective sample size in network modeling http://arxiv.org/abs/1112.0840v1


Dimension adaptability of Gaussian process models with variable selection and projection http://arxiv.org/abs/1112.0716v1


Multi-stage Convex Relaxation for Feature Selection http://arxiv.org/abs/1106.0565v2

