Deep learning is an extremely fast-moving field, and the huge number of research papers and ideas can be overwhelming. Even seasoned researchers have a hard time telling company PR from real breakthroughs. The goal of this post is to review those ideas that have stood the test of time, which is perhaps the only significance test one should rely on. These ideas, or improvements of them, have been used over and over again. They're known to work.
If you were to start in deep learning today, understanding and implementing each of these techniques would give you an excellent foundation for understanding recent research and working on your own projects. Working through papers in historical order is also a useful exercise to understand where the current techniques come from and why they were invented in the first place.
An interesting aspect of deep learning is that its application domains (e.g., vision, natural language, speech) share the majority of techniques. For example, someone who has worked in deep learning for computer vision their whole career could quickly become productive in NLP research. The specific network architectures may differ, but the concepts, approaches, and code are largely the same.
The goal here is not to give in-depth explanations or code examples for these techniques. It's not easily possible to summarize long, complex papers in a single paragraph. Instead, this post gives a brief overview of each technique, its historical context, and links to papers and implementations.
The list sticks to what most people would consider the popular mainstream domains of vision, natural language, speech, and reinforcement learning/games.
This post only presents research that has official or semi-official open-source implementations that are known to work well. Some research, such as DeepMind's AlphaGo or OpenAI's Dota 2 AI, isn't easily reproducible because it involves huge engineering challenges, so it isn't highlighted here.
Some choices are arbitrary. Often, rather similar techniques are published around the same time. The goal of this post is not to be a comprehensive review, but to expose someone new to the field to a cross-section of ideas that cover a lot of ground. For example, there may be hundreds of GAN variations, but to understand the general concept of GANs, it really doesn't matter which one you study.