Deep learning is machine learning using neural networks with many hidden layers, and it has become a primary tool in a wide variety of practical applications, such as image classification, speech recognition, driverless cars, and game intelligence. This work introduces a mathematical formulation of deep residual neural networks as a PDE-constrained optimal control problem. We study the well-posedness, the large-time behavior of solutions, and the characterization of the steady states of the forward problem. We state and prove optimality conditions for the inverse deep learning problem, using the Hamilton-Jacobi-Bellman equation and the Pontryagin maximum principle. This serves to establish a mathematical foundation for investigating the algorithmic and theoretical connections between optimal control and deep learning.
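For orientation, the following is a minimal, generic sketch of the residual-network/optimal-control correspondence referred to above; the layer map $f$, terminal loss $\Phi$, running cost $R$, and horizon $T$ are placeholder symbols for illustration and need not match the paper's exact formulation.
% Illustrative sketch: a residual layer
%   x_{k+1} = x_k + \Delta t\, f(x_k, \theta_k)
% can be read as a forward-Euler step of the state equation below,
% so that training the weights \theta becomes a control problem.
\begin{align*}
  \min_{\theta}\quad & \Phi\bigl(x(T)\bigr) + \int_0^T R\bigl(\theta(t)\bigr)\,\mathrm{d}t \\
  \text{subject to}\quad & \dot{x}(t) = f\bigl(x(t),\theta(t)\bigr), \qquad x(0) = x_0 .
\end{align*}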