For many large-scale optimization and machine learning problems, first-order methods and their accelerated variants based on momentum have been a leading approach for computing low-to-medium accuracy solutions because of their cheap iterations and mild dependence on the problem dimension and data size. Even though momentum-based accelerated gradient (AG) methods proposed by Nesterov for convex optimization converges provably faster than gradient descent (GD) in the absence of noise, the comparison is no longer clear in the presence of gradient noise. In this talk, we focus on GD and AG methods for minimizing strongly convex functions when the gradient has random noise. We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm. AG can achieve acceleration while being more robust to random gradient errors, and the framework leads to practical optimal algorithms that can perform better than other state-of-the-art methods in the presence of random gradient noise. We also study distributed GD and AG, where the objective function is distributed over several computational units, lying on a fixed but arbitrary connected communication graph, subject to local communication constraints where noise estimates of the gradients are available. This is based on joint works with Bugra Can, Alireza Fallah, Mert Gurbuzbalaban, Umut Simsekli and Asu Ozdaglar.