We study the stochastic gradient descent (SGD) method with the step size lying in an interval instead of being given by a fixed formula. The scheme is inspired by the piecewise-decay step size, which can be regarded as lying within a band. In this setting, we investigate the optimality conditions of SGD with respect to the step size and provide insight into enlarging the step size in the initial iterations. Our analysis provides explicit theoretical error bounds for step sizes in the most general cases. It also guarantees convergence rates for many other step size strategies that are small perturbations within the band and possibly non-monotonic. Furthermore, we discuss band boundaries of different orders and explore how far the boundaries can be extended while retaining the convergence properties of SGD. Finally, we propose a type of non-monotonic step size and provide empirical evidence that a small perturbation of the step size does help.
Joint work with X.Y. Wang
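
To make the idea of a step size confined to a band concrete, the following is a minimal Python sketch. It is an illustration only, not the method from the talk: the band bounds of order 1/t (`lower/t` and `upper/t`), the cosine perturbation, the quadratic objective, and all function and parameter names are assumptions chosen for the example.

```python
import math
import random


def band_step_size(t, lower=0.5, upper=2.0, period=20):
    """Hypothetical non-monotonic step size kept inside the band [lower/t, upper/t]."""
    mid = 0.5 * (lower + upper)
    half_width = 0.5 * (upper - lower)
    # The cosine term oscillates in [-1, 1], so the numerator stays in [lower, upper].
    return (mid + half_width * math.cos(2.0 * math.pi * t / period)) / t


def sgd_quadratic(num_iters=5000, x0=5.0, noise_std=1.0, seed=0):
    """Run SGD on f(x) = 0.5 * x**2 with additive Gaussian gradient noise (assumed toy problem)."""
    rng = random.Random(seed)
    x = x0
    for t in range(1, num_iters + 1):
        # Stochastic gradient: the exact gradient x plus zero-mean noise.
        grad = x + rng.gauss(0.0, noise_std)
        x -= band_step_size(t) * grad
    return x


if __name__ == "__main__":
    print("final iterate:", sgd_quadratic())
```

In this sketch the step size is not monotone in t (it rises again after each trough of the cosine), yet every value stays between the two band boundaries, which is the kind of schedule the abstract describes.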