Hogwild! (Test-of-Time Award Talk NeurIPS 2020)

Published Dec 6, 2020
In machine learning, we use data to train a computational model to make good predictions, for example translating sounds into written words or recognizing objects in images. Typically, the computational model is defined by a large number of parameters, which are adjusted step by step until they arrive at a set of values that do a good job of predicting the correct outcomes on the training data. The algorithm that defines the parameter steps is known as stochastic gradient descent (SGD). SGD bases each parameter step on improving performance on just a single item of the training data (or perhaps a small subset of items).
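A minimal sketch of such an SGD loop on an illustrative least-squares problem (the data, loss, step size, and epoch count below are assumptions made for the example, not details from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))      # 1000 training items, 10 features
w_true = rng.standard_normal(10)
y = X @ w_true                           # targets from a known linear model

w = np.zeros(10)                         # parameters to be learned
step = 0.01                              # step size (learning rate)

for epoch in range(5):
    for i in rng.permutation(len(X)):    # visit one training item at a time
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of the loss (x_i.w - y_i)^2 / 2
        w -= step * grad                 # the SGD parameter step

print("distance to true parameters:", np.linalg.norm(w - w_true))
```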
SGD was conceived and originally implemented as a serial algorithm running on a single computer processor. But the high computational demands of machine learning soon called for parallel computation. In the first parallel versions of SGD, different processors examined different items of training data simultaneously but had to lock access to the parameter storage locations, so that parameters were updated in a systematic way. This requirement greatly degraded the speedup available from parallel computers.
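A sketch of this lock-based parallel scheme, under the same illustrative setup as above (the thread count and data split are again assumptions for the example): every worker must acquire a shared lock before touching the parameters, so updates are serialized.

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
y = X @ rng.standard_normal(10)

w = np.zeros(10)                  # shared parameter vector
step = 0.01
lock = threading.Lock()           # guards every read-modify-write of w

def worker(indices):
    for i in indices:
        with lock:                            # only one thread updates at a time
            grad = (X[i] @ w - y[i]) * X[i]
            w[:] -= step * grad               # in-place update of the shared array

threads = [threading.Thread(target=worker, args=(range(t, len(X), 4),))
           for t in range(4)]                 # 4 workers split the data
for t in threads:
    t.start()
for t in threads:
    t.join()
```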
Hogwild! arose from our observation that the locking was unnecessary! Even though removing the locks could lead to clashing updates to the parameters and the occasional loss of information, the parallel SGD algorithm ran much more efficiently without them. The Hogwild! paper describes our computational experience with this lock-free, asynchronous implementation and compares it to several more conventional parallel alternatives that had been proposed at the time. Moreover, the paper provided a mathematical analysis explaining why good convergence could still be expected, showing that when the dimension of the parameter space is large, many processors can be engaged in the computation without degrading overall performance.
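For contrast, a sketch of the Hogwild! idea under the same illustrative setup: the lock is simply removed, and workers read and write the shared parameter vector concurrently, accepting that clashing updates may occasionally be lost. (This Python sketch shows only the structure; CPython's global interpreter lock limits true parallelism here, so a realistic implementation would use shared-memory processes or compiled code.)

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
y = X @ rng.standard_normal(10)

w = np.zeros(10)                  # shared parameter vector, no lock around it
step = 0.01

def worker(indices):
    for i in indices:
        grad = (X[i] @ w - y[i]) * X[i]   # read shared w with no synchronization
        w[:] -= step * grad               # write shared w with no synchronization

threads = [threading.Thread(target=worker, args=(range(t, len(X), 4),))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The only change from the locked version is the deleted `with lock:` block; removing that serialization point is what lets all workers make progress at once.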
We believe that the Hogwild! paper was influential for two major reasons. First, it showed that the unconventional, even counterintuitive, approach of forgoing locked memory access could yield enormous improvements in parallelism and efficiency in data science computations. Second, it demonstrated that a mathematical analysis of the resulting asynchronous algorithm was possible. Many subsequent works, both by our group and by others, have built on these ideas in the nine years since the paper appeared.
Link to paper: papers.nips.cc/paper/2011/fil...