On Neural Networks and Gradient Descent
The way I see it, the two most important features of Neural Networks that make them so powerful are 1. Differentiability and 2. Compositionality. Differentiability enables optimization via gradient descent, which is orders of magnitude faster than most other numerical optimization methods. Compositionality, on the other hand, means that we can make use of the chain rule for differentiation and break down potentially unwieldy functions into small, manageable units that we can handle one at a time. However, in most cases the Neural Network architecture itself needs to be *designed*. Optimizing this design is completely nontrivial, and is in fact where most Neural Network research is focused.
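To make this concrete, here is a minimal sketch (plain NumPy, with toy data and sizes I made up for illustration) of how the two features work together: a tiny two-layer network is a composition of simple units, the chain rule gives us the gradient one unit at a time, and gradient descent uses that gradient to update the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # toy input
y = rng.normal(size=(1, 1))          # toy target
W1 = rng.normal(size=(3, 4)) * 0.1   # first layer weights
W2 = rng.normal(size=(1, 3)) * 0.1   # second layer weights
lr = 0.1

for step in range(100):
    # forward pass: evaluate the composition unit by unit
    h = np.tanh(W1 @ x)
    pred = W2 @ h
    loss = 0.5 * np.sum((pred - y) ** 2)

    # backward pass: chain rule, handled one unit at a time
    d_pred = pred - y                     # dL/dpred
    dW2 = d_pred @ h.T                    # dL/dW2
    d_h = W2.T @ d_pred                   # dL/dh
    dW1 = (d_h * (1 - h ** 2)) @ x.T      # dL/dW1, using tanh' = 1 - tanh^2

    # gradient-descent update
    W1 -= lr * dW1
    W2 -= lr * dW2
```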
Gradient boosting algorithms, on the other hand, combine the boosting ensemble method with gradient descent, and build the entire "architecture" of the ML algorithm through that same differential methodology, with very minimal tuning of hyperparameters. IMHO, from a purely conceptual standpoint, this approach is very appealing. Because of that conceptual simplicity, though, it might seem there is not much room to improve on the basic algorithm. But I am convinced that we are only scratching the surface of what the Gradient Boosting approach can do. I believe that just a few simple improvements could make gradient boosting even more powerful than it currently is.
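For illustration, here is a minimal sketch of the basic idea for squared loss (it assumes scikit-learn is available; the toy data, depth, learning rate, and number of rounds are my own choices): each round fits a small tree to the negative gradient of the loss with respect to the current predictions (which for squared loss is just the residuals) and takes a step in that direction, so the ensemble itself is built by gradient descent in function space.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

n_rounds, lr = 100, 0.1
pred = np.full_like(y, y.mean())      # start from a constant model
trees = []

for _ in range(n_rounds):
    residuals = y - pred              # negative gradient of 0.5 * (y - pred)^2
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    pred += lr * tree.predict(X)      # gradient-descent step in function space
    trees.append(tree)

def predict(X_new):
    # the final model is the constant plus the sum of all the small steps
    return y.mean() + lr * sum(t.predict(X_new) for t in trees)
```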