• 0 Posts
  • 25 Comments
Joined 2 years ago
Cake day: July 1st, 2023



  • No, you are correct. Hinton began researching ReLUs in 2010, and his students Alex Krizhevsky and Ilya Sutskever used them to train a much deeper network (AlexNet) that won the 2012 ILSVRC. AlexNet was so groundbreaking because it brought gradient optimization improvements (SGD with momentum as popularized by Schmidhuber, and dropout), better activation functions (ReLU), a deeper network (8 layers), supervised training on very large datasets (necessary to learn good general-purpose convolutional kernels), and GPU acceleration together into a single approach.

    NNs, and specifically CNNs, won out because they were able to learn more expressive image feature representations than the hand-crafted features of competing algorithms. The proof was in the vastly better performance: it was a major jump at a time when performance on the ILSVRC was becoming saturated. Nobody was making nearly +10% improvements on that challenge back then. It blew everybody out of the water and made NNs and deep learning impossible to ignore.

    Edit: to accentuate the point about datasets and GPUs, the original AlexNet developers really struggled to train their model on the GPUs available at the time. The model was too big, and they had to split it across two GPUs to make it work. They were some of the first researchers to train large CNNs with GPUs. Without large datasets like the ILSVRC they would not have been able to train good deep hierarchical convolutions, and without better GPUs they wouldn’t have been able to make AlexNet sufficiently large or deep. Training AlexNet on CPU only for ILSVRC was out of the question; it would have taken months of full-tilt, nonstop compute for a single training run. There was more to it than these two things, as detailed above, but removing those two barriers really allowed CNNs and deep learning to take off. Much of the underlying NN and optimization theory had been around for decades.
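
    To make two of the ingredients named above concrete, here is a minimal sketch of a ReLU activation and a classical momentum update. It is plain Python for illustration only, not AlexNet's actual code: the function names, the learning rate, and the toy quadratic objective are all assumptions, and the gradient here is exact rather than stochastic. The sigmoid comparison is added only as a familiar baseline.

```python
import math


def relu(x: float) -> float:
    """ReLU activation: pass positive inputs through, clamp the rest to zero."""
    return x if x > 0.0 else 0.0


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0.0 else 0.0


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic sigmoid; it never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def sgd_momentum_step(w, velocity, grad, lr=0.1, momentum=0.9):
    """One classical-momentum update; returns the new (w, velocity)."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


def minimize_toy_quadratic(steps=300):
    """Minimize f(w) = (w - 3)^2 from w = 0 (hypothetical toy objective)."""
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # exact gradient of the toy objective
        w, velocity = sgd_momentum_step(w, velocity, grad)
    return w


# Best-case gradient scale after chaining 8 activations (AlexNet's depth).
print("sigmoid:", sigmoid_grad(0.0) ** 8)
print("relu:   ", relu_grad(1.0) ** 8)
print("w after momentum updates:", minimize_toy_quadratic())
```

    The first two prints show why the activation choice matters for depth: even in the best case a sigmoid scales the gradient by 0.25 per layer, while an active ReLU passes it through unchanged.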


  • Before AlexNet, SVMs were the best algorithms around. LeNet was the only comparable success case for NNs back then, and it was largely seen as limited to MNIST digits because deep networks were too hard to train. People used HOG+SVM, SIFT, SURF, ORB, older Haar / Viola-Jones features, template matching, random forests, Hough transforms, sliding windows, deformable parts models… so many techniques that were made obsolete once the first deep networks became viable.

    Your schooling was correct at the time. What changed is that the march of research progress eventually brought 1) the creation of large, million-scale supervised datasets (ImageNet) and 2) larger / faster GPUs with more on-card memory.

    It was a fact back in ~2010 that SVMs were superior to NNs in nearly every aspect.

    Source: started a PhD in computer vision in 2012










  • bluemellophone@lemmy.world to Science Memes@mander.xyz · 8 Minutes · 1 year ago

    It takes about 8 minutes for light to travel from the Sun to Earth. Because nothing, including information, travels faster than light in a vacuum, we would not and could not know the Sun had disappeared for 8 minutes. This means Earth would continue to follow its orbit around a non-existent Sun for 8 minutes, because the Sun’s gravity would still be acting on the Earth.

    If it were nighttime, you wouldn’t notice the sudden lack of sunlight (unless there was a full moon), but you’d almost certainly notice the change in gravity.

    Edit: actually, you wouldn’t feel any difference in gravity or experience any change of acceleration. What you would experience is a very tiny vibration, of 1 million push notifications being sent to your phone from the other side of the planet.
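
    For anyone who wants to check the 8-minute figure, here is a quick back-of-the-envelope sketch. It assumes the Earth is exactly 1 au from the Sun (the real distance varies by a couple of percent over the year), and the constant and function names are just illustrative.

```python
AU_M = 149_597_870_700   # astronomical unit in metres (exact by IAU definition)
C_M_PER_S = 299_792_458  # speed of light in vacuum in m/s (exact by SI definition)


def light_travel_seconds(distance_m, speed_m_per_s=C_M_PER_S):
    """Time for a signal moving at the given speed to cover distance_m."""
    return distance_m / speed_m_per_s


seconds = light_travel_seconds(AU_M)
minutes, remainder = divmod(seconds, 60)
print(f"{seconds:.1f} s = {int(minutes)} min {remainder:.0f} s")
```

    That works out to roughly 499 seconds, or about 8 minutes 19 seconds.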




  • Reinforcement learning is a machine learning (ML) technique (“AI” in layman’s terms) for optimizing neural networks and other types of non-linear models.

    As far as ML math goes, this is fairly tame. It looks complicated, but it is spelled out clearly in the paper. A lot of these kinds of theoretical papers, the sort of thing that would get published in Automatica, are going to lean very heavily on math.

    Source: PhD in Computer Science with dissertation using neural networks.