Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below:
One of the most interesting recent trends in machine learning has been the combination of different types of data to unlock new use cases for deep learning. If the 2010s were the decade of computer vision and voice recognition, the 2020s may very well be the decade we finally figure out how to build machines that can see and hear the world around them at the same time, making them that much more context-aware and potentially even humanlike.
The push towards integrating diverse data sources has received a lot of attention from academics and companies alike. One of those companies is Twenty Billion Neurons, and its founder, Roland Memisevic, is our guest for this latest episode of the Towards Data Science podcast. Roland is a former academic who has been knee-deep in deep learning since well before the hype sparked by AlexNet in 2012. His company has been working on deep learning-powered developer tools, as well as an automated fitness coach that combines video and audio data to keep users engaged throughout their workout routines.
Here are some of my favourite take-homes from today’s episode:
- Academics who started down the deep learning path prior to 2012 were often ridiculed. The world of the 2000s was dominated by tabular data that simple models like decision trees and support vector machines handled well, so most people incorrectly generalized from this and assumed that the tools of classical, statistical machine learning were more promising than neural networks. What kept deep learning buffs going despite all that pushback was the belief that deep learning had the potential to process a type of information that humans consume all the time, but that machines rarely encountered, especially back then: video and audio data.
- The computational constraints imposed by mobile devices are a big consideration for companies developing new consumer-facing applications for machine learning. When Twenty Billion Neurons got started, mobile devices couldn’t deliver the on-device machine learning performance their automated fitness trainer software required, so they faced a choice: find a way to compress their models so they could run on-device, or wait for the hardware to catch up with their software. Ultimately, Twenty Billion went with option 2, and that paid off: in 2018, Apple phones started carrying a chip that unlocked the on-device processing they needed.
- If you’re interested in experimenting with datasets that contain multiple data types, Roland recommends checking out the Something-Something dataset, publicly available here (see the sketch below for one way to start exploring it).
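If you want a concrete starting point for working with that dataset, here is a minimal sketch (not code from the podcast or from Twenty Billion Neurons) of a PyTorch `Dataset` that pairs short video clips with their text labels. The label-file format, file names, and video container are assumptions for illustration; adjust them to match the actual release, which requires registration to download.

```python
# Minimal sketch: loading video clips and labels for a dataset like
# Something-Something with PyTorch and OpenCV. Paths and the JSON label
# format are assumptions, not the official layout.

import json
from pathlib import Path

import cv2                     # pip install opencv-python
import torch
from torch.utils.data import Dataset


class VideoClipDataset(Dataset):
    def __init__(self, labels_json, video_dir, num_frames=8):
        with open(labels_json) as f:
            # assumed format: [{"id": "1234", "label": "pushing something ..."}, ...]
            self.items = json.load(f)
        self.video_dir = Path(video_dir)
        self.num_frames = num_frames
        self.classes = sorted({item["label"] for item in self.items})
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        item = self.items[idx]
        # container/extension is an assumption
        path = self.video_dir / f"{item['id']}.webm"
        cap = cv2.VideoCapture(str(path))
        frames = []
        while len(frames) < self.num_frames:
            ok, frame = cap.read()
            if not ok:
                break
            frame = cv2.resize(frame, (224, 224))
            frames.append(torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0)
        cap.release()
        # pad short clips by repeating the last frame
        while frames and len(frames) < self.num_frames:
            frames.append(frames[-1].clone())
        clip = torch.stack(frames)              # shape: (num_frames, 3, 224, 224)
        label = self.class_to_idx[item["label"]]
        return clip, label
```

From there, wrapping the dataset in a `DataLoader` and feeding batches to any video model is the usual next step; the point is simply that multimodal or video data can be handled with the same `Dataset`/`DataLoader` pattern used for images.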
You can follow Twenty Billion Neurons on Twitter here or on LinkedIn here, and you can follow me on Twitter here.
If you’re curious about their upcoming fitness app launch, you can also give them a follow on Instagram here.