Whether it’s Google’s headline-grabbing DeepMind AlphaGo victory, or Apple’s weaving of “using deep neural network technology” into iOS 10, deep learning and artificial intelligence are all the rage these days, promising to take applications to new heights in how they interact with us mere mortals.
To go deeper (yes, I went there) on the subject, I reached out to the team at the deep learning-focused company Skymind, creators of Deep Learning For Java (DL4J), and authors of the recently released O’Reilly book Deep Learning: A Practitioner’s Approach, Josh Patterson and Adam Gibson. Josh and Adam offer us a gentle introduction to the subject in this interview, as well as insight into how they are building an open source-based business around deep learning.
For the uninitiated, what is deep learning (DL) and why should I care about it?
Adam Gibson (AG): Deep learning is just another term for neural networks, a set of algorithms that have been around for decades. For a long time people were skeptical about them, but as chips got more powerful and as we gathered more data to train them on, deep neural nets started breaking records. We’re hitting expert human accuracy on a lot of problem sets, with accuracy rates in the high 90s, which is a quantum leap over other algorithms. So if you have a problem that matters to your business, you can probably attach a dollar value to that improvement in accuracy, and if you’re a large business, that value can be huge. It’s a competitive edge with a big impact on margins.
Josh Patterson (JP): To build on what Adam said, with deep learning we’re moving from manual feature creation to automated feature learning. The trick with deep learning is to recognize the input data type and match it to the correct deep network architecture to enable robust automated feature learning. An example is how convolutional neural networks (CNN) automatically learn the features in complex image data, where historically this was harder for other machine learning methods.
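To make the convolution idea concrete, here is a plain-Java sketch (illustrative only, not DL4J code) of the core operation a CNN layer performs: sliding a small filter over an image. The filter weights below are hand-written for clarity; in a real CNN they would be learned from data, which is exactly the "automated feature learning" Josh describes.

```java
// A minimal 2D convolution over a tiny grayscale image. In a trained
// CNN, first-layer filters like the vertical-edge detector below are
// learned automatically rather than designed by a feature engineer.
public class ConvolutionSketch {

    // Valid (no padding) convolution of an image with a 3x3 kernel.
    static double[][] convolve(double[][] image, double[][] kernel) {
        int h = image.length - 2, w = image[0].length - 2;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                for (int ky = 0; ky < 3; ky++)
                    for (int kx = 0; kx < 3; kx++)
                        out[y][x] += image[y + ky][x + kx] * kernel[ky][kx];
        return out;
    }

    public static void main(String[] args) {
        // A 4x6 image: dark left half (0.0), bright right half (1.0).
        double[][] image = {
            {0, 0, 0, 1, 1, 1},
            {0, 0, 0, 1, 1, 1},
            {0, 0, 0, 1, 1, 1},
            {0, 0, 0, 1, 1, 1},
        };
        // A vertical-edge detector, similar to filters CNNs learn in
        // their first layer when trained on natural images.
        double[][] edge = {
            {-1, 0, 1},
            {-1, 0, 1},
            {-1, 0, 1},
        };
        double[][] featureMap = convolve(image, edge);
        // The feature map responds strongly exactly where the edge is:
        // row 0 comes out as [0.0, 3.0, 3.0, 0.0].
        System.out.println(java.util.Arrays.toString(featureMap[0]));
    }
}
```

The key point: the same sliding-window machinery works for any filter, so learning good filters from data replaces hand-crafting features.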
What problems are DL best suited for? What are typical use cases for Skymind?
AG: Deep neural nets can classify, cluster, and make predictions about data. You can apply them to unstructured data like voice or images, which is what you see in the news with self-driving cars and AlphaGo and Alexa. But they’re also really useful for structured data like transactions and web activity, especially when you’re looking for patterns over time. That can apply to fraud detection, recommender systems, customer churn prediction, or market forecasting. That’s where deep learning excels. So a lot of the old, hard problems that businesses face are going to be transformed by applying deep learning.
JP: The ones I see the most are a mix of hard use cases left over from the big data wave, and problems that have moved from research into the enterprise because more accurate models have made them viable in production. An example of the big data use cases would be anything that uses transactional data, something we typically see stored in Hadoop. DL4J runs natively on Spark, so it can easily and securely build models with long short-term memory (LSTM) recurrent neural networks for transactional sensor data. Newer use cases include advanced image modeling with CNNs, where we can help enterprises analyze the objects in a scene, which has applications in domains such as retail.
Tell our audience about Deep Learning 4 Java (DL4J). What features does it have and how does it compare to TensorFlow and/or other noteworthy DL frameworks?
AG: The first thing to note is that Deeplearning4j is backed by commercial support. Other frameworks don’t have a company signing service-level agreements to guarantee their performance. The rest are just libraries, and if your mission-critical app breaks, well, good luck. Skymind gives you a phone number to call if you use Deeplearning4j. It’s the only framework designed with the enterprise in mind.
Secondly, Deeplearning4j focuses on Java and Scala, and has integrations with the Java virtual machine (JVM) stack like Hadoop, Spark, Akka, and Kafka. The other libraries are either Python or Lua, and they don’t do deployment to production well without heavy customization. Deeplearning4j is bundled in an enterprise distribution called the Skymind Intelligence Layer, or SKIL. SKIL is dockerized and runs on top of the datacenter operating system DC/OS and Mesos. So it’s platform agnostic and comes with resource management. That’s unique. A lot of cloud vendors are designing libraries that aren’t platform neutral, so you would be looking at lock-in and switching costs. SKIL includes a microservices approach to deployment where you can elastically autoscale inference models to handle heavy traffic.
Finally, Deeplearning4j includes deep reinforcement learning as well as neural nets. That’s the kind of goal-oriented algorithm that beat the Go champion this year.
JP: The thing that the Fortune 500 needs with respect to deep learning is a way to democratize its power and use it in the same way they see shops like Facebook and Google using it. We saw this with Hadoop and big data, where the Fortune 500 wanted to use tech similar to, say, Yahoo’s, but needed a version that was compliant with the way they ran their data centers. This gave rise to Hadoop distributions such as Cloudera Distribution Including Apache Hadoop (CDH) and Hortonworks Data Platform (HDP). In a similar fashion, we’re seeing DL4J fill this role and become something an IT department can easily run securely, while still being able to build advanced models with Spark and graphics processing units (GPUs) on top of the Hadoop investment they’ve already made.
What kind of background do I need in order to work on DL? How do I get started? Should I learn non-DL based machine learning first?
AG: We see a shift in the skills necessary to build machine learning solutions. Traditional algorithms like random forests or gradient boosting machines (GBMs) can require a lot of feature engineering. That is, you need feature engineers who are domain experts to tell the algorithms what to look for. There are not enough of those engineers, so that’s a real choke point to making machine learning more widespread. Also, there’s not much point learning how to do feature engineering on algorithms that are no longer state of the art.
Deep learning is different because neural nets extract features automatically; you don’t have to tell them what to look for. The necessary skills now involve tuning the hyperparameters of those nets. There are a lot of best practices out there. We teach people how to do that with the book and in our workshops, and many pick it up by joining our open source community.
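Hyperparameter tuning at its simplest is a search loop: pick settings, train, compare results, keep the best. Here is a plain-Java sketch of that loop, sweeping the most common knob, the learning rate, on a toy one-weight problem (real tuning sweeps many more knobs such as layer sizes and regularization, but follows the same pattern).

```java
// Sweep a few learning rates for gradient descent on f(w) = (w - 3)^2
// and keep the one with the lowest final loss. A learning rate that is
// too big diverges; too small barely moves; a good one converges fast.
public class LearningRateSweep {

    // Run gradient descent for a fixed number of steps; return final loss.
    static double trainLoss(double learningRate, int steps) {
        double w = 0.0;                        // initial weight
        for (int i = 0; i < steps; i++) {
            double grad = 2 * (w - 3);         // f'(w) for f(w) = (w-3)^2
            w -= learningRate * grad;          // gradient step
        }
        double diff = w - 3;
        return diff * diff;                    // final loss f(w)
    }

    // Try each candidate and return the one with the lowest final loss.
    static double bestLearningRate(double[] candidates, int steps) {
        double best = candidates[0], bestLoss = Double.MAX_VALUE;
        for (double lr : candidates) {
            double loss = trainLoss(lr, steps);
            if (loss < bestLoss) { bestLoss = loss; best = lr; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] lrs = {1.5, 0.5, 0.01};       // too big, good, too small
        System.out.println("best lr = " + bestLearningRate(lrs, 50));
    }
}
```

Best practices like the ones taught in the book mostly narrow down which candidates are worth trying, since each "try" is a full (and expensive) training run.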
JP: Deep learning requires some investment in the basics of statistics and linear algebra. However, with a solid progression of skills, someone who is simply interested can work their way up to being a basic practitioner. To that end, Adam and I wrote a book with O’Reilly, Deep Learning: A Practitioner’s Approach, specifically geared toward anyone who would like to take on this journey.
To fulfill the massive market need for more advanced and intelligent applications we need to further democratize the concepts of deep learning. We feel like this book is a solid choice for the practitioner to work through the progressions needed to get familiar with the concepts in deep learning.
What can I expect to learn from your book? Who is your target audience?
AG: On the one hand, our target audience includes people starting out with deep learning: they might be data engineers and architects, Java systems engineers, or business people who want to grasp the mechanics and understand what deep learning can be applied to. On the other, we’ve written a book that will teach deep learning specialists how they can deploy to production by scaling out neural nets with JVM tools. And they might learn a few other tricks, too.
The market for machine learning and deep learning applications is in some ways similar to how HTML and the web expanded into basically everything in the 1990s. The book is designed to be applicable to all levels, from someone with basic Java experience all the way up to a PhD researcher who just needs a good set of chapters on neural network tuning tricks. Newer users will want to start at chapter 1 and read the book end-to-end, while an advanced user might only want to look at the two tuning chapters. Neural network and deep network tuning is not restricted to DL4J, and those chapters are applicable to any deep learning framework. We also spend time talking about extract, transform, load (ETL) and techniques of vectorization, which are important in the practical workflow of real-world machine learning modeling. We end the book with a chapter on Spark, showing how DL4J code moves to Spark with few changes, and then look at a few examples. There are over 10 appendix chapters on topics ranging from “What is Artificial Intelligence?” to a primer on reinforcement learning.
How did Skymind get started, and how does it approach doing business built on an open source project like DL4J?
AG: Skymind got started in early 2014. My co-founder Chris and I thought that enterprise needed an open source artificial intelligence (AI) layer, just like it had open source layers for big data storage with Hadoop, or Linux for the operating system. It seemed like an AI layer had the potential to create even more value than those. Josh helped us see that. So with him we created Deeplearning4j, and since then it’s become the biggest deep learning framework for the JVM. We’re following the typical open core playbook: Skymind does support, training and services for our enterprise distribution, the Skymind Intelligence Layer. Every open source business draws a line somewhere, and SKIL bundles a couple closed-source packages as well. We help big companies build deep learning solutions with a distro that deploys easily to the stack they have. And since it’s Java, they can leverage their existing teams to use it.
JP: I’d like to think we’ve found an interesting spot between what’s fashionable in machine learning today, and what’s reasonable for a Fortune 500 IT department to run in production.
I’ve heard DL requires really high-end hardware and lots of GPUs. What does a typical deployment look like?
AG: We have clients who only have central processing units (CPUs), and we have clients who have GPUs. The GPUs are really helpful for image processing, and more generally for the training phase of deep learning. With neural nets, first you train them, and then you use them to make inferences about data. The training phase is computationally intensive, so if you want to get a trained model quickly, you should consider GPUs. The inference phase can be done with CPUs. You can use both kinds of chips for both stages, but for training on big datasets, a multi-GPU configuration is handy.
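The train-versus-inference asymmetry Adam describes can be seen even in the smallest possible model. This plain-Java sketch (illustrative only, not DL4J code) trains a tiny perceptron with many passes over the data, the compute-heavy phase where GPUs help, while inference is a single cheap weighted sum per input, which CPUs handle fine.

```java
// Train a one-neuron perceptron on logical OR, then run inference.
// Training = many iterations of updates; inference = one forward pass.
public class TrainThenInfer {

    static double[] weights = new double[3]; // bias + 2 input weights

    // Training phase: repeated passes over the dataset (compute-heavy).
    static void train(double[][] xs, int[] labels, int epochs) {
        for (int e = 0; e < epochs; e++)
            for (int i = 0; i < xs.length; i++) {
                int err = labels[i] - predict(xs[i]);  // perceptron rule
                weights[0] += 0.1 * err;               // bias update
                weights[1] += 0.1 * err * xs[i][0];
                weights[2] += 0.1 * err * xs[i][1];
            }
    }

    // Inference phase: one cheap weighted sum and threshold per input.
    static int predict(double[] x) {
        double sum = weights[0] + weights[1] * x[0] + weights[2] * x[1];
        return sum > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        double[][] xs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] ys = {0, 1, 1, 1};                 // labels for logical OR
        train(xs, ys, 20);                       // 20 epochs x 4 updates
        System.out.println(predict(new double[]{1, 0})); // single pass
    }
}
```

Scale the weight count from 3 to millions and the epoch count into the hundreds, and it becomes clear why training wants GPU throughput while deployed inference often does not.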
JP: I feel like we’re a shop that’s making the GPU a practical choice for the data warehouse. Adam did a great job with ND4J making the switch from CPU to GPU so simple and seamless that the user doesn’t have to make those chip decisions until later. With ND4J the chip decision then becomes a function of “does training faster make this a better business case?” and when the answer is “yes”, it creates an interesting opportunity for our GPU vendor friends.
Machine learning often requires a lot of behind-the-scenes “human-in-the-loop” work: cleansing and annotating data, tuning parameters, and validating results. What does a typical workflow look like for getting a DL-based solution to production?
AG: That’s right. You need to gather relevant data, make it accessible and make sure it’s clean enough to teach something to an algorithm. Deep neural nets can tolerate a lot of noise in very large data sets, so cleansing isn’t as important as it used to be on smaller sets. If you’re going to build a classifier, you need annotated data, and that’s something people solve using Mechanical Turk, or startups like CrowdAI. We actually put together an image of a typical data workflow using cats, which was kind of funny. Once you have the data, you move into the tuning and training phase with the neural network. And that’s iterative. You tune the hyperparameters and set the architecture, then see if the network learns. Lather, rinse, repeat. That’s why GPUs are useful in the training phase, because you want to iterate quickly and not sit around twiddling your thumbs. Finally, you test your model against data it’s never seen, and if it passes that test, you try it out in the real world.
JP: From a practical standpoint, we knew ETL and vectorization were hard for most Fortune 500 machine learning teams. With that in mind, we both dedicated a chapter in the book and built a specific tool in the DL4J suite, DataVec, to handle these functions. DataVec allows us to create complex multi-dimensional vector and tensor input to DL4J from raw data. It also allows many common ETL functions to be performed in the prep and cleaning phase of vectorization. DataVec can run on your local laptop or natively as a Spark application on a Hadoop cluster. All of the examples in our GitHub repo and in the book use DataVec as the vectorization tool of choice.
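To show what vectorization means in practice, here is a plain-Java sketch of the kind of transformation DataVec performs at scale (this is not the DataVec API, just an illustration). Raw records mix categorical and numeric fields, but neural nets need fixed-length numeric vectors, so categories become one-hot columns and numeric fields get scaled; the category list and scaling constant below are made-up examples.

```java
// Turn a raw "type,amount" transaction record into a numeric vector:
// the categorical type becomes a one-hot encoding, and the amount is
// scaled into [0, 1] by an assumed known maximum.
public class VectorizeSketch {

    static final String[] CATEGORIES = {"debit", "credit", "transfer"};
    static final double MAX_AMOUNT = 10_000.0;  // assumed from the data

    // "debit,2500" -> [1.0, 0.0, 0.0, 0.25]
    static double[] vectorize(String record) {
        String[] parts = record.split(",");
        double[] v = new double[CATEGORIES.length + 1];
        for (int i = 0; i < CATEGORIES.length; i++)
            if (CATEGORIES[i].equals(parts[0])) v[i] = 1.0;   // one-hot
        v[CATEGORIES.length] =
            Double.parseDouble(parts[1]) / MAX_AMOUNT;        // scaling
        return v;
    }

    public static void main(String[] args) {
        for (String rec : java.util.List.of("debit,2500", "transfer,100"))
            System.out.println(java.util.Arrays.toString(vectorize(rec)));
    }
}
```

DataVec's job is this same mapping applied across messy real-world schemas, with the transform pipeline able to run either locally or as a Spark job over a Hadoop cluster.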
There’s a lot of hype in this space right now. How do you tap into that as a business without chasing your own tail or losing focus on what’s real?
AG: There is a lot of hype and noise from companies that aren’t really serious in this space. Every startup knows they can add 20% to their valuation by saying they do AI, when it’s really just logistic regression. I worry about solutions providers poisoning the well here by over-promising and under-delivering. Businesses that stumble with AI early on won’t come back for a while, and that hurts everybody. They should do their homework and get references. The advances we’re making in deep learning are real, and they will transform society and business in ways we can’t even anticipate in the next few years. The more people read, the more they’ll see that only a handful of companies and startups have made AI their mission. We’re a pure-play deep learning startup. We’ve been working on this for years and have tens of thousands of users. We focus on giving our users and customers a good experience, on making them succeed with support and a better product. We listen to them and solve the problems they face. Eventually, other people notice that we have solved the big problems that they are running into, such as distributed training with Spark, bringing hardware acceleration to the JVM, building Numpy and Cython for the JVM, and making production deployment easy. That’s where the rubber meets the road.
JP: I’ve lived through the waves of smart grid, cloud, big data, and now deep learning. These waves are much like ocean tides: they come in and everyone rides high on marketing themes, and then the tide goes out and a lot of things that aren’t well grounded get swept away. It’s the folks who keep their heads and find solid footing at high tide who can sustain themselves when the tide goes out.
At Skymind we’re customer- and partner-focused on the real problems they have today. We try very hard to not get pulled into anything that can’t potentially go to production in the next 12 months, or only serve in some press release. We may not always have a new network architecture variant that came out last month, but we will be the platform for enterprise deep learning that is the most secure, most interoperable, and easiest to use for the Fortune 500.
We’re actively looking for ways to help the Fortune 500 realize return on their investment in big data infrastructure (e.g. our early integrations with Hadoop and Spark). Everything from our proof-of-concept process, to our GitHub repos, to the book is focused with this mindset and direction.
When are you building me a sentient bot that writes this column for me?
AG: Didn’t I tell you I hired a bot to answer these questions? Modern technology is just magical.
JP: Well, our LSTM-based bot doesn’t write columns yet, but we do have one that writes beer reviews, the Lager Bot.