Deeplearning4j is an open source, distributed neural net library written for Java and Scala. It is also one of the most active communities on Gitter, the chat service I created. Interested in how they built a thriving open source community, I reached out to get their thoughts on the lessons they learned. Maybe you’ll find some tips for your project in my interview with Adam Gibson and Chris Nicholson, two of the founders. (Josh Patterson is also a founder.)
Tell us a little bit about yourselves and the Deeplearning4j community. How did it all begin?
We started building Deeplearning4j in late 2013. Adam had been involved with machine learning for about four years at that time, and deep artificial neural networks were looking more and more promising. The first network in Deeplearning4j was a restricted Boltzmann machine, because RBMs were the building blocks of the deep belief networks Geoff Hinton introduced in 2006, which was the turning point in the field. I (Chris) was working for another startup doing PR and recruiting, and had previously worked as a journalist, so I took care of the documentation (and still do).
We believe proper communication is key to making open source code valuable.
What digital tools do you use to help manage and grow your community?
The code lives on GitHub and the conversation lives on Gitter. There are about 1,360 devs on the Gitter channel now, so it’s probably one of the more lively neural net conversations on the planet. Our website is hosted on GitHub, so the content lives there too.
We generate a lot of automatic documentation with Javadoc (always a WIP). We ask people to use Maven as their automated build tool. One of the biggest problems with any software is the install, and Maven helps make that a little easier. You need to constantly try to clear away obstacles so that people can just use your code and not worry about other stuff.
What are most discussions about in the Deeplearning4j channel on Gitter?
The main issues discussed used to be about installation. Engineers in the community taught us a lot about how to write clearer instructions and how to make the code and experience better. If we hadn’t had that feedback loop, Deeplearning4j wouldn’t be what it is today. Open source communities are amazing for quality control. The sooner you fix an issue, the fewer demands you get from the community about that issue. It’s a great incentive to move quickly.
Now the main issues are loading data and tuning neural nets. We are working on communicating better about both, and on improving the framework so that ETL and tuning get easier. There are also a lot of basic questions about machine learning and deep learning. Many software engineers have figured out that deep learning and machine learning are really powerful tools, so they're trying to grasp new ideas. We've written a lot of introductory material, and we point people to various web pages where those ideas are explained.
What common goals do you and your community have?
The community is centered around Deeplearning4j and our scientific computing library, ND4J, which powers the neural nets. So we answer questions about how to use the libs, and in the process we help people understand more about deep learning in general. It’s not a deep learning hotline, unfortunately, so there are some questions we don’t tend to answer. But we do help engineers in the DL4J community build apps and understand how neural nets work.
The common goal is to learn about deep learning, and to build cool stuff. We've only seen the tip of the iceberg in terms of what deep learning can do. So far, we've seen huge advances in image recognition, machine translation, machine transcription, and time series prediction. On a number of benchmarks, machine perception now equals or surpasses human perception, and that will change society in ways that are hard to imagine. Those changes just haven't been implemented yet. So the secondary goal of the community is to bring this narrow form of AI into the world so that it can make a difference.
What factors contribute to the success of your community?
Creating and maintaining a community is a huge commitment of time and effort. You have to be available, and you have to try to understand where other people are coming from. They don’t always know the jargon well enough to ask precise questions, so you have to have the patience to figure out together with them what they’re trying to ask, or where they’re stuck. We’re not always as patient as we should be. Being available, making that effort, and offering support for powerful tools like this are a good way to build a community.
When the makers of a big project are available to answer esoteric questions about how it works, that creates a lot of trust, because people know that you speak with authority and that if something is really broken, it’s going to get fixed. There’s a tight feedback loop between the community and the project creators.
What are some of the challenges that you encounter while managing the community?
One of the challenges is deciding which questions we should answer and which questions people need to answer for themselves. If someone has really basic questions about Java, an IDE like IntelliJ, or a build tool like Maven, most of the time they need to figure that out for themselves. Our Gitter channel isn't the right place to hash through that, although we do help in special cases. For example, sometimes you need to increase the JVM's heap space for neural nets to work.
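For readers who hit the heap-space issue mentioned above: on the JVM, the maximum heap size is set with the standard `-Xmx` launch flag (for example, `java -Xmx4g ...`). The sketch below is a hypothetical helper, not part of Deeplearning4j or ND4J, that reports the current limit using the standard `Runtime.maxMemory()` call. The 2048 MiB threshold is an arbitrary illustrative value; how much memory a given network actually needs depends on the model and the data.

```java
// HeapCheck.java: a minimal sketch for checking how much heap the JVM may use.
// Assumption: the class name and the 2048 MiB threshold are hypothetical,
// chosen for illustration only; they are not Deeplearning4j requirements.
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in mebibytes.
    // Runtime.maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
    static long maxHeapMiB() {
        long maxBytes = Runtime.getRuntime().maxMemory();
        if (maxBytes == Long.MAX_VALUE) {
            return Long.MAX_VALUE;
        }
        return maxBytes / (1024L * 1024L);
    }

    // True if the configured maximum heap is at least the requested size in MiB.
    static boolean hasAtLeastMiB(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }

    public static void main(String[] args) {
        long requiredMiB = 2048; // hypothetical threshold for illustration
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        if (!hasAtLeastMiB(requiredMiB)) {
            System.out.println("Heap may be too small; try restarting with -Xmx"
                    + requiredMiB + "m or larger.");
        }
    }
}
```

Running `java HeapCheck` and then `java -Xmx4g HeapCheck` should show the reported limit change, which is a quick way to confirm that the flag actually reached the JVM (IDE run configurations are a common place for it to get lost).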
You also have to find a balance between building the community and building the product. Ideally, you’d have a big team with full-time support engineers and the rest of the team working on the code base. But most open source projects have very small teams. There are just a handful of people capable of support, and they’re the ones who also should be fixing bugs and adding features.
How do you encourage commitment and contribution to the community?
You create a smart, friendly environment in the community. You remind people that you appreciate their contributions, and you show them, as best you can, what needs to be worked on. We created top-level files recognizing our contributors, showing people how to contribute, and laying down the rules of the community. We also wrote a devguide, and we now label all issues as bug, enhancement, or documentation, so that people can scan the queue quickly and explore where they can add something.
Tell us a bit about the time commitment required to set up and establish the community. How much community maintenance is required on an ongoing basis?
Skymind, the company behind Deeplearning4j, is a distributed team, with engineers in Australia, Europe, and the US, and Deeplearning4j community members in almost every time zone. There's a Skymind engineer watching the Gitter queue probably 12–16 hours out of any weekday. This is a pretty serious commitment, because there are fewer than 10 of us. It's not anyone's full-time job, but an engineer might be running unit tests and answering questions on Gitter in their downtime.
Based on your experience, do you feel that open source communities have changed and evolved over the past few years?
Open source is winning the enterprise stack, so it's a lot more important than it used to be. The biggest organizations in the world are running on open source software. Linux won the server operating system, Hadoop won big data storage, and open source won because when you do it right, you get better code. More eyeballs mean more uptime. So the size of the open source software community and the quality of attention that software engineers bring to open source projects have both increased over the years.
What advice would you give to someone who wants to start an online open source community from scratch?
First, build something neat. Something you care about. Focus on building one thing that works. Then, share it with people. They will help you improve it, and they may help you think about what to build next. Don't do too much big up-front development. Try to scope it so that you can ship in a reasonable amount of time. A few weeks, say. Open source is valuable because it's a conversation, and the conversation leads you places, so that you and the project evolve in ways you can't anticipate. Also, by open sourcing early, you're increasing your exposure and therefore your chances of getting help. We've had amazing developers join the community and the Skymind team.
Can you share a community member success story that happened thanks to their participation in your channel?
For most of the stories, you just had to be there. But in general, a lot of data scientists and Java engineers come and they just build something for their companies that works. They’ll come back later and say: “We saw a 200% increase in ad coverage when we made DL4J part of the recommender system.”
Another guy built an app with DL4J, an investor saw it, and he raised funding. So that's all pretty cool. With open source, you're throwing a rock out into the ocean, and you don't always hear it hit the water. You can't even see the ripples. So it's encouraging when people come back and say "thanks" and tell us how it helped them. That makes it more meaningful.