2016 Open Source Rookies of the Year

For nine years, the Black Duck Open Source Rookies of the Year awards have recognized some of the most innovative and influential open source projects launched during the previous year. This recognition is a tribute to the success and momentum of these projects, and an affirmation of their prospects moving forward.

This year, we saw organizations stretching for broader influence across use cases and working to raise the standards for performance and capability. These efforts were concentrated in seven key areas.

Key Developments in Open Source Software:

  • Stretching the Blockchain: Many open source projects are exploring ways to extend blockchain technology to uses well beyond cryptocurrency.
  • Diving into Deep Learning: Projects are seeking to simplify machine learning to encourage broader adoption across industries and applications.
  • Controlling Container Clutter: A number of this year’s open source projects have found remarkable opportunities to simplify the world of containers.
  • Beyond Basic Database Data: Open source projects are striving to accelerate data analysis, increase database efficiency, and blur the line between the traditional database and blockchain technologies.
  • Redefining Software-Defined Networks: Projects are applying open source to make networks as agile and flexible as the virtualized server and storage infrastructure of the modern data center.
  • Network Security: Several projects are striving to revolutionize network security by leveraging cutting-edge machine learning and software-defined network capabilities.
  • Revolutionizing Education: Open source projects are working to make learning resources readily available to students and teachers worldwide.

Blockchain

Winner: Sawtooth Lake

Sawtooth Lake, Intel’s new distributed ledger platform for the Hyperledger project, was developed to address concerns about the scalability and security of existing blockchain technologies. Predecessors such as Bitcoin are often viewed as unable to scale efficiently, with their hardware expected to consume as much electricity as a small country within the next few years. Sawtooth Lake addresses these shortcomings with a modular design and a tamper-resistant consensus algorithm, Proof of Elapsed Time, which uses random leader election to decide who claims each new block; this streamlines compute activity, enhances security, and minimizes wasted electricity. Sawtooth Lake can satisfy many use cases by establishing domain-specific transaction families, in which rules are predefined for a domain and the transactions occurring within it.
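
The random leader election described above can be illustrated with a minimal sketch. It is not Sawtooth Lake's implementation: the function name, the node names, and the choice of an exponential distribution for wait times are assumptions made for illustration, and the tamper-resistance of the real draw is not modeled at all. The idea shown is only that every validator draws a random wait and the shortest wait claims the block, so no node has to burn compute to win.

```python
import random

def elect_leader(validators, mean_wait=10.0, rng=None):
    """Return (leader, wait_times); the validator with the shortest wait wins.

    Hypothetical helper for illustration only. The exponential distribution
    is an assumption, and nothing here prevents a node from lying about
    its draw, which the real algorithm must do.
    """
    rng = rng or random.Random()
    # Each validator draws its own random wait time.
    wait_times = {v: rng.expovariate(1.0 / mean_wait) for v in validators}
    # The validator whose timer expires first claims the next block.
    leader = min(wait_times, key=wait_times.get)
    return leader, wait_times

# Hypothetical node names; a fixed seed keeps the run repeatable.
leader, waits = elect_leader(["node-a", "node-b", "node-c"], rng=random.Random(42))
for name, wait in sorted(waits.items(), key=lambda item: item[1]):
    print(f"{name}: waited {wait:.2f}s")
print("leader:", leader)
```

Because each node simply waits rather than racing to solve a puzzle, the cost of an election in this sketch is idle time instead of electricity.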

Honorable Mention: Steem 

Steem is a blockchain database unlike any other, extending the traditional cryptocurrency model to support community interaction by rewarding users for the contributions they make to the community itself. Steem is the blockchain foundation on which the social media platform SteemIt sits. Steem builds upon cryptocurrency tools developed over the last seven years by providing users with Steem Power (a measure of influence in stake-weighted voting) and Steem Dollars (an exchange-traded cryptocurrency). Partnering with former Open Source Rookie Rocket.Chat for social communications, Steem is a prototypical community project in both its development and its aspirations; looking to the future, the Steem team sees SteemIt as the vehicle to foster broad participation in cryptocurrency.

Big Data 

Winner: CarbonData 

Apache’s CarbonData project is a fully indexed, Hadoop-native columnar data store that integrates with Spark for query optimization. CarbonData’s unique approach to data organization, multi-level indexing, and optimization allows for faster data filtering, better compression, and enhanced search and query processing for more efficient use of compute resources. Dictionary encoding speeds up aggregation: values are aggregated in encoded form, and “deferred decoding” occurs only after aggregation. CarbonData’s versatility is evidenced by its single file format for distinct data access patterns, like OLAP Query (multi-dimensional analysis), Sequential Access (big scan), and Random Access (narrow scan).
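
To make the dictionary-encoding idea concrete, here is a small sketch in plain Python. It does not use CarbonData's file format or API; the function names and the sample data are hypothetical. It shows only the general technique: a column of repeated strings is replaced by small integer codes, the aggregation runs on those codes, and the codes are turned back into strings once per group at the very end.

```python
from collections import defaultdict

def dictionary_encode(values):
    """Map each distinct value to a small integer code."""
    dictionary = {}
    codes = []
    for value in values:
        # A new value gets the next unused code; a seen value reuses its code.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    # Reverse lookup table: position = code, entry = original value.
    decoder = [None] * len(dictionary)
    for value, code in dictionary.items():
        decoder[code] = value
    return codes, decoder

def sum_by_group(codes, amounts, decoder):
    """Aggregate on integer codes and decode only the final group keys."""
    totals = defaultdict(int)
    for code, amount in zip(codes, amounts):
        totals[code] += amount
    # Deferred decoding: one lookup per group, not one per row.
    return {decoder[code]: total for code, total in totals.items()}

# Hypothetical sample data.
cities = ["Berlin", "Paris", "Berlin", "Tokyo", "Paris", "Berlin"]
sales = [10, 20, 30, 40, 50, 60]
codes, decoder = dictionary_encode(cities)
print(sum_by_group(codes, sales, decoder))
# prints {'Berlin': 100, 'Paris': 70, 'Tokyo': 40}
```

Comparing and hashing small integers is cheaper than doing the same with strings, which is where the aggregation speedup in this sketch comes from.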

Honorable Mention: Apache Arrow 

Apache’s Arrow project accelerates the processing of large quantities of data and aims to become the de facto standard for columnar in-memory processing and interchange. Apache is optimistic about this goal, having fast-tracked the project past the traditionally prerequisite Incubator phase. Historically, distributed computing has required significant compute resources to translate data representations from one form to another. Apache Arrow removes this burden. Arrow enables the CPU to process data faster and to begin the next operation before the previous one has finished, yielding a speedup of one to two orders of magnitude.

Honorable Mention: BigchainDB

BigchainDB sits squarely between a familiar database architecture and cutting-edge blockchain technologies. Conceptualized by Ascribe as a means to protect digital art ownership on an immutable platform, BigchainDB seeks to address the existing constraints of blockchain: scalability, speed, and cost. Ascribe partnered with MongoDB to “blockchainify” databases and set out in June 2016 to extend beyond the single use case of digital art into arenas including HR, Medical, FinTech, Land Registry, and more, and it now has a portfolio of nearly a dozen partners. Functioning as a distributed database with blockchain characteristics, BigchainDB provides distinct value over either standalone technology as a rapid, low-cost, scalable, and queryable solution with high throughput, sub-second latency, and immutability.

Deep Learning 

Winner: Deep Scalable Sparse Tensor Network Engine (DSSTNE) 

Amazon’s DSSTNE (boldly pronounced “Destiny”) seeks to advance the neural network landscape by optimizing for data sparseness and scalability and by making optimal use of multiple GPUs. Inspired by the large-scale, low-latency computing needs of Amazon’s product recommendations feature, DSSTNE is designed to support large networks, sparse datasets, and parallel training. DSSTNE has documented speeds between 2x and 15x those of Google’s TensorFlow when handling large sparse datasets. Yet DSSTNE isn’t interested in being everything to everyone; its emphasis on scale and sparse data keeps the project focused on establishing the most powerful deep learning framework and recommendation engine for e-commerce and enterprise.

Honorable Mention: PaddlePaddle 

Baidu makes its contribution to the evolution of deep learning with PaddlePaddle (PArallel Distributed Deep LEarning), a deep learning platform that emphasizes simplicity and efficiency. PaddlePaddle makes it easier for developers unfamiliar with the space to leverage deep learning tools without extensive experience. This extends the solution’s potential influence to an array of industries, use cases, and organizations that may otherwise have lacked the ability to leverage other platforms. The PaddlePaddle team states that this project is poised to be the first open source solution to achieve high tolerance. It is possible to run many concurrent neural solutions, shifting priority tasks as they come in without “killing” other processes to free resources. Baidu attributes significant credit for the success of PaddlePaddle to Kubernetes, recognizing the container orchestration tool’s efficacy and momentum in the marketplace as a means to future-proof the PaddlePaddle platform.

Honorable Mention: Magenta 

When considering the implications of recurrent neural networks, few have considered their role in more creative media; Google’s Magenta project bridges the gap between technology and art. The Google Brain team, which previously developed the Deep Dream neural network project, began Magenta as a research initiative to use machine learning to understand creative content, like speech and music. Pulling together a team of artists, coders, and machine learning researchers, the Magenta group developed learning algorithms that explore distinct and common characteristics of music and art, with the goal of using them to create artistic content of their own. For artists, this means Magenta has the potential to be a resource in the toolkit for creating music from existing digital works. For those without musical expertise, creativity can be as close as a few keystrokes.

Software-Defined Networking 

Winner: OpenCORD

With its OpenCORD (Central Office Re-architected as a Datacenter) project, Open Networking Lab (ON.Lab), a non-profit organization focused on realizing the full potential of software-defined networks, is providing an end-to-end solution that combines SDN, NFV, and cloud with commodity infrastructure to bring datacenter-grade scale and agility to service provider networks.

CORD spans the telco central office, access, home, and enterprise, using common infrastructure with open building blocks to reduce CapEx/OpEx and accelerate time-to-market with programmable, flexible networks. CORD is backed by providers like AT&T, SK Telecom, China Unicom, and NTT, and has joined forces with the Open Networking Foundation to drive network solutions with open source software and software-defined standards. Much of CORD’s success is driven by its applicability across different scenarios, including Residential, Metro, and Mobile Telecom.

Honorable Mention: OPEN-O 

Network functions virtualization (NFV) management and orchestration (MANO) is an ETSI-defined framework for simplifying and standardizing the management and orchestration of resources within cloud datacenters, including compute, storage, virtualization, and networking. OPEN-Orchestration (OPEN-O) is the Linux Foundation’s framework for end-to-end service orchestration across any network: not just SDN, but legacy networks and NFV infrastructure as well. OPEN-O therefore strives to address both the NFV MANO challenge and the connectivity challenge, allowing service operators to use a model-driven approach to define services easily and in a vendor-neutral way. This helps to avoid lock-in, to reduce development, integration, and operations costs, and to accelerate time to market. To ensure commonality and flexibility, the OPEN-O architecture complies with ETSI NFV MANO and supports service modeling languages like YANG and TOSCA.

Network Security 

Winner: Poseidon 

This year, two organizations are striving to revolutionize network security by leveraging cutting-edge machine learning and software-defined network capabilities. Backed by technology accelerator In-Q-Tel (IQT) and working closely with the U.S. government, academia, and the U.S. Intelligence Community, the challenge labs Lab41 and CyberReboot have launched the Poseidon project.

Poseidon seeks to answer two key questions: What is on your network, and what is it doing? It answers these questions by providing situational awareness of the items being added to or removed from your network, as well as of the traffic being generated. Poseidon leverages machine learning techniques, examining the interactions on the network and learning distinct cues of disallowed or malicious activity. In some preliminary tests, Poseidon caught 84% of malicious activity with a very promising 2.2% false positive rate. For the teams at Lab41 and CyberReboot, the goal is to increase effort and complexity for “hackers” while decreasing effort and complexity for security teams, without compromising on security.
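
For readers unfamiliar with how figures like these are usually derived, the sketch below computes a detection rate and a false positive rate from a confusion matrix. The article does not say how Poseidon's numbers were measured, so the standard definitions are assumed here, and the counts are hypothetical values chosen only because they reproduce the quoted percentages.

```python
def detection_metrics(true_pos, false_neg, false_pos, true_neg):
    """Return (detection_rate, false_positive_rate) using standard definitions."""
    # Share of genuinely malicious events that were flagged.
    detection_rate = true_pos / (true_pos + false_neg)
    # Share of benign events that were wrongly flagged.
    false_positive_rate = false_pos / (false_pos + true_neg)
    return detection_rate, false_positive_rate

# Hypothetical counts: 100 malicious events and 1,000 benign events.
rate, fpr = detection_metrics(true_pos=84, false_neg=16, false_pos=22, true_neg=978)
print(f"detection rate: {rate:.1%}")
print(f"false positive rate: {fpr:.1%}")
```

Note that the two rates have different denominators: one is measured against malicious traffic and the other against benign traffic, so on a network where benign traffic dominates, even a low false positive rate can produce many alerts.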

Winner: Trireme 

With software and data disaggregating into cloud environments, it is becoming increasingly difficult to implement effective security practices. Aporeto has explored the software-defined network and the convergence of security policies, scale, and remediation efficiency, and released Trireme, a cloud-native security solution for distributed applications. Trireme allows the creation of security policies at scale and application segmentation through end-to-end authentication and authorization, in effect a digital handshake. Aporeto has chosen to work closely with Kubernetes because of its flexible network policy framework, rapidly growing community, and close attention to cloud-native applications and scale. For these two groups, Trireme is their answer to the question: How do you harden your Linux environment and ensure that containers can run efficiently, securely, and accomplish what they need to?

Containers

Winner: Ansible Container 

Originally developed under the working title “Harbor Master,” Ansible Container is the result of the Ansible development team’s desire for an alternative to Dockerfiles. Ansible Container automates the container build, deployment, and management process using nothing but Ansible Playbooks. Ansible Container is platform-agnostic, able to target the most common container orchestration engines, including Kubernetes, Docker, and OpenShift. This modularity allows you to target Docker during development and Kubernetes during deployment, for example, with a simple configuration change. Since its reveal at DockerCon in Summer 2016, this project has seen marked community and technological growth.

Honorable Mention: SwarmKit 

Docker’s SwarmKit is an open source toolkit used to build multi-node systems and perform orchestration at scale. An intriguing distinction for SwarmKit is its ability to function in single-node scenarios. This can ease the transition from single-node development and testing to cluster deployments and distributing tasks to machines in the cluster. SwarmKit uses the Raft Consensus Algorithm for leader election and coordination, removing the risk of a single point of failure for decisions. It is also notably secure, automating certificate issuance and rotation while using mutual TLS for node authentication, role authorization, and transport encryption.
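
The reason Raft avoids a single point of failure comes down to majority voting, which the sketch below illustrates. It is not SwarmKit code; the function names are hypothetical, and only the quorum arithmetic that Raft relies on is shown: a leader needs votes from a majority of nodes, so the cluster keeps making decisions as long as a majority is still reachable.

```python
def quorum_size(nodes):
    """Smallest number of nodes that forms a majority."""
    return nodes // 2 + 1

def tolerated_failures(nodes):
    """How many nodes can fail while a majority remains."""
    return nodes - quorum_size(nodes)

def wins_election(votes_received, nodes):
    """A candidate becomes leader only with a majority of votes."""
    return votes_received >= quorum_size(nodes)

for n in (1, 3, 5, 7):
    print(f"{n} node(s): quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A single node is its own majority, which is consistent with the single-node scenario mentioned above, while three nodes survive one failure and five survive two.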

Education

Winner: Kolibri 

Amid the flurry of giant commercial and open source projects, Learning Equality has set out to revolutionize education for low-resource communities around the globe with its Kolibri application. While not yet available for public download, Kolibri seeks to make learning resources available to students and teachers in areas with limited education resources, from rural schools and after-school programs to refugee camps and orphanages. Kolibri provides interactive exercises, self-paced resources, and collaborative learning tools, with real-time feedback and guidance for both students and teachers.

To disseminate resources efficiently, Kolibri “seeds” endpoint devices with installers, updates, and content via an internet connection. Seeded devices then share new content and updates with other devices via an offline local network. Kolibri’s technology compresses content with minimal loss in quality, allowing large amounts of content to reside on small, low-cost devices.
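
The seeding flow can be pictured with a short sketch. This is not Kolibri's code or protocol; the function, the content names, and the use of integer version numbers are all assumptions made for illustration. It shows only the general pattern of one device passing along whatever another device is missing or holds in an older version.

```python
def sync(source, target):
    """Copy items that target lacks, or holds in an older version, from source.

    Both arguments map a content name to an integer version (hypothetical model).
    Returns the names that were added or updated on target.
    """
    updated = []
    for name, version in source.items():
        if target.get(name, -1) < version:
            target[name] = version
            updated.append(name)
    return updated

# A device seeded over the internet, and one that only ever sees the local network.
seeded = {"math-unit-1": 2, "reading-unit-1": 1}
offline = {"math-unit-1": 1}
print(sync(seeded, offline))
# prints ['math-unit-1', 'reading-unit-1']
```

After one exchange the offline device holds everything the seeded device had, and it can in turn pass that content to the next device it meets.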