With the strong growth over the past 10 to 15 years of Amazon Web Services, Microsoft Azure, Google and Yahoo, data center technologies have advanced rapidly from their enterprise roots as IT professionals work to serve the needs of these web services providers. The sheer scale required by just one of these providers dwarfs even the largest Enterprise infrastructures. Notably, as well, with the emergence of Cloud computing, many Enterprises operate hybrid infrastructures, with business units maintaining infrastructures in both their own Private and Public Clouds such as AWS or Azure. Technologies developed to suit the needs of one increasingly are being deployed in the other, especially with the proliferation of Big Data.
Surely the most whimsically named of these technologies is Hadoop, an Open Source software framework used to manage large clusters of servers. It is a collection of technologies developed over ten years ago by engineers at Yahoo, including Sean Suchter and Doug Cutting, the latter of whom named the framework after his son's toy elephant. The framework runs on clusters of servers running the Apache operating system and its development is managed by the Apache Software Foundation. To boot, a cheerful yellow elephant has been adopted as its logo. Like all Open Source software, the technologies are free to be adopted and developed by third parties within the foundation framework, hence the term "open". Since no license fee is collected, no revenue is generated for what is essentially a non-profit entity.
Yet software is typically a very profitable business. Once developed, software license sales have a low cost of sales and gross profit margins can be as high as 85%. Operating leverage is very high, moreover, and operating profitability scales rapidly with size. In fact, Oracle reports no Cost of Sales and its non-GAAP operating margin in the February quarter was a robust 32%. That said, Hadoop is Open Source software and licensed by the Foundation through its distributors, who only charge for subscription support and professional services. Other notable examples of Open Source Software are Java and Linux.We will return this below.
Serving Ever Larger Clusters, Hadoop Embraces An Ever-Widening Cloud
At its most basic level, Hadoop comprises the Hadoop Distributed File System (HDFS) and the Map Reduce framework used to store and retrieve data, which are distributed across the cluster to avoid any single point of failure. The technology is optimized for the use cases of web services providers such as Yahoo, Google and Facebook, where millions of users download a unique assembly of content as determined by their queries or previous visits. Hadoop is the central to their offerings.
A wide variety of uses and applications has arisen subsequently within this framework, many of them developed by individual companies, most notably Hortonworks (NASDAQ:HDP). MapReduce has been used for batch processing of data from early on in the technology's development. More recently, Hortonworks contributed YARN (i.e., Yet Another Resource Negotiator), which is used within the platform for resource management and job scheduling. This allows multiple applications to run concurrently by distributing them across the cluster.
Hadoop is surely a great technology, but is it also a good business? As companies increasingly adopt Cloud computing, Hadoop is expected to generate strong growth. Zion Market Research recently sized the global Hadoop market (including software, hardware and services) at $7.69 billion in 2016 and forecast it to grow to $87.14 billion in 2022, with a CAGR of 50%. A similar scale for the Hadoop market is suggested by Allied Market Research, which suggests it will generate software, hardware and services revenues of $84.6 billion in 2021. We have purchased neither report but assume the size of the software market per se is discussed within each of them.
As technology, Hadoop is broadly used across the computing infrastructure of web service providers. Big Data is proliferating as well in commercial uses. As it is increasingly adopted in Enterprise computing, its attractiveness as a business will become increasingly clear. Hadoop is far less costly than present comparable Enterprise technologies such as Data Warehousing. Surely it offers strong growth. Yet for some specific reasons, Hadoop is relatively less profitable than other types of software, mainly because so much of the technology is Open Source and freely available. There is no fee in its licensing, as we noted above. No fee revenue, less profit.
Five companies offer individual so-called distributions of Hadoop: Cloudera (OTC:CLDR), Hortonworks, MapR Technologies, IBM and Dell Technologies' Pivotal Software. Cloudera just went public and Hortonworks is publicly-traded as well. Both derive revenue from support subscriptions and the provision of service to Hadoop users that obtain the underlying software from the Apache Software Foundation, which maintains the code for both Hadoop and the Apache operating system, as well as their extensions.
Related & Handpicked articles you may be interested in, check them out