Many top-rated companies use the Apache Hadoop framework to handle large amounts of data that grow continuously, minute by minute.
As an open-source framework, Hadoop has been widely adopted by organizations to store and process large amounts of structured and unstructured data using the MapReduce programming model.
When it comes to the biggest Hadoop cluster, Yahoo! is first on the list, with around 4,500 nodes in its cluster. It is followed by Facebook and LinkedIn in second and third place.
If you are looking to build a career in big data and Hadoop, start by learning big data Hadoop. Join our Hadoop Training program and start your career as a big data Hadoop professional who solves large data problems.
Top Companies Using Apache Hadoop Technology
Below are some of the world’s most popular and top-rated organizations that use Apache Hadoop for research and production.
[1] A9.com
A9 is a subsidiary of Amazon whose core area is building search engine and search advertising technologies.
Its Hadoop cluster has between 1 and 100 nodes.
[2] Adobe
Adobe is a computer software company and one of the world’s top companies using Apache Hadoop and Apache HBase for data storage and several social services.
It has 30 nodes in its cluster and is planning to deploy a new venture on an 80-node cluster.
[3] Alibaba
Alibaba is a Chinese e-commerce company that provides various sales services and other services such as electronic payment, shopping search engine, and cloud computing.
Its cluster has 15 nodes, each with 8 cores, 1.4 TB of storage, and 16 GB of RAM.
[4] AOL
AOL (America Online) is a web portal that provides online services. It uses Apache Hadoop for behavioral analysis and targeting, which includes ETL-style processing and running advanced algorithms.
It uses around 150 machines in its cluster, each with 800 GB of storage and 16 GB of RAM.
[5] ARA.COM.TR
It is a Turkish search engine that uses Apache Hadoop for analytics. Its cluster has between 10 and 100 nodes.
[6] Adknowledge
Adknowledge is a digital marketing company headquartered in the United States that specializes in digital video advertising, social media marketing, ad networks, and more.
It uses Apache Hadoop for behavioral targeting and clickstream analytics. Its cluster has between 50 and 200 nodes.
[7] Cornell University Web Lab
The Web Lab is part of Cornell University, an American private doctoral university based in Ithaca, New York.
It uses a 100-node cluster, with 72 GB of storage and 2 GB of RAM per node, to generate web graphs.
[8] CRS4
CRS4 is an Italian research and development institute that studies how large amounts of data from biomedical labs and high-throughput experiments can be used effectively.
It has a 400-node cluster in which each node has two 250 GB hard disks and 16 GB of RAM.
[9] eBay
It is a multinational e-commerce company that provides online shopping services.
eBay uses MapReduce, Apache Hive, Apache Pig, and Apache HBase for search optimization and related research.
It has a 532-node cluster with 4,256 cores and 5.3 PB of storage.
[10] eCircle
eCircle is a Germany-based company that provides email marketing and digital marketing solutions.
It has a total of 120 nodes, divided into two 60-node clusters. Both clusters have 1,000 cores, 5 TB of RAM, and 1 PB of storage.
[11] Facebook
It is the world’s most popular social networking service provider.
It has two main clusters: an 1,100-node cluster with 8,800 cores and 12 PB of storage, and a 300-node cluster with 2,400 cores and 3 PB of storage.
[12] Fox Audience Network
It is a marketing and advertising company located in California, United States. It was acquired by another online advertising company, the Rubicon Project.
It has three clusters:
- A 40-node cluster with 320 cores and 2 TB of hard-disk storage.
- A 70-node cluster with 540 cores and 3 TB of hard-disk storage.
- A 30-node cluster with 240 cores and 4 TB of hard-disk storage.
In total, it has 140 nodes across its three clusters.
[13] InMobi
InMobi is a private mobile advertising company founded in 2007 by Naveen Tewari, an Indian entrepreneur, along with his three partners Mohit Saxena, Amit Gupta, and Abhay Singhal. It is headquartered in Singapore.
It uses Apache Hadoop for data science, machine learning, ETL, and analytics in its six data centers.
It has a 700-node cluster with 16,800 cores and more than 5 PB of storage.
[14] LinkedIn
It is one of the world’s most popular organizations, providing social networking services focused on business and employment.
It uses a range of software, including Apache Hadoop, Apache Hive, Apache Avro, Apache Kafka, Azkaban (a batch workflow job scheduler), Apache Pig, RHEL (Red Hat Enterprise Linux), Apache DataFu, and Sun’s JDK.
It uses the following hardware:
- 800 Westmere-based HP SL 170x machines, each with 24 GB of RAM and six 2 TB hard disks.
- 1,900 Westmere-based SuperMicro X8DTT-H machines, each with 24 GB of RAM and six 2 TB hard disks.
- 1,400 Sandy Bridge-based SuperMicro machines, each with 32 GB of RAM and six 2 TB hard disks.
[15] Last.fm
It is one of the world’s largest online music websites, founded in the U.K. in 2002.
It has a 100-node cluster with 24 GB of RAM and 8 TB of storage. It uses Apache Hadoop for log analysis, A/B testing, chart calculation, dataset merging, audio feature analysis, and royalty reporting.
[16] NetSeer
NetSeer is a U.S.-based company that provides concept-based, intent-driven ad-targeting solutions.
It reportedly uses a 1,050-node cluster for processing, crawling, log analysis, and more.
[17] Powerset/Microsoft
It is an American company that focuses on developing a natural language search engine that finds answers to the questions users enter.
It has a 400-node cluster and uses Apache HBase.
[18] Spotify
Spotify is a Sweden-based company that provides music and video streaming services.
It uses Apache Hadoop for reporting, analysis, generating content and music recommendations, and data aggregation.
It has a 1,650-node cluster with 43,000 cores, 70 TB of RAM, and 65 PB of storage.
[19] Twitter
Twitter is one of the most popular social networking companies and is based in California, U.S. It provides online news and social networking services in the form of tweets (Twitter messages).
It is among the top-rated companies using Apache Hadoop, which it uses to store and process tweets and other data. It also uses Apache Pig for ad hoc and scheduled jobs.
It has not yet disclosed the details of the large cluster it uses.
[20] Yahoo
Yahoo! is an internet services company and is a subsidiary of Verizon Communications. It provides web services through its web portal, Yahoo! search engine, Yahoo! Mail, Yahoo! Directory etc.
It is one of the top companies using Apache Hadoop and Pig, running them on more than 40,000 computers for ad systems, web search, and scaling tests.
It has the world’s biggest Hadoop cluster, with 4,500 nodes, each with 16 GB of RAM and four 1 TB disks.
We have provided a list of the top twenty companies using Hadoop technology. Several other companies also use Hadoop to store and process large amounts of data. All of the above information shows how heavily Apache Hadoop is used in the world’s biggest organizations.