Friday, May 3, 2019

Word of the Day: Hadoop

Word of the Day WhatIs.com
Daily updates on the latest technology terms | May 3, 2019
Hadoop

Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems. It is at the center of a growing ecosystem of big data technologies that are primarily used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning applications. Hadoop can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing and analyzing data than relational databases and data warehouses provide.

Hadoop is primarily geared to analytics uses, and its ability to process and store different types of data makes it a particularly good fit for big data analytics applications. Big data environments typically involve not only large amounts of data, but also various kinds, from structured transaction data to semistructured and unstructured forms of information, such as internet clickstream records, web server and mobile application logs, social media posts, customer emails and sensor data from the internet of things (IoT).

Formally known as Apache Hadoop, the technology is developed as part of an open source project within the Apache Software Foundation (ASF). Commercial distributions of Hadoop are currently offered by four primary vendors of big data platforms: Amazon Web Services (AWS), Cloudera, Hortonworks and MapR Technologies. In addition, Google, Microsoft and other vendors offer cloud-based managed services that are built on top of Hadoop and related technologies.

Hadoop and big data

Hadoop runs on clusters of commodity servers and can scale up to support thousands of hardware nodes and massive amounts of data. It uses a namesake distributed file system that's designed to provide rapid data access across the nodes in a cluster, plus fault-tolerant capabilities so applications can continue to run if individual nodes fail. Consequently, Hadoop became a foundational data management platform for big data analytics uses after it emerged in the mid-2000s.
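The fault-tolerance idea can be made concrete with a toy model. The sketch below is a minimal pure-Python illustration of HDFS-style block replication, not actual Hadoop code: each block of a file is copied to several distinct nodes, so a single node failure never destroys the only copy of a block.

```python
import random

# Toy model of HDFS-style block replication (illustration only, not real HDFS).
# Each file is split into blocks; each block is copied to `replication`
# distinct nodes, so losing one node never loses the only copy of a block.

def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes at random."""
    placement = {}
    for block in blocks:
        placement[block] = random.sample(nodes, replication)
    return placement

def surviving_blocks(placement, failed_node):
    """Return the blocks still readable after one node fails."""
    return [block for block, replicas in placement.items()
            if any(node != failed_node for node in replicas)]

nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = [f"blk_{i}" for i in range(10)]
placement = place_blocks(blocks, nodes, replication=3)

# With 3 replicas per block on distinct nodes, one failure loses no data.
assert len(surviving_blocks(placement, "node3")) == len(blocks)
```

Real HDFS adds rack awareness and automatic re-replication when a node stays down, but the core guarantee is the same: with a replication factor of 3, every block survives any single-node failure.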

Hadoop was created by computer scientists Doug Cutting and Mike Cafarella, initially to support processing in the Nutch open source search engine and web crawler. After Google published technical papers detailing its Google File System (GFS) and MapReduce programming framework in 2003 and 2004, respectively, Cutting and Cafarella modified earlier technology plans and developed a Java-based MapReduce implementation and a file system modeled on Google's.
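The MapReduce model described in Google's papers is usually introduced with the canonical word-count example. The following is a minimal pure-Python simulation of the map, shuffle and reduce phases as an illustration of the programming model; it is not Hadoop's actual Java API.

```python
from collections import defaultdict

# Minimal simulation of the MapReduce word-count flow (illustration only):
# the map phase emits (word, 1) pairs, the shuffle groups pairs by key,
# and the reduce phase sums the counts for each word.

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["Hadoop stores big data", "Hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # 2
```

In a real cluster, the map and reduce functions run in parallel on many nodes and the shuffle moves intermediate pairs across the network, but the division of labor is exactly this.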

In early 2006, those elements were split off from Nutch and became a separate Apache subproject, which Cutting named Hadoop after his son's stuffed elephant. At the same time, Cutting was hired by internet services company Yahoo, which became the first production user of Hadoop later in 2006. (Cafarella, then a graduate student, went on to become a university professor.)

Use of the framework grew over the next few years, and three independent Hadoop vendors were founded: Cloudera in 2008, MapR a year later and Hortonworks as a Yahoo spinoff in 2011. In addition, AWS launched a Hadoop cloud service called Elastic MapReduce in 2009. That was all before Apache released Hadoop 1.0.0, which became available in December 2011 after a succession of 0.x releases.

Quote of the Day

"Hadoop architectures have taken things to a different level, opening up new types of data for analysis and making it more feasible -- technically and economically -- to collect, process and use all the information flowing into organizations." - Craig Stedman

Learning Center

Mining equipment maker uses BI on Hadoop to dig for data
Like much about big data, BI on Hadoop is still evolving. But rollouts beyond an elite inner cadre are now underway. In one case, a data pro is using Arcadia Data tools to decentralize a mining equipment company's data analytics.

Cloudera-Hortonworks merger narrows Hadoop users' options
The Cloudera-Hortonworks merger shrinks the number of independent commercial Hadoop vendors to two, but analysts say the combined company has a better chance against bigger cloud rivals -- and that more consolidation in the big data platforms market isn't necessarily a bad thing for users.

Hadoop cluster configuration best practices streamline workflows
Because each workload has different parameters, it's important to follow best practices for Hadoop cluster configuration with XML files, education on various functions and continuous testing.

Snowflake CEO Bob Muglia talks cloud data warehouse evolution
Bob Muglia, former CEO of cloud data warehouse vendor Snowflake, discussed Snowflake's place in the changing cloud data warehouse market and the growing importance of data ethics.

How data staging helped Walgreens transform its supply chain
Walgreens built a centralized data warehouse to provide supply chain partners with a better view of data -- but the analytics were still slow. That's when the company turned to Kyvos for help building a data staging tier.

Quiz Yourself

The use of a spreadsheet when a data warehouse was required created a situation _______ effective analysis was impossible.
a. where
b. in which

Answer

Stay in Touch

For feedback about any of our definitions or to suggest a new definition, please contact me at: mrouse@techtarget.com

Visit the Word of the Day Archives and catch up on what you've missed!

FOLLOW US

Twitter | RSS
About This E-Newsletter
This e-newsletter is published by the TechTarget network. To unsubscribe from Whatis.com, click here. Please note, this will not affect any other subscriptions you have signed up for.
TechTarget

TechTarget, Whatis, 275 Grove Street, Newton, MA 02466. Contact: webmaster@techtarget.com

Copyright 2018 TechTarget. All rights reserved.
