Goals & Objectives:
By the end of the course, you will:
- Get a clear understanding of Apache Hadoop, HDFS, Hadoop Cluster and Hadoop Administration
- Gain insight into Hadoop 2.0, NameNode High Availability, HDFS Federation, YARN, and MapReduce v2
- Plan and Deploy a Hadoop Cluster
- Load Data and Run Applications
- Configure a Hadoop Cluster and Tune its Performance
- Manage, Maintain, Monitor and Troubleshoot a Hadoop Cluster
- Secure a deployment and understand Backup and Recovery
- Understand Oozie, HCatalog/Hive, and HBase Administration
Hadoop Cluster Administration
Learning Objectives – In this module, you will understand what Big Data and Apache Hadoop are, how Hadoop solves Big Data problems, the Hadoop Cluster architecture, the MapReduce framework, Hadoop data loading techniques, and the role of a Hadoop Cluster Administrator.
Topics – Introduction to Big Data, Hadoop Architecture, MapReduce Framework, A typical Hadoop Cluster, Data Loading into HDFS, Hadoop Cluster Administrator: Roles and Responsibilities
Hadoop Architecture and Cluster setup
Learning Objectives – After this module, you will understand the main Hadoop server roles, such as NameNode and DataNode, and MapReduce data processing. You will also understand Hadoop 1.0 cluster setup and configuration, setting up Hadoop clients using Hadoop 1.0, and the important Hadoop configuration files and parameters.
Topics – Hadoop server roles and their usage, Rack Awareness, Anatomy of Write and Read, Replication Pipeline, Data Processing, Hadoop Installation and Initial Configuration, Deploying Hadoop in pseudo-distributed mode, Deploying a multi-node Hadoop cluster, Installing Hadoop Clients.
Hadoop Cluster: Planning and Managing
Learning Objectives – In this module, you will understand planning and managing a Hadoop Cluster, Hadoop Cluster monitoring and troubleshooting, analyzing logs, and auditing. You will also understand scheduling and executing MapReduce jobs, and the different schedulers available in Hadoop.
Topics – Planning the Hadoop Cluster, Cluster Size, Hardware and Software considerations, Managing and Scheduling Jobs, Types of schedulers in Hadoop, Configuring the schedulers and running MapReduce jobs, Cluster Monitoring and Troubleshooting.
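As a rough illustration of the Cluster Size topic, the sketch below estimates how many DataNodes a given data volume might need. The replication factor of 3 matches the HDFS default; the 25% headroom for intermediate and non-HDFS data and the 24 TB of disk per node are illustrative assumptions, not figures from this course, and the function name is hypothetical.

```python
import math


def estimate_datanodes(data_tb, replication=3, headroom=0.25, disk_per_node_tb=24.0):
    """Estimate the number of DataNodes needed to store `data_tb` of logical data.

    replication      -- HDFS replication factor (3 is the HDFS default)
    headroom         -- ASSUMED fraction of each node's disk kept free for
                        intermediate job output and non-HDFS use
    disk_per_node_tb -- ASSUMED raw disk capacity per DataNode, in TB
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    # Every block is stored `replication` times, so raw storage scales linearly.
    raw_tb = data_tb * replication
    # Only part of each node's disk is available for HDFS blocks.
    usable_per_node_tb = disk_per_node_tb * (1 - headroom)
    # Round up: a fractional node still means one more machine.
    return math.ceil(raw_tb / usable_per_node_tb)


if __name__ == "__main__":
    print(estimate_datanodes(100))
```

A real sizing exercise would also account for data growth, compression, and CPU and memory requirements, which this sketch ignores.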
Backup, Recovery and Maintenance
Learning Objectives – In this module, you will understand day-to-day Cluster Administration tasks such as adding and removing DataNodes, NameNode recovery, configuring Backup and Recovery in Hadoop, diagnosing node failures in the cluster, and upgrading Hadoop.
Topics – Configuring Rack Awareness, Setting up Hadoop Backup, Whitelisting and blacklisting DataNodes in a cluster, Setting up quotas, Upgrading a Hadoop cluster, Copying data across clusters using distcp, Diagnostics and Recovery, Cluster Maintenance.
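To make the Rack Awareness topic concrete, here is a minimal sketch of a topology script. Hadoop can be pointed at an executable script (via the `net.topology.script.file.name` property in Hadoop 2.x, `topology.script.file.name` in Hadoop 1.x) that receives host names or IP addresses as arguments and prints one rack path per argument. The host-to-rack table below is entirely hypothetical; a real cluster would load its own inventory.

```python
#!/usr/bin/env python3
import sys

# HYPOTHETICAL host-to-rack table, for illustration only.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.11": "/dc1/rack2",
}

# Rack path returned for hosts missing from the table.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the hosts as command-line arguments and reads
    # whitespace-separated rack paths from standard output.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Keeping the lookup in a separate `resolve` function makes the mapping easy to check before the script is wired into the cluster configuration.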
Hadoop 2.0 and High Availability
Learning Objectives – In this module, you will understand Secondary NameNode setup and checkpointing, Hadoop 2.0 new features, HDFS High Availability, the YARN framework, MRv2, and Hadoop 2.0 Cluster setup in pseudo-distributed and distributed mode.
Topics – Configuring Secondary NameNode, Hadoop 2.0, YARN framework, MRv2, Hadoop 2.0 Cluster setup, Deploying Hadoop 2.0 in pseudo-distributed mode, Deploying a multi-node Hadoop 2.0 cluster.
Advanced Topics: QJM, HDFS Federation and Security
Learning Objectives – In this module, you will understand the basics of Hadoop security, managing security with Kerberos, HDFS Federation setup, and Log Management. You will also understand HDFS High Availability using Quorum Journal Manager (QJM).
Topics – Configuring HDFS Federation, Basics of Hadoop Platform Security, Securing the Platform, Configuring Kerberos.
Oozie, HCatalog/Hive and HBase Administration
Learning Objectives – In this module, you will understand setting up the Apache Oozie Workflow Scheduler for Hadoop jobs, HCatalog/Hive Administration, deploying HBase with other Hadoop components, using HBase effectively to load data, and writing to and reading from HBase.
Topics – Oozie, HCatalog/Hive Administration, HBase Architecture, HBase setup, HBase and Hive Integration, HBase performance optimization.
Project: Hadoop Implementation
Learning Objectives – In this module, you will understand how multiple Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems. You will also learn how to plan, design, and deploy a Hadoop Cluster using a typical Real-World Use Case.
Topics – Understanding the Problem, Planning, Designing, and Creating a Hadoop Cluster for a Real-World Use Case, Setting up and Configuring commonly used Hadoop ecosystem components such as Pig and Hive, Configuring Ganglia on the Hadoop cluster, and Troubleshooting common Cluster Problems.