Apache Hadoop Developer’s Track – 1 Day Course

The Apache Hadoop Developer's Track 1 Day Course explores Hadoop architecture, the Hadoop ecosystem and implementation considerations in depth. This track builds Hadoop literacy through core concept instruction, hands-on lab work and detailed class discussions. The only way to meet industry demands is with strong fundamentals, and the Apache Hadoop Developer's track provides that crucial intellectual infrastructure. The core concepts taught in this track include:
  • The importance of Hadoop in today's world
  • The technological fuels of MapReduce, Hive & Pig
  • Overview of MapReduce, Hive & Pig as an almost-universal template for Big Data analytics
  • Appropriate, and inappropriate, application environments for these programming paradigms
The instructors review Hadoop's essential server components and detail their relation to MapReduce, Hive & Pig programming. Learners discover the integrated process with hands-on labs. Hadoop fine-tuning parameters, definitive guidelines for distributed cluster setup, MapReduce, Hive & Pig programming and real-time monitoring are covered in this one-day track. Distributed multinode Hadoop clusters are provided by ClustersTogo.com. The Hadoop clusters can be accessed and worked on from any browser, without the need to download, install or set up anything. The Apache Hadoop Developer's track is the ultimate introductory experience for future industry players. It's thorough, accessible and industry-driven. Students can make a fun and challenging weekend of it and emerge with the empowerment of foundational knowledge. The Apache Hadoop Developer's track is a core component of the "Technology Series" of classes.

Lab Work

The Apache Hadoop Developer's Track is distinct from any other such industry class because lab work comprises over 60 percent of the total course work. This track places learners in the "hands-on" industry hot seat within a safe, nurturing class environment. The students are guided by Big Data gurus who have years of industry experience and can answer detailed technical questions. Students have the dual benefit of working on their own Hadoop clusters in a team setting during the class, while also being able to work on the Hadoop clusters on their own time. Students can work individually or in groups, as they prefer. These clusters are provided by ClustersTogo.com. Lab training utilizes Click Stream and Twitter data in the MapReduce, Hive & Pig labs.

Prerequisites

Basic Unix skills, basic Java programming & SQL knowledge.

Course Duration

1 Full Day

Class Date & Time

May 25th - 8:30 am to 5:30 pm

Course Location

  • Onsite : 3200 Coronado Dr, Santa Clara, CA 95054
  • Online : Access information will be sent prior to the class

Audience

Developers, Business Analysts, Managers, Administrators

Agenda

A Day Prior to the Class – 6 to 7pm PST
  • Meet and Greet students – Online
  • Hadoop Setup – Support
  • Clusterstogo.com - Overview
Day 1 8:30 - 10:30 am
  • Hadoop Intro and Architecture
  • Hadoop Ecosystem
  • Reporting & ETL with Hive & Pig
  • Map Reduce Programming & Performance Monitoring
  • HBase - Random Access vs. Hadoop's Batch Processing
  • HDFS File structure – read/write data flow
10:30 - 10:45 am - Break
10:45 am – 12:45 pm
  • Hadoop Administration – High Level Overview
  • Demo Hadoop Administration – Features - Fsck, TestDFSIO, benchmarking, configuration files, etc.
12:45 – 1:30 pm - Lunch break
1:30 - 3:00 pm
  • Lab – Map Reduce Programming Intro
  • Lab – RDBMS to Hadoop using Sqoop
  • Lab – Using Flume
3:00 – 3:15 pm - Break
3:15 - 5:30 pm
  • Hive - Concepts and Reporting
  • Hive Lab - DDLs, DMLs, data types, join
  • Pig Lab - Concepts
  • Pig Lab – Pig Latin and ETL
A Day after the Class: 6 to 7pm PST
  • Follow Up Questions and Support

Recommended Readings

  1. Hadoop
  2. Hadoop: The Definitive Guide – by Tom White
  3. Programming Pig – by Alan Gates

Registration


Apache Hadoop: Architecture & Ecosystem – Jump Start

This course is designed to provide a basic understanding of the Hadoop architecture and its ecosystem. It is the first step for anyone aspiring to be a Big Data professional or simply wanting to understand the Hadoop ecosystem. This course helps students transition from the current world of RDBMS-based structured data management to the file-based, unstructured data world of Hadoop. The course gets into the specifics of key parts of the Hadoop architecture, Hive, Pig and Map Reduce programming with real-life use cases.

Prerequisites


None! (Just your keen interest will do!)

Audience


Business & Management Personnel,  Young Software Developers

Recommended Readings


- O'Reilly's ‘Hadoop’ book by Tom White

Class Date


May 11th 2013

Class Duration


4 hours

Class Location


3200 Coronado Dr, Santa Clara, CA 95054

Registration


Option 1: Pay using PayPal at training@thirdeyecss.com 24 hours before the class.
Option 2: Send us a check payable to "Third Eye CSS" at the mailing address: 3200 Coronado Dr, Santa Clara, CA 95054. Check must be received 24 hours before the class start time.
Option 3:

Contact Information:


For any additional information, please email jeetadas@thirdeyecss.com, or call or text (408) 306-8462.

Map Reduce Programming – Deep Dive

An exhaustive class that covers all MapReduce concepts in depth. Students will learn:
  • Parallel processing, functional programming as the foundation for Hadoop.
  • How map and reduce work.
  • How map and reduce collaborate through shuffle.
  • HDFS fundamentals. Input, output formats.
  • Simple examples of Map Reduce with Java & Map Reduce with Streaming.
  • Anatomy of a Hadoop job: Job Submission & Execution.
  • Compression, serialization.
  • Configuration and tuning.
  • Multiple map reduce jobs and Hadoop workflow.
  • Monitoring and error handling.
  • Deal with complex Map Reduce examples.
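
To make the map, shuffle and reduce steps listed above concrete, here is a minimal single-process sketch of a word count in plain Python. It is an illustration only, not Hadoop's API and not the course's lab material: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job distributes these phases across cluster nodes instead of running them in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group values by key so each reduce call sees one key's values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    # Keys are sorted before reducing, loosely mirroring sorted reducer input.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data flows"]))
```

The point of the sketch is the division of labor: map works on one record at a time, shuffle is the only step that needs to see all keys, and reduce works on one key at a time, which is what allows the map and reduce steps to run in parallel.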
 

Lab Work


Hands on lab exercises working with Big Data sets on a Hadoop cluster running on Amazon EC2.

Prerequisites


Basic Linux command line skills and server-side Java experience

Audience


Developers, Data Analytics professionals, Business Analysts, Managers

Recommended Readings


- O'Reilly's ‘Hadoop’ book by Tom White
- Hadoop tutorial on YDN

Class Date


May 26th 2013

Class Duration


4 hours

Class Location


3200 Coronado Drive, Santa Clara, CA 95054

Registration


Option 1: Pay using PayPal at training@thirdeyecss.com 24 hours before the class.
Option 2: Send us a check payable to "Third Eye CSS" at the mailing address: 3200 Coronado Dr, Santa Clara, CA 95054. Check must be received 24 hours before the class start time.
Option 3:

Contact Information:


Training Department training@thirdeyecss.com (408) 290-9949 – Ext 3

Apache Hive & Pig – BI Developer

BI Developers need to access, transform & load data sets. For performing these activities over Big Data sets in a Hadoop environment, Hive and Pig are extremely handy skills to have. In this one-day course, we will learn in depth about the architecture, design and development framework of Hive and Pig, including installation steps and performance tuning of Map Reduce programs, covering SessionLog data and other business subject areas. We will also learn how to implement various analytics and ETL processes using Hive & Pig for Big Data, and go over the ecosystem of Hadoop data management tools & frameworks. The class includes hands-on labs where students work on actual Hadoop clusters & write Hive & Pig code to be ready for a Hive and Pig developer's role.

Lab Work

  • Hive Labs
  • Pig Labs

Prerequisites

Basic Linux command line skills. HDFS file system handling knowledge will help.

Audience

Developers, IT Administrators, Business Analysts, Data Scientists.

Course Duration:

A whole-day class: 9:30 am to 4:30 pm on Saturdays, or request a date.

Class Date & Time

March 9th 2013 - 9:00 am to 6:00 pm

Location:

Third Eye CSS's Training Center | 3200 Coronado Dr, Santa Clara, CA 95054

Registration:

Contact Information:

For any additional information, please contact us, or reach Jeeta at 408 306 8462 or jeetadas@thirdeyecss.com.

Pentaho Big Data BI Developer – ETL & Report

This class has been created especially with non-programmers in mind. The typical audience for this class would be BI developers who have been using front-end tools like Business Objects, Cognos, Informatica, etc. This is a class with hands-on labs that gives students a very good head start on working with the Hadoop ecosystem without actually having to code in Map-Reduce.
  • Execute Hadoop map/reduce with zero code
  • Intuitive, visual drag and drop designer
  • Widgets to perform a wide range of operations with Hadoop, HBase and Hive
  • Design complex transformations that integrate Hadoop with other external systems like databases
  • Ideal for non-programmers who still have to work with Hadoop
 

Labs


This class is heavily focused on lab work, with many ETL and reporting exercises that involve orchestrating processes built in Pentaho.

Audience


BI developers who have been using front end tools like Business Objects, Cognos etc. and now would like to do so in the Big Data world.

Prerequisites


Knowledge of BI tools & report development

Course Duration


A whole-day class: 9 am to 5 pm, Saturdays. Our next class is on the 28th of April.
 

Location


2900 Gordon Ave, Suite 100-20 Santa Clara, CA 95051

Registration


To Register please contact us at info@bdcuniversity.com.  

Contact Information:


info@bdcuniversity.com 408-256-3282

Hive Administration & HiveQL Analytics – Deep Dive

For a file-based system like Hadoop, developers need a mechanism to query data using a SQL-like language. This is where Hive comes in. This class covers all major areas of Hive with extensive labs, and gives students the knowledge needed to function effectively as a Hive Developer in the Big Data marketplace. This class covers the following areas:
  • Installation
  • Architecture
  • Metastores
  • Data Modeling
  • UDF
  • HiveQL - Basics & Advanced Concepts
  • Integrating with HBase
The hands-on labs dive deep into HiveQL & enable students to use Hive in real-life projects using many of its advanced features. Students will get access to real Hadoop clusters provisioned by ClustersToGo.com. The Hadoop distribution used in the class is Cloudera's CDH4 (though we can also use other distros on request).

Prerequisites


Basic Linux command line skills, DB knowledge, MPP architecture knowledge and HDFS knowledge are a must.

Recommended Next Class


Pig Basics and Advanced; Part of Hadoop BI Developer's Track

Audience


Developers, IT Administrators, Managers, Analysts, Data Scientists

What to bring to your class


Your computer and any SSH client, such as putty.exe

Recommended Readings


- O'Reilly's ‘Hadoop’ book by Tom White

Class Date


May 26th 2013

Class Duration


4 hours

Class Location


3200 Coronado Dr, Santa Clara, CA 95054, 408 306 8462

Price & Registration


Option 1: Pay using PayPal at training@thirdeyecss.com 24 hours before the class.
Option 2: Send us a check payable to "Third Eye CSS" at the mailing address: 5201 Great America Parkway, Suite 320, Santa Clara, CA 95054. Check must be received 24 hours before the class start time.
Option 3: Payments must be received by Third Eye CSS 24 hours before the class start time. Cancellations must be reported at least 12 hours before the class start time; otherwise, no refund will be issued.

Contact Information:


Training Department training@thirdeyecss.com (408) 290-9949 – Ext 3

Apache Cassandra – Data Modeling Concepts

Apache Cassandra is a highly scalable, high-performance and fault-tolerant distributed data infrastructure. Cassandra addresses both real-time and analytical big data problems, from write-intensive workloads to sub-millisecond caching-layer reads to analytical workloads involving petabytes of data using MapReduce. Offering distribution of data across multiple data centers and incremental scalability with no single point of failure, Cassandra is the logical choice when you need reliability without compromising performance. This is an introductory class that focuses on imparting the core concepts, architecture and design of Apache Cassandra.

Course Contents


  • Overall Cassandra Architecture
  • Cassandra Strengths and weaknesses
  • Major and minor features
  • Various Replica placement strategies
  • CAP theorem
  • ACID transactions
  • Data Modeling: basic concepts, thinking in NoSQL mode, denormalization, some use cases
  • Cassandra tools to manipulate Cassandra schema

Real Time Demos


This training session involves interactive demos on the following topics:
  1. DataStax OpsCenter demo in EC2 (how to manage Cassandra)
  2. Cassandra CLI
  3. Cassandra CQL
  4. DataStax Enterprise in EC2
  5. How to run Map Reduce on Cassandra in EC2
  6. How to use the same Cassandra cluster for real-time transactions and analytics

Prerequisites


Developers with a basic understanding of databases and ACID transactions

Audience


Developers,  Database administrators, Data Analytics professionals, Data architects, Managers

Recommended Readings


  1. Cassandra wiki
  2. Cassandra documentation

Class Date


May 26th 2013

Class Duration


8 hours

Class Location


3200 Coronado Drive, Santa Clara, CA 95054

Registration


Option 1: Pay using PayPal at training@thirdeyecss.com 24 hours before the class.
Option 2: Send us a check payable to "Third Eye CSS" at the mailing address: 5201 Great America Parkway, Suite 320, Santa Clara, CA 95054. Check must be received 24 hours before the class start time.
Option 3:

Contact Information


Training Department training@thirdeyecss.com (408) 290-9949 – Ext 3

Apache HBase Developer – Architecture, Design & Implementation

Learn about HBase, the NoSQL database built on top of Hadoop, from the architecture, design, modeling and development perspectives. HBase is used when random, real-time read/write access to Big Data sets is needed. We will go over various real-life scenarios in this class, and over how HBase provides linear and modular scalability along with consistent reads and writes. We will walk you through the Java APIs. The Block Cache and Bloom filters, and their role in real-time queries, will also be covered. We will do in-depth lab work to learn the skills needed to work with this powerful Big Data analytics tool.
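
As a taste of the Bloom filter topic mentioned above, here is a minimal sketch of the data structure in plain Python. It is not HBase's implementation: the `BloomFilter` class, its parameters and the SHA-256-based hashing are assumptions chosen for readability. The idea it shows is the one that matters for real-time reads: a Bloom filter can answer "definitely not present" cheaply, so a store can skip files that cannot contain a requested row key.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hashing scheme are assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; real filters pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True means "possibly added".
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    row_keys = BloomFilter()
    row_keys.add("row-001")
    print(row_keys.might_contain("row-001"))
```

A key that was added always reports as possibly present, while a key that was never added may occasionally report a false positive. That one-sided error is why a filter like this can only be used to skip work, never to confirm that a row exists.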

Lab Work


We will work with an HBase cluster, look at configurations, and load & query data. We will start with creating a table and progress through hands-on exercises to an intermediate-to-advanced level.

Prerequisites


Developers with Java, Hadoop and MapReduce knowledge.

Audience


Developers, Data Analytics professionals, Business Analysts, Managers

Recommended Readings


- HBase Architecture

Course Duration:


A whole day Weekend Class: 10am to 5pm

Location:


Third Eye's offices at 2900 Gordon Ave, Suite 100-20 Santa Clara, CA 95051

Registration:


Please contact us at info@bdcuniversity.com

Contact Information:


info@bdcuniversity.com 408-256-3282