
Introduction to Big Data Engineering

This course introduces learners to the fundamentals of big data systems, tools, and techniques, progressing to intermediate-level skills in designing, building, and managing big data pipelines. Students will explore key frameworks like Hadoop and Apache Spark, work with data storage solutions, and delve into data ingestion, processing, and analytics. The course also covers cloud solutions, stream processing, machine learning, and data governance, with a focus on hands-on practice and real-world applications.

Course Duration: 36 hours
Level: Beginner to Intermediate

Course Objectives

  • Understand the core concepts of big data and its characteristics (Volume, Velocity, Variety, Veracity, Value)

  • Learn how to work with data storage systems, including HDFS and Amazon S3

  • Gain proficiency in big data frameworks, including Hadoop and Apache Spark (a short illustrative sketch follows this list)

  • Develop skills in building ETL pipelines with tools such as Apache Kafka and Apache NiFi

  • Understand cloud-based big data solutions and tools

  • Learn the basics of machine learning on big data

  • Establish and manage data governance and security practices

  • Complete a hands-on capstone project to design and implement a scalable big data pipeline
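
As a taste of the hands-on work, the sketch below shows the kind of minimal PySpark batch job built early in the course. It is illustrative only: it assumes a local Spark installation, and the input file "events.csv" and its columns (user_id, event_type) are hypothetical placeholders.

    # Minimal PySpark batch job: ingest a CSV, aggregate, persist results.
    # Assumes a local Spark install; "events.csv" and its columns are
    # hypothetical placeholders for this sketch.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("course-demo")   # job name shown in the Spark UI
        .master("local[*]")       # run locally on all available cores
        .getOrCreate()
    )

    # Ingest: load raw event data with a header row and inferred types.
    events = spark.read.csv("events.csv", header=True, inferSchema=True)

    # Process: count events per user and event type, busiest first.
    counts = (
        events.groupBy("user_id", "event_type")
              .agg(F.count("*").alias("n_events"))
              .orderBy(F.desc("n_events"))
    )

    # Analyze and persist: inspect the top rows, then write Parquet.
    counts.show(10)
    counts.write.mode("overwrite").parquet("event_counts.parquet")

    spark.stop()

The same read-transform-write pattern scales from a laptop to a cluster largely by changing the master setting and the storage paths.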

Prerequisites

  • Basic programming knowledge (Python or Java preferred)

  • Familiarity with databases and basic SQL

  • Understanding of fundamental data structures and algorithms
