Quick Facts

Medium of Instruction: English
Mode of Learning: Self Study, Virtual Classroom
Mode of Delivery: Video and Text Based

Courses and Certificate Fees

Certificate Availability: Yes
Certificate Providing Authority: MIT Cambridge

The Syllabus

Python
  • Introduction to Python and IDEs 
  • Python Basics 
  • Object Oriented Programming 
  • Hands-on Sessions and Assignments for Practice
Linux
  • Introduction to Linux 
  • Linux Basics 
  • Hands-on Sessions and Assignments for Practice
SQL
  • SQL Basics 
  • Advanced SQL 
  • Deep Dive into User Defined Functions 
  • SQL Optimization and Performance

  • What is Data Engineering, Use Cases, and Applications? 
  • Data Engineer or Data Scientist? 
  • Data Engineering Problems and Tools of a Data Engineer 
  • Working with Different Databases 
  • Processing Tasks, Scheduling Tools, and Different Cloud Providers 
  • Why Cloud Computing, Use Cases, and Applications? 
  • Different Cloud Services

  • Introduction to HDFS and Apache Spark 
  • Spark Basics 
  • Working with RDDs in Spark 
  • Aggregating Data with Pair RDDs 
  • Writing and Deploying Spark Applications 
  • Parallel Processing 
  • Spark RDD Persistence
  • Integrating Apache Flume and Apache Kafka 
  • Spark Streaming 
  • Improving Spark Performance 
  • Spark SQL and Data Frames 
  • Scheduling or Partitioning

  • Understand the difference between SQL and NoSQL. Create relational data models and NoSQL-based data models based on business reporting requirements. Work with ETL tools to push the data into the models. 
  • Work with MS SQL and Cassandra to create databases, and use ETL tools to extract, transform, and load data into the models.
  • Project 1: Data Modeling using Relational Databases 
  • Project 2: Data Modeling using Apache Cassandra

  • Build a highly scalable data warehouse on AWS. Work with Redshift, pull data from RDS and other AWS services using an ETL pipeline, and load it into the data warehouse.

  • What is ETL, Use Cases, and Applications? 
  • Why Do We Need ETL Tools? 
  • Working with Different Data Sources: Relational Databases, NoSQL, HDFS, Stream Data, CSV Files, TXT Files, JSON or XML Files, and Fixed File Formats 
  • Transformation of Data 
  • Loading Data into a Data Model or File System 
  • Using SQL for Data Transformation 
  • Optimizing ETL Processes 
  • Understanding ETL Architecture for Tracking the Data Flow and Data Pipelines 
  • Understanding Data Quality Checks

  • AWS Data Storage Services: S3, S3 Glacier, Amazon DynamoDB 
  • AWS Processing Services: AWS EMR, EMR Cluster, Hadoop, Hue with EMR, Spark with EMR, AWS Lambda, HCatalog, Glue, and Glue Lab 
  • AWS Data Analysis Services: Amazon Redshift, Tuning Query Performance, Amazon ML, Amazon Athena, Amazon Elasticsearch, and ES Domain

  • Learn to schedule, automate, and monitor ETL pipelines with Apache Airflow, Luigi, and Cron. 
  • Learn to implement data quality checks and processes for running ETL in a production environment. 
  • Design a robust process and architecture that prevents ETL failures caused by data quality issues, and learn how to handle ETL failures in a production environment.

  • Use Docker to convert your applications and data pipelines into container-based applications 
  • Orchestrate containers to deliver scalable and reliable performance using Kubernetes

  • Apply the concepts learned in the program to build a highly scalable data warehouse architecture that loads data from different sources, and use a NoSQL database to serve the query results requested by the analytics team. 
  • Deploy your data processing solution on an AWS cluster.

  • Non-Relational Data Stores and Azure Data Lake Storage 
  • Data Lake and Azure Cosmos DB 
  • Relational Data Stores 
  • Why Azure SQL? 
  • Azure Batch 
  • Azure Data Factory 
  • Azure Databricks 
  • Azure Stream Analytics 
  • Monitoring & Security

  • Learn the basic statistics required for data science 
  • Master Data Science Algorithms 
  • Learn linear regression and work on recommender problems using collaborative filtering 
  • Non-linear classification and kernels 
  • Deep Learning Introduction and Neural networks 
  • RNN & CNN 
  • Unsupervised learning: clustering 
  • Generative models, mixtures 
  • Learning to control: Reinforcement learning 
  • Natural Language Processing
