Master distributed data processing with Scala and Apache Spark. Learn functional programming techniques for big data analysis and performance optimization.
This comprehensive course teaches distributed big data processing using Scala and Apache Spark. Students learn to manipulate large-scale data using functional programming concepts, focusing on Spark's programming model and distributed collections framework. The curriculum covers essential topics including RDDs, transformations, actions, and performance optimization. Through hands-on programming assignments, participants master data loading, manipulation, and analysis while understanding crucial concepts like shuffling, partitioning, and data locality.
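To give a concrete feel for the programming model described above, here is a minimal sketch of an RDD workflow in Scala. It is illustrative only: the application name, local master setting, and input path are assumptions, not course materials.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Local SparkContext for experimentation; a real cluster deployment differs.
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Load data from persistent storage into an RDD (the path is hypothetical).
    val lines = sc.textFile("data/sample.txt")

    // Transformations are lazy: nothing executes until an action is called.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // An action (take) triggers the distributed computation.
    counts.take(10).foreach(println)

    sc.stop()
  }
}
```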
2,271 already enrolled
Instructor: Heather Miller
English
21 languages available
What you'll learn
Read and load data into Apache Spark from persistent storage
Manipulate large datasets using Spark and Scala
Express data analysis algorithms in functional style
Optimize performance by avoiding shuffles and recomputation (a short sketch follows this list)
Work with Spark SQL and DataFrames
Implement distributed data processing solutions
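In practice, the point about shuffles and recomputation comes down to API choice and caching. The sketch below assumes a pair RDD of (word, count) tuples, such as the one produced earlier; the function names are hypothetical.

```scala
import org.apache.spark.rdd.RDD

// reduceByKey combines values locally within each partition before any data
// moves across the network, so far less is shuffled than with
// groupByKey(...).mapValues(_.sum), which ships every individual pair.
def totalsPerWord(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
  pairs.reduceByKey(_ + _)

// cache() keeps a reused RDD in memory, so repeated actions do not recompute
// its whole lineage from the original input.
def reuse(pairs: RDD[(String, Int)]): Unit = {
  val totals = totalsPerWord(pairs).cache()
  println(totals.count())          // first action: computes and caches
  totals.take(5).foreach(println)  // second action: served from the cache
}
```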
Skills you'll gain
This course includes:
350 minutes of pre-recorded video
7 programming assignments
Access on Mobile, Tablet, Desktop
Full-time access
Shareable certificate
Get a Completion Certificate
Share your certificate with prospective employers and your professional network on LinkedIn.
Created by
Provided by
Top companies offer this course to their employees
Top companies provide this course to enhance their employees' skills, ensuring they excel in handling complex projects and drive organizational success.
There are 4 modules in this course
This comprehensive course focuses on big data analysis using Scala and Apache Spark, emphasizing distributed data processing. Students learn to manipulate large datasets using functional programming concepts and Spark's distributed collections framework. The curriculum covers essential topics like RDDs, transformations, actions, and optimization techniques. Special attention is given to performance considerations in distributed systems, including data locality and shuffle operations. The course also explores structured data processing using Spark SQL, DataFrames, and Datasets.
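As one illustration of the data-locality and shuffle considerations mentioned above, a pair RDD can be pre-partitioned so that all values for a key live on the same node. The partitioner choice and partition count below are assumptions for the sketch, not recommendations from the course.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

// Hash-partition the pair RDD and keep that layout in memory. Subsequent
// key-based operations (reduceByKey, joins using the same partitioner) can
// then run without a full shuffle.
def prePartition(events: RDD[(String, Int)]): RDD[(String, Int)] =
  events.partitionBy(new HashPartitioner(8)).persist()
```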
Getting Started + Spark Basics
Module 1 · 11 Hours to complete
Reduction Operations & Distributed Key-Value Pairs
Module 2 · 6 Hours to complete
Partitioning and Shuffling
Module 3 · 1 Hour to complete
Structured data: SQL, DataFrames, and Datasets (see the sketch below)
Module 4 · 8 Hours to complete
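For a taste of Module 4's structured APIs, here is a small sketch moving from a DataFrame to a typed Dataset. The CSV path, column names, and Listing case class are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type for the typed Dataset API.
case class Listing(city: String, price: Double)

object StructuredSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StructuredSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // DataFrame: untyped rows, optimized by the Catalyst query planner.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/listings.csv")
    df.groupBy("city").avg("price").show()

    // Dataset: the same optimizations plus compile-time types.
    val ds = df.as[Listing]
    ds.filter(_.price > 100.0).show()

    spark.stop()
  }
}
```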
Fee Structure
Payment options
Financial Aid
Instructor
Assistant Professor
Heather Miller is an assistant professor in the School of Computer Science at Carnegie Mellon University, where she focuses on data-centric distributed systems and programming languages. Before her current role, she was a research scientist at the École Polytechnique Fédérale de Lausanne (EPFL) and co-founded the Scala Center, which promotes the use of the Scala programming language. Miller has a PhD from EPFL, where she contributed significantly to Scala's development, and she is known for her work on MOOCs that have engaged over a million students. Her research aims to bridge theoretical advancements in programming languages with practical industrial applications.
Testimonials
Testimonials and success stories are a testament to the quality of this program and its impact on your career and learning journey. Be the first to help others make an informed decision by sharing your review of the course.
Frequently asked questions
Below are some of the most commonly asked questions about this course. We aim to provide clear and concise answers to help you better understand the course content, structure, and any other relevant information. If you have any additional questions or if your question is not listed here, please don't hesitate to reach out to our support team for further assistance.