Master distributed data processing techniques and implementations using Scala and Apache Spark for comprehensive large-scale data analysis and transformation.
This course cannot be purchased separately. To access graded assignments and earn a certificate, you need to enroll in the full Functional Programming in Scala Specialization. You can audit this course for free, which gives you access to the course materials and lectures, so you can explore the content at your own pace without any financial commitment.
4.6
(2,586 ratings)
1,00,491 already enrolled
English
پښتو, বাংলা, اردو, 4 more
What you'll learn
Load and manipulate large-scale data with Apache Spark
Optimize distributed computations for performance
Implement data analysis algorithms using functional programming
Work with structured data using Spark SQL and DataFrames
Handle data partitioning and avoid unnecessary shuffling
This course includes:
5.83 hours of pre-recorded video
Access on mobile, desktop, and tablet
Full-time access
Shareable certificate
Closed captions
Get a Completion Certificate
Share your certificate with prospective employers and your professional network on LinkedIn.
There are 4 modules in this course
This course focuses on big data analysis using Scala and Apache Spark. Students learn to manipulate distributed datasets using functional programming concepts, optimize performance by controlling data partitioning and avoiding unnecessary shuffling, and work with structured data using Spark SQL. The curriculum covers RDDs, transformations and actions, and cluster topology, as well as more advanced features such as DataFrames and Datasets, with hands-on experience in real-world data analysis.
Getting Started + Spark Basics
Module 1 · 11 Hours to complete
Reduction Operations & Distributed Key-Value Pairs
Module 2 · 6 Hours to complete
Partitioning and Shuffling
Module 3 · 56 Minutes to complete
Structured data: SQL, DataFrames, and Datasets
Module 4 · 8 Hours to complete
Instructor
Assistant Professor
Heather Miller is an assistant professor in the School of Computer Science at Carnegie Mellon University, where she focuses on data-centric distributed systems and programming languages. Before her current role, she was a research scientist at the École Polytechnique Fédérale de Lausanne (EPFL) and co-founded the Scala Center, which promotes the use of the Scala programming language. Miller has a PhD from EPFL, where she contributed significantly to Scala's development, and she is known for her work on MOOCs that have engaged over a million students. Her research aims to bridge theoretical advancements in programming languages with practical industrial applications.