Big Data - Capstone Project

Provider rating: 7.2. Coursera (CC) has an average rating of 7.2 (out of 6 reviews).


Description


When you enroll for courses through Coursera, you can choose between a paid plan and a free plan:

  • Free plan: No certification; audit only. You will have access to all course materials except graded items.
  • Paid plan: Commit to earning a Certificate—it's a trusted, shareable way to showcase your new skills.

About this course: Welcome to the Capstone Project for Big Data! In this culminating project, you will build a big data ecosystem using tools and methods from the earlier courses in this specialization. You will analyze a data set simulating big data generated from a large number of users who are playing our imaginary game "Catch the Pink Flamingo". During the five-week Capstone Project, you will walk through the typical big data science steps for acquiring, exploring, preparing, analyzing, and reporting. In the first two weeks, we will introduce you to the data set and guide you through some exploratory analysis using tools such as Splunk and Open Office. Then we will move into more challenging big data problems requiring the more advanced tools you have learned, including KNIME, Spark's MLlib, and Gephi. Finally, during the fifth and final week, we will show you how to bring it all together to create engaging and compelling reports and slide presentations. As a result of our collaboration with Splunk, a software company focused on analyzing machine-generated big data, learners with the top projects will be eligible to present to Splunk and meet Splunk recruiters and engineering leadership.

Created by: University of California, San Diego
  • Taught by: Ilkay Altintas, Chief Data Science Officer, San Diego Supercomputer Center
  • Taught by: Amarnath Gupta, Director, Advanced Query Processing Lab, San Diego Supercomputer Center (SDSC)

Basic Info
  • Course 6 of 6 in the Big Data Specialization
  • Language: English
  • How To Pass: Pass all graded assignments to complete the course.
  • User Ratings: Average User Rating 4.3 stars

Coursework

Each course is like an interactive textbook, featuring pre-recorded videos, quizzes and projects.

Help from your peers

Connect with thousands of other learners and debate ideas, discuss course material, and get help mastering concepts.

Certificates

Earn official recognition for your work, and share your success with friends, colleagues, and employers.

University of California, San Diego

UC San Diego is an academic powerhouse and economic engine, recognized as one of the top 10 public universities by U.S. News and World Report. Innovation is central to who we are and what we do. Here, students learn that knowledge isn't just acquired in the classroom; life is their laboratory.

Syllabus


WEEK 1


Simulating Big Data for an Online Game
This week we provide an overview of the Eglence, Inc. Pink Flamingo game, including the data the company has about the game and its users, and the questions we might want that data to answer.


4 videos, 4 readings


  1. Video: Welcome to the Big Data Capstone Project
  2. Video: Welcome from Splunk: Rob Reed World Education Evangelist
  3. Reading: Planning, Preparation, and Review
  4. Reading: A Game by Eglence Inc. : Catch The Pink Flamingo
  5. Video: A Summary of Catch the Pink Flamingo
  6. Video: A Conceptual Schema for Catch the Pink Flamingo
  7. Reading: Overview of the Catch the Pink Flamingo Data Model
  8. Reading: Overview of Final Project Design


Acquiring, Exploring, and Preparing the Data
Next, we begin working with the simulated game data by exploring and preparing the data for ingestion into big data analytics applications.


6 readings


  1. Reading: Downloading the Game Data and Associated Scripts
  2. Reading: Understanding the CSV Files Generated by the Scripts
  3. Reading: Optional Review of Splunk
  4. Reading: “Catch the Pink Flamingo” Data Exploration with Splunk
  5. Reading: Aggregate Calculations Using Splunk
  6. Reading: Filtering the Data With Splunk

Graded: Data Exploration With Splunk
Graded: Data Exploration Technical Appendix

WEEK 2


Data Classification with KNIME
This week we classify the game data using a decision tree in KNIME to identify big spenders.


4 readings


  1. Reading: Review: Classification Using Decision Tree in KNIME
  2. Reading: Review: Interpreting a Decision Tree in KNIME
  3. Reading: Workflow Overview for Building a Decision Tree in KNIME
  4. Reading: Description of combined_data.csv

Graded: Classifying in KNIME to identify big spenders in Catch the Pink Flamingo

WEEK 3


Clustering with Spark
This week we cluster the client base using Spark MLlib and recommend actions based on the results.


2 readings


  1. Reading: Informing business strategies based on client base
  2. Discussion Prompt: Is there only “one way” to cluster a client base?
  3. Discussion Prompt: How many clusters?
  4. Discussion Prompt: What kind of criteria might provide actionable information for Eglence Inc.?
  5. Reading: Practice with PySpark MLlib Clustering

Graded: Recommending Actions from Clustering Analysis

WEEK 4


Graph Analytics of Simulated Chat Data With Neo4j
This week we apply what we learned from the 'Graph Analytics With Big Data' course to simulated chat data from Catch the Pink Flamingo using Neo4j. We analyze player chat behavior to find ways of improving the game.


2 readings


  1. Reading: Understanding the Simulated Chat Data Generated by the Scripts
  2. Reading: Graph Analytics of Catch the Pink Flamingo Chat Data Using Neo4j

Graded: Graph Analytics With Chat Data Using Neo4j

WEEK 5


Reporting and Presenting Your Work



1 video, 1 reading


  1. Video: Week 5: Bringing It All Together
  2. Reading: Final project preparation


WEEK 6


Final Submission



1 video, 1 reading


  1. Video: Congratulations! Some Final Words...
  2. Peer Review: Optional 3-minute video: Splunk opportunity
  3. Reading: Part 2: Help us connect your video to your LinkedIn profile

Graded: Final Project