Data Science at the Command Line

Trainer: Jeroen Janssens
Location: At location, Online
Provider: Data Science Workshops B.V. (average rating of 10, out of 7 reviews)

Starting dates and places

Online: Microsoft Teams
19 Nov 2020 until 20 Nov 2020
Starting date guaranteed
  • 19 November 2020, 10:00-17:00, Microsoft Teams, Day 1
  • 20 November 2020, 10:00-17:00, Microsoft Teams, Day 2

Utrecht
19 Nov 2020 until 20 Nov 2020
Starting date guaranteed
  • 19 November 2020, 10:00-17:00, Utrecht, Day 1
  • 20 November 2020, 10:00-17:00, Utrecht, Day 2

Description

Introduction

The Unix command line, although invented decades ago, is an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools (like parallel, jq, and csvkit), you can quickly scrub and explore your data and hack together prototypes.

This hands-on workshop is based on the O'Reilly book Data Science at the Command Line, written by our CEO Jeroen Janssens. You'll learn how to build fast data pipelines, how to leverage R and Python at the command line, and how to quickly visualise data. No prior knowledge of the Unix command line is required.

By the end of this workshop you will have a solid understanding of how to integrate the command line into your data science workflow. Even if you're already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make you a more effective and efficient data scientist.
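
To give a flavour of what combining small tools into a pipeline looks like, here is a minimal sketch that uses only standard Unix utilities. The sample data and the function names (sample_data, count_languages) are invented for illustration and are not taken from the workshop material.

```shell
#!/usr/bin/env bash

# Hypothetical sample data: a tiny CSV with a header row (invented for this sketch).
sample_data() {
  printf '%s\n' \
    'name,language' \
    'alice,R' \
    'bob,Python' \
    'carol,R' \
    'dave,Python' \
    'erin,R'
}

# Count how often each value in the second column occurs, most frequent first.
count_languages() {
  tail -n +2 |               # drop the header row
    cut -d, -f2 |            # keep only the second column
    sort |                   # group identical values together
    uniq -c |                # count each group
    sort -rn |               # order by count, descending
    awk '{print $2 "," $1}'  # normalise the output to value,count
}

sample_data | count_languages
```

With the sample data above, the pipeline prints R,3 followed by Python,2. Each stage does one small job and passes plain text to the next, which is the style of working described in the introduction.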

What you'll learn

  • Automate tedious tasks
  • Parallelise and distribute your tasks to multiple cores and machines
  • Convert your existing code to reusable command-line tools
  • Easily inspect, transform, and visualise data
  • Apply a variety of supervised and unsupervised machine learning algorithms

Schedule

Day 1:

  • Introduction
    • What is the command line?
    • Why learn the command line for doing data science?
    • A real-world data science use case
    • Getting up and running with the Docker image
  • Essential concepts of the unix command line
    • Running command-line tools
    • Combining command-line tools
    • Redirecting input and output
    • Working with files
    • Getting help
  • Obtaining data from logs, spreadsheets, and databases
  • Downloading data from the Internet and accessing APIs using curl
  • Transforming data with filters such as cut, paste, grep, and sed
  • Processing other data formats efficiently
    • JSON with jq
    • CSV with csvkit
    • HTML with pup
    • XML with xmlstarlet

Day 2:

  • Running R from the command line
  • Visualising data from the command line
    • Scatter plot
    • Histogram
    • Bar chart
    • Geographic visualisation
  • Parallelising and distributing data-intensive pipelines
  • Creating reusable command-line tools
    • Automating tasks in a Bash script
    • Converting your existing code to a command-line tool
    • Processing arguments
    • Working with streaming data
  • Applying machine learning
    • Outlier detection
    • Dimensionality reduction
    • Classification
    • Regression
  • Conclusion
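
As a rough illustration of the "Creating reusable command-line tools" topic, the sketch below wraps a pipeline in a Bash function that reads from standard input and accepts one optional argument. The name top_words and its behaviour are assumptions made for this example, not part of the workshop material.

```shell
#!/usr/bin/env bash

# top_words: hypothetical helper that prints the N most frequent words read from stdin.
# Usage: top_words [N]   (N defaults to 3)
top_words() {
  tr '[:upper:]' '[:lower:]' |   # fold everything to lowercase
    tr -cs '[:alpha:]' '\n' |    # put each word on its own line
    grep -v '^$' |               # discard empty lines
    sort |                       # group identical words together
    uniq -c |                    # count each word
    sort -rn |                   # most frequent first
    head -n "${1:-3}" |          # keep the top N (first argument, default 3)
    awk '{print $2}'             # print only the word, not its count
}

# Example use on a single line of text.
printf 'the cat and the dog and the bird\n' | top_words 2
```

Because the function reads standard input and writes standard output, it can be dropped into a larger pipeline like any other tool. The example call prints "the" and then "and".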

Recommended preparation

Participants are kindly requested to have the following items installed prior to the start of the workshop:

  • Docker Desktop for Windows, Mac, or Ubuntu
  • The Docker image, by running: docker pull datascienceworkshops/data-science-at-the-command-line

Clients

We’ve previously delivered this workshop at:

  • Accenture
  • Amazon
  • Container Solutions
  • Prezi
  • SURFnet
  • Snow
  • Social Point
  • Teradata
  • The New York Times

Testimonials

"Great workshop! Very well done and very useful information delivered in an excellent and interactive manner. Jeroen anticipated very well on the different knowledge levels within the group. I would highly recommend the Data Science at the Command Line workshop to anyone that is interested in either kickstarting their command-line experiences or improving their data science with Unix power tools."

--Sanne Bouwman, Data Scientist, Teradata

"As a seasoned UNIX command line adept, I didn’t expect to learn much from a Data Science at the Command Line workshop. I was wrong! Over the years, many new tools have become available that I didn’t know about, and that can be combined with traditional tools in new ways.

Since attending the workshop, I have been able to simplify and improve the efficiency of many of the scripts I use on a daily basis. Recommended for anyone working from the command line, newbies and ninjas alike!"

--Joost van Dijk, Manager Middleware Services, SURFnet

"Besides demonstrating a good knowledge and experience in command-line tools for data science, the instructor had very good training skills, clear communication, and managed to adapt the level of the training to the level of the audience, which is not always easy!"

--Marc Canaleta, CTO, Social Point

Average rating for Data Science at the Command Line: 10 (based on 3 reviews)

Marc Canaleta, CTO, Social Point (rating: 10)

"Besides demonstrating a good knowledge and experience in command-line tools for data science, Jeroen had very good training skills, clear communication, and managed to adapt the level of the training to the level of the audience, which is not always easy!" - 15-09-2020 12:00

Sanne Bouwman, Data Scientist (rating: 10)

"Great workshop! Very well done and very useful information delivered in an excellent and interactive manner. Jeroen anticipated very well on the different knowledge levels within the group. I would highly recommend the Data Science at the Command Line workshop to anyone that is interested in either kickstarting their command-line experiences or improving their data science with Unix power tools." - 11-09-2020 17:51

Joost van Dijk, Manager Middleware Services, SURFnet (rating: 10)

"As a seasoned UNIX command line adept, I didn’t expect to learn much from a Data Science at the Command Line workshop. I was wrong! Over the years, many new tools have become available that I didn’t know about, and that can be combined with traditional tools in new ways.

Since attending the workshop, I have been able to simplify and improve the efficiency of many of the scripts I use on a daily basis. Recommended for anyone working from the command line, newbies and ninjas alike!" - 20-07-2020 09:52

Jeroen Janssens - Principal Instructor
