Updated: October 2019
Note: Python online learning recommendations need updating. The Python books are still highly recommended.
Below are data science resources to point you in the right direction for learning more about data analytics, data science, and programming. I’ll update the list periodically as I learn of new resources and tools. R, Python, and SQL are programming languages used for data collection, data manipulation, data analytics, statistical analysis, web scraping, machine learning, and artificial intelligence. You will likely use one or all of these throughout your data science career. If you have suggestions or questions, please contact me via email or LinkedIn. Thank you!
Online Educational Resources
- Business Science University: Founded by data science leader and instructor Matt Dancho, this is definitely my favorite and top recommended online data science platform. It’s not free, but the quality of Matt’s programs is top-notch and well worth every dollar. I’ve seen Matt lead workshops at numerous conferences and am currently enrolled in his R Shiny course.
- Udemy: Udemy has become a powerhouse for data science educational material, and I’ve taken several Udemy classes to reinforce the programming and analytics training that I received through my master’s program. I’ve taken some R, Spark, and Python classes, and really enjoyed Spark and Python for Big Data with PySpark
- Super Data Science Courses: I’m a huge fan of Kirill Eremenko’s SuperDataScience, and now they’re offering online introductory courses on their site and via Udemy
- Kaggle: World’s largest community of data scientists and machine learners, with discussion boards, downloadable datasets, and competitions to reinforce your programming skills with R, Python, etc.
- Khan Academy: Statistics is critical for any data scientist, and the Khan Academy statistics and probability lessons will provide you with the basics
- freeCodeCamp’s Best Courses: freeCodeCamp ranks online data science courses on their Medium blog
- Udacity’s Data Science Nanodegree: I know two people who have taken this course and raved about it, but it requires some understanding of statistics and experience working with data (Udacity has other data science courses that are less difficult)
- Fast.ai: For the more advanced data science students, Fast.ai provides free, high-quality, highly-reviewed machine learning and deep learning courses
Outside of the classroom, podcasts help me stay current with data science news and local events. If you have any podcast suggestions that touch on analytics careers, data science, machine learning, or artificial intelligence, I’d love to hear from you!
- Scatter Podcast: Hosted by Javier Orraca (hey, that’s me!), Scatter Podcast conducts interviews with analytics, data science, and insights professionals at startups and publicly traded corporations to better understand what a “day in the life” looks like, and to garner tips for students and job seekers
- Super Data Science Podcast: Hosted by Kirill Eremenko, the goal of the Super Data Science podcast is to bring you the most inspiring data scientists and analysts from around the world
- Data Skeptic: Hosted by Kyle Polich, Data Skeptic aims to reveal the science of fake news
- AI Podcast by Lex Fridman: This is one of the more forward-thinking podcasts available. Hosted by Lex Fridman, AI thought-leader and research scientist at MIT, you get to hear about the latest trends in data science from some of the biggest names in the industry. Almost every guest that Lex hosts is a world-renowned machine learning developer and/or artificial intelligence leader.
- Linear Digressions: This podcast is great for students and practitioners, each episode covering the complexities of a specific data science topic, e.g., A/B testing and compliance bias, data privacy laws, text analysis (and tools like Word2Vec), computational limitations, AutoML tools, etc.
Eremenko, Kirill. Confident Data Skills: Master the Fundamentals of Working with Data and Supercharge Your Career. Kogan Page, 2018.
- From the creator of Super Data Science, this was an easy “for fun” read that is light on math and provides an overview of data analytics and careers in this field. I swear I’m not on Kirill’s payroll(!)… he’s just a data science career guru with fresh perspectives. Amazon link
Provost, Foster and Fawcett, Tom. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O’Reilly Media, Inc., 2013.
- This was a textbook for Foundations of Data Science, a UCI graduate course that I recently completed. Examples in this textbook are denser and math-heavier than Confident Data Skills. This would be most appropriate for anyone actively trying to move into a data science role. Amazon link
O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown Publishing, 2016.
- O’Neil argues that the predictive algorithms being used today are opaque, unregulated, and uncontestable, even when they’re wrong. She builds upon her career experiences as a Quantitative Engineer and aims to educate data scientists about how to take more responsibility for their algorithms. Amazon link
Murray, Scott. Interactive Data Visualization for the Web: An Introduction to Designing with D3. O’Reilly Media, Inc., 2nd Edition, 2017.
- Note: This 2nd Edition (October 2017) release was updated with D3’s v4 syntax, but D3 v5 was released in April 2018. While some of the syntax in this 2nd Edition may be obsolete, it’s still a valuable resource for learning the fundamentals of D3 and data design.
Abbott, Dean. Applied Predictive Analytics. Wiley, 2014.
- This book is ideal for tech-savvy business and data analysts, and serves as a solid bridge between academia and professional scenarios. Amazon link
Burkov, Andriy. The Hundred-Page Machine Learning Book. Andriy Burkov, 2019.
- This is one of my favorite reference books. It is best suited for beginners already comfortable with analytics terminology. In the author’s words, this is an “attempt to write an easy-to-read book on machine learning that isn’t afraid of using math,” and it is “also the first attempt to squeeze a wide range of machine learning topics in a systematic way and without loss in quality.”
- If you know the basics of R or Python and want to push your predictive modeling and ML education to the next level, you’ll love how succinct this book is. Amazon link
Foreman, John. Data Smart: Using Data Science to Transform Information into Insight. Wiley, 2013.
- Contrary to popular belief (especially among us data folks), not every problem is a big data problem that requires programming via R or Python. This book explores the full capabilities of Excel for data science, and does a great job at it. Especially for Excel power users, this book would be a great first step before plunging into R or Python (or vice versa, perhaps you’re an avid R user that barely knows Excel… this book would be great for you too). Amazon link
R: The Basics
- RStudio: To get started with R, download and install RStudio’s open-source desktop IDE
- swirl: Once RStudio (or your IDE of choice) is installed, installing the swirl package is as easy as typing install.packages("swirl") in the RStudio console
- This R package includes 15 interactive lessons (about 30 minutes each) that teach you the fundamentals of R programming and its data structures
- R for Excel Users: When I pivoted from a financial modeling background to data science, this blog helped me understand how to do common “Excel stuff” in R
- RStudio shortcuts: Keystrokes make everything faster, and this certainly helps
- Tip for Excel users: If you’re using Excel regularly, stop using your mouse!
- PSS: I love Excel, and R. So if you’re crushing Excel regularly and have any R questions, let me know.
- Introduction to dplyr: After installing the dplyr package, you can copy/paste sample R code from this site to learn how to use dplyr functions
- dplyr is one of the Tidyverse packages that makes data manipulation and summarization easier and faster to do in R
- ggplot2: While R’s base graphing capabilities are powerful, the ggplot2 package makes it easier to produce high-quality plots and visualizations
Python: The Basics
Many online Python training courses were made when Python 2.x was standard. Python 3.x is the future of the language, so don’t bother installing or learning Python 2.x.
- PyCharm: A plethora of Python IDEs exist; PyCharm just happens to be my Python desktop IDE of choice
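If you are working through older Python 2.x material, a few behaviors will differ when you run the code in Python 3.x. The snippet below is only a minimal sketch of three common differences (print as a function, true division, and Unicode strings); the function name describe_division is made up for illustration.

```python
# A minimal sketch of Python 3 behaviours that differ from Python 2.
# The helper name below is hypothetical and used only for illustration.

def describe_division(a, b):
    """Return the true-division and floor-division results for a and b."""
    return a / b, a // b

true_div, floor_div = describe_division(7, 2)

# In Python 3, print is a function; in Python 2 it was a statement.
print("7 / 2 =", true_div)    # "/" always performs true division in Python 3
print("7 // 2 =", floor_div)  # "//" is the explicit floor-division operator

# Python 3 strings are Unicode by default, so len() counts characters,
# while the UTF-8 encoded form may take more bytes.
word = "caf\u00e9"
print(len(word), len(word.encode("utf-8")))
```

Most Python 2 tutorial code only needs small changes like these (for example, adding parentheses to print) to run under Python 3.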
- The full version is $89 USD for individual users, but if you’re an active student, you can download the entire suite for free.
- You can download a free, stripped-down community version, but it lacks full web development, database, and SQL support
- Google for Education: Google’s Python classes (YouTube links below) are loaded with great content and come with three downloadable scripts that accompany the lessons. These will take you a few days, and I can’t stand that a Python 3.x version has not been released yet, but the quality of the training provided by this instructor is great:
- Google’s Python Class (Day 1, Part 1): Introduction and strings
- Google’s Python Class (Day 1, Part 2): Lists and sorting
- Google’s Python Class (Day 1, Part 3): Dictionary and files
- Google’s Python Class (Day 2, Part 1): Regular Expressions
- Google’s Python Class (Day 2, Part 2): Utilities: OS Modules and Commands
- Google’s Python Class (Day 2, Part 3): URLs and HTTP, Exceptions
- Google’s Python Class (Day 2, Part 4): Conclusion
- There are two Python reference books that I highly recommend… I’ve learned a lot from these code-as-you-read books and I think the authors did incredible jobs with the introductions to new topics and concepts including NumPy and Pandas
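To give a feel for the topics in Google’s Python class (strings, lists and sorting, dictionaries, and regular expressions), here is a small Python 3.x sketch that counts words in a sentence. It is not taken from the class materials; the function name top_words and the sample sentence are made up for illustration, and it only uses the standard library.

```python
import re

def top_words(text, n=3):
    """Return the n most common lowercase words in text as (word, count) pairs."""
    # Regular expression: pull out runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())

    # Dictionary: map each word to the number of times it appears.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1

    # Sorting: highest count first, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]

sample = "R, Python, and SQL: learn R first, then Python, then SQL or R."
for word, count in top_words(sample):
    print(f"{word}: {count}")
```

Swapping the sample string for the contents of a text file is a natural next exercise once you reach the lesson on files.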
SQL: The Basics