DataScienceGO 2018 (Part 2/2 - Tips & Resources)

Now that I’ve had a few weeks to digest my experience at DataScienceGO 2018, I can say it was definitely one for the books. I especially enjoyed making so many connections and absorbing (or rather, writing down at a million miles an hour) so much knowledge over a short weekend. As promised, I wanted to share the useful resources I learned of at DSGO. Below are my favorites, by presenter.

Send me a note here or on LinkedIn if you have any questions, or check out my Resources page, now updated with some of the items below.

Nadieh Bremer: Nadieh is a data visualization designer and was the first presenter at DSGO… The conference couldn’t have started with a better presenter. She blew away the crowd by showcasing some of the freelance work she does (the image above is something she designed for Scientific American!).

* Visual Cinnamon: Nadieh’s data visualization portfolio is a work of art, and data storytelling was made to look this way. She uses D3 and JavaScript to make unique, beautifully designed interactive visualizations on the web.
* Takeaway: “Remix what’s out there already” <– This is so true… If you have a programming or visualization idea, do the smart thing and Google it first… It’s much more time-efficient to start with similar code than to start from scratch.

Sinan Ozdemir: Sinan is the Founder and CTO of Kylie.ai, a customer service AI firm that builds custom solutions that augment the customer service experience. He has also been the Head Data Science Instructor at General Assembly since 2014.

* Takeaways: We’re entering a new phase of AI technology in which AI has reached a new level of usability, allowing new companies and competitors to decrease the competitive advantage of early AI adopters. AI can increase productivity, decrease bias, and reduce administrative tasks. Europe is a leader with regard to data standards and data privacy rights.

Jorge Zuloaga: Jorge is the Head of Data Science at Big Squid, where he focuses on data science consulting for business applications. Big Squid also builds upon SaaS platforms to automate the workflow involved in training machine learning models and putting them into production.

* Takeaways: We’re currently in a “machine learning tsunami,” where we’re drowning in data but gaining deep insights through ML and big data on a daily basis. ML works best when provided with very clear, unambiguous problem sets. Problem formulation is critical… Always ask about the business need or business value associated with a problem.

Paige Bailey: Paige has worked for Microsoft and Chevron as a software engineer and data scientist and is very passionate about ML. She shared some resources that were new to me and gave some tips for deploying ML models.

* caret Package for R: Caret (pronounced “carrot”) has several functions that attempt to streamline ML model building and evaluation, feature engineering, and data splitting. I have not tried this package out, but it sounds similar to WEKA, an open-source program that I’ve been using often recently for ML model training, data splitting / folding, testing, and feature engineering.
* Keras: Keras is an open-source neural network library designed to enable fast experimentation with deep neural networks. It focuses on being user-friendly, modular, and extensible, and there’s even an R package for it! Keras uses the TensorFlow backend engine by default.
* Magenta: Magenta is a TensorFlow-based research project exploring the role of machine learning in the process of creating art and music.
* Takeaway: The process for developing AI skills should include 1) building your own AI model (yes, this takes a lot of effort and repetitions), 2) modifying someone else’s model, and 3) transfer learning (reusing a model open-sourced for another purpose).
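Since data splitting / folding comes up with both caret and WEKA, here is a minimal sketch of the idea in plain Python. To be clear, this is not caret’s or WEKA’s API (the function name `k_fold_indices` and its parameters are my own, made up for illustration); it only shows what those tools automate: shuffling the rows once, cutting them into k folds, and using each fold in turn as the test set while training on the rest.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only; not part of caret or WEKA.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so folds are random but reproducible.
    random.Random(seed).shuffle(indices)

    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    for fold, (train_idx, test_idx) in enumerate(k_fold_indices(10, 3), start=1):
        print(f"fold {fold}: train={len(train_idx)} rows, test={len(test_idx)} rows")
```

Every row lands in exactly one test fold, so each observation is used for testing once and for training k - 1 times.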

Ben Taylor: Ben is the Chief AI Officer & co-Founder at Ziff. Ben and Ziff aim to deliver the world’s best machine and deep learning to product companies with the least amount of effort. His LinkedIn is a wealth of knowledge and also entertaining… For example, he shares work on using deep learning to create previously unseen images, or genetic GANs (Generative Adversarial Networks). I’d certainly recommend following Ben.

* scikit-learn: Simple and efficient tools for data mining and ML in Python.
* Takeaway: “Get better at efficient reps, and learn how to work faster!”

Matt Dancho: Matt is the Founder and CEO of Business Science. Matt uses Business Science to help data scientists learn how to apply enterprise-grade ML in business & finance. He also uses R and had a few great recommendations:

* H2O: This R package is a scalable, open-source ML platform developed by H2O.ai.
* Lime: Lime is an R package that helps you explain the predictions made by black-box models. It’s a very helpful (and easy) tool when dealing with multivariate regressions and predictive models.
* Takeaways: Executives often lose sight of the cost of employee attrition, and using ML tools and model interpretability packages like Lime helps explain the drivers, in order of significance, that cause employees to leave. If your ML engagement is optimization-based, get approval from your clients to collect and measure performance after the engagement.
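To make “explain drivers, in order of significance” a bit more concrete, here is a rough sketch in plain Python. It is not Lime’s algorithm or API: Lime fits a simple, interpretable model around one prediction using many perturbed samples, while this sketch uses a much cruder stand-in that nudges one feature at a time and ranks features by how far the prediction moves. The scoring function, feature names, and numbers are all made up for illustration.

```python
def rank_drivers(predict, instance, deltas):
    """Rank features by how much nudging each one shifts a single prediction.

    predict  : any black-box function that takes a dict of features and returns a number
    instance : the one observation (dict) to explain
    deltas   : how far to nudge each feature, keyed by feature name
    """
    baseline = predict(instance)
    effects = {}
    for name, delta in deltas.items():
        nudged = dict(instance)          # copy so the original instance is untouched
        nudged[name] = instance[name] + delta
        effects[name] = predict(nudged) - baseline
    # Largest absolute change first: the strongest local "drivers".
    return sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)


def toy_attrition_risk(employee):
    """Hypothetical black-box attrition score; coefficients are invented."""
    return (0.03 * employee["overtime_hours"]
            + 0.10 * employee["years_since_promotion"]
            - 0.05 * employee["satisfaction"])


if __name__ == "__main__":
    employee = {"overtime_hours": 10, "years_since_promotion": 3, "satisfaction": 4}
    nudges = {"overtime_hours": 5, "years_since_promotion": 1, "satisfaction": 1}
    for feature, change in rank_drivers(toy_attrition_risk, employee, nudges):
        print(f"{feature}: {change:+.2f}")
```

The ranking is local: it describes this one employee under these particular nudges, which is the same spirit as a Lime explanation, only with far less rigor.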

Mollie Pettit: Mollie is a data visualization engineer, data scientist, and D3 superstar. A few of my graduate classmates and I had the pleasure of having lunch with Mollie, and she recommended the following book for learning the basics of web-based visualizations:

* Murray, Scott. Interactive Data Visualization for the Web: An Introduction to Designing with D3. O’Reilly Media, Inc., 2nd Edition, 2017.
* Ideal for beginners, Scott Murray takes you through easy-to-consume fundamental concepts and methods of D3, the most powerful JavaScript library for web-based visualizations.
* PLEASE NOTE: This 2nd Edition (October 2017) release was updated with D3’s v4 syntax; however, D3 released v5 in April 2018. While some of the syntax in this 2nd Edition may be obsolete, it’s still a valuable resource for learning the fundamentals of D3 and data design. I’m currently working through this book and haven’t found v5 to be an issue; it just means spending a little more time on Google!

