
Data Science

This site is under construction

Anaconda and Data Science

Anaconda is a free and open-source distribution of the Python and R programming languages for data science and machine learning applications (large-scale data processing, predictive analytics, scientific computing). It aims to simplify package management and deployment, and package versions are managed by the conda package management system. The Anaconda distribution is used by over 6 million users and includes more than 250 popular data science packages for Windows, Linux, and macOS.
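After installing Anaconda, or creating an environment with conda, it can help to confirm which versions of key packages the active environment actually provides. The sketch below is one way to do that from Python itself. It does not call conda; it reads installed package metadata through the standard library's `importlib.metadata`, which typically works for conda-installed Python packages as well. The `package_versions` helper and the package list are hypothetical choices for illustration.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Return {package: installed version string, or None if missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in the active environment
    return found


# Report the interpreter version and a few common data science packages.
print("Python", sys.version.split()[0])
for pkg, ver in package_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```

Running this inside the environment you plan to work in is a quick check that the interpreter and packages you expect are the ones being imported.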

So, what exactly is data science? It's the process of asking interesting questions and then answering those questions using data. Generally speaking, the data science workflow looks like this:

  • Ask a question

  • Gather data that might help you to answer that question

  • Clean the data

  • Explore, analyze, and visualize the data

  • Build and evaluate a machine learning model

  • Communicate results

This workflow doesn't necessarily require advanced mathematics, a mastery of deep learning, or many of the other skills often associated with data science. But it does require knowledge of a programming language and the ability to work with data in that language.

Step 1: Get comfortable with Python

Step 2: Learn data analysis, manipulation, and visualization with pandas

Step 3: Learn machine learning with scikit-learn

Step 4: Understand machine learning in more depth

Step 5: Keep learning and practicing
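As a hedged illustration of Steps 2 and 3, the sketch below uses pandas to reshape and summarize a small dataset and scikit-learn to estimate how well a model generalizes. It assumes pandas and scikit-learn are installed (both are included in the Anaconda distribution) and uses the iris dataset bundled with scikit-learn. The `explore_and_model` helper, the choice of `LogisticRegression`, and 5-fold cross-validation are illustrative choices, not a recommended recipe.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def explore_and_model():
    # pandas: load the bundled iris dataset as a DataFrame.
    iris = load_iris(as_frame=True)
    df: pd.DataFrame = iris.frame.copy()

    # Manipulate: map integer targets to readable species names.
    df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

    # Analyze: mean of each measurement per species.
    summary = df.drop(columns="target").groupby("species").mean()

    # scikit-learn: estimate accuracy with 5-fold cross-validation.
    X = df[iris.feature_names]
    y = df["target"]
    model = LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, X, y, cv=5)
    return summary, scores


summary, scores = explore_and_model()
print(summary.round(2))
print(f"cross-validated accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Cross-validation scores the model on several different train/test splits rather than a single one, which gives a less noisy estimate of how it will perform on unseen data.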
