Data Science
Anaconda and Data Science
Anaconda is a free and open-source distribution of the Python and R programming languages for data science and machine learning applications such as large-scale data processing, predictive analytics, and scientific computing. It aims to simplify package management and deployment, and package versions are managed by the conda package manager. The Anaconda distribution has more than 6 million users and includes more than 250 popular data science packages for Windows, Linux, and macOS.
So, what exactly is data science? It's the process of asking interesting questions and then answering those questions using data. In general, the data science workflow looks like this:
- Ask a question
- Gather data that might help you to answer that question
- Clean the data
- Explore, analyze, and visualize the data
- Build and evaluate a machine learning model
- Communicate results
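The six steps above can be sketched in Python with pandas and scikit-learn, both of which ship with Anaconda. This is a minimal sketch, not a prescribed recipe: the iris dataset bundled with scikit-learn and the logistic regression model are illustrative choices, and a real project would spend far more time on cleaning and exploration.

```python
# A minimal sketch of the six-step workflow, assuming pandas and scikit-learn
# are installed (both are included in the Anaconda distribution).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Ask a question: can four flower measurements predict the iris species?

# 2. Gather data: the iris dataset bundled with scikit-learn, as a DataFrame.
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a numeric "target" column

# 3. Clean the data: drop rows with missing values and exact duplicate rows.
df = df.dropna().drop_duplicates()

# 4. Explore: add readable species names, then average measurements per species.
df = df.assign(species=df["target"].map(dict(enumerate(iris.target_names))))
print(df.groupby("species")[iris.feature_names].mean().round(2))

# 5. Build a model, then evaluate it on rows it never saw during training.
X = df[iris.feature_names]
y = df["species"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# 6. Communicate results.
print(f"Test accuracy: {accuracy:.2f} on {len(X_test)} held-out flowers")
```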
This workflow doesn't necessarily require advanced mathematics, a mastery of deep learning, or many of the other skills often associated with data science. But it does require knowledge of a programming language and the ability to work with data in that language.
Step 1: Get comfortable with Python
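As a rough illustration of the kind of plain Python this step involves, the sketch below uses lists, dictionaries, comprehensions, and a small function to summarize some records. The records are hypothetical example data, and the `summarize` helper is an illustrative name, not part of any library.

```python
# Core Python skills that carry over directly to data work:
# lists, dictionaries, loops, comprehensions, and functions.
from collections import Counter


def summarize(values):
    """Return the count, mean, minimum, and maximum of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return {"count": n, "mean": mean, "min": min(values), "max": max(values)}


# Hypothetical example data: (name, department, salary) records.
records = [
    ("Ana", "sales", 52000),
    ("Ben", "engineering", 88000),
    ("Cho", "sales", 47000),
    ("Dev", "engineering", 95000),
]

# A list comprehension extracts one "column" from the records.
salaries = [salary for _, _, salary in records]
print(summarize(salaries))

# Counter tallies how often each category appears.
print(Counter(dept for _, dept, _ in records))

# A dictionary groups values by key, much like a pandas groupby does.
by_dept = {}
for name, dept, salary in records:
    by_dept.setdefault(dept, []).append(salary)
print({dept: sum(s) / len(s) for dept, s in by_dept.items()})
```

Once these patterns feel natural, the pandas equivalents in the next step are much easier to follow.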
Step 2: Learn data analysis, manipulation, and visualization with pandas
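A short sketch of typical pandas operations: loading a CSV, inspecting it, handling a missing value, deriving a column, filtering, and aggregating. The sales data is made up for illustration and is read from a string so the example is self-contained; `your_file.csv` in the comment is a placeholder name.

```python
import io

import pandas as pd

# Made-up sales data; with a real file you would call pd.read_csv("your_file.csv").
csv_text = """store,product,units,price
A,pen,10,1.50
A,notebook,4,3.00
B,pen,7,1.50
B,notebook,,3.00
C,pen,12,1.25
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect: shape, column types, and missing values per column.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Manipulate: treat the missing unit count as zero and derive a revenue column.
df["units"] = df["units"].fillna(0).astype(int)
df["revenue"] = df["units"] * df["price"]

# Filter rows with a boolean mask.
pens = df[df["product"] == "pen"]
print(pens)

# Aggregate: total revenue per store, largest first.
revenue_by_store = df.groupby("store")["revenue"].sum().sort_values(ascending=False)
print(revenue_by_store)
```

Whether a missing value should become zero is a judgment call that depends on the data; dropping the row with `dropna()` is the other common choice. For the visualization part of this step, `revenue_by_store.plot(kind="bar")` would draw a bar chart of the result, provided matplotlib is installed.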
Step 3: Learn machine learning with scikit-learn
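scikit-learn's estimators share a common `fit` / `predict` / `score` interface, and a `Pipeline` chains preprocessing with a model. The sketch below shows that pattern on the wine dataset bundled with scikit-learn; the choice of k-nearest neighbors with feature scaling is illustrative, not a recommendation for any particular problem.

```python
# A minimal scikit-learn sketch: split the data, fit a pipeline, predict, evaluate.
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features X (a 2-D array) and labels y (a 1-D array).
X, y = load_wine(return_X_y=True)

# Hold out a test set so the evaluation reflects performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale features first, because k-nearest neighbors relies on distances.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train, y_train)     # learn from the training data
y_pred = clf.predict(X_test)  # predict labels for the held-out samples
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
```

Putting the scaler inside the pipeline means it is fitted only on the training data, so no information from the test set leaks into preprocessing.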
Step 4: Understand machine learning in more depth
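One example of going deeper is evaluating models with k-fold cross-validation instead of a single train/test split, and using it to tune hyperparameters. This sketch assumes the breast cancer dataset bundled with scikit-learn and tunes the regularization strength `C` of a logistic regression; both are illustrative choices.

```python
# k-fold cross-validation averages performance over several splits, and
# GridSearchCV uses it to choose a hyperparameter value.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Five accuracy scores, one per held-out fold.
scores = cross_val_score(model, X, y, cv=cv)
print(f"Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Search over C; pipeline parameters are named <step name>__<parameter>.
grid = GridSearchCV(
    model,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=cv,
)
grid.fit(X, y)
print("Best C:", grid.best_params_["logisticregression__C"])
print(f"Best cross-validated accuracy: {grid.best_score_:.3f}")
```

Note that `best_score_` is somewhat optimistic, because the same folds were used to pick `C`; a separate held-out test set or nested cross-validation gives a less biased estimate.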
Step 5: Keep learning and practicing