Before we delve into the technical details of various machine learning tools, I will describe the development environment I am using and the software packages needed to run the code posted on the tinhatben GitHub account. That said, data science and machine learning techniques are not tied to any particular programming language or tool, and can be adapted to whichever language you prefer. Throughout this blog I will be using Python, specifically Python 2.7. This blog is not intended to teach Python and assumes some knowledge of how to use it. If you are interested in learning Python, I highly encourage it! Some useful resources include:
- Python Beginners Guide
- Coursera Class (I haven’t taken this one myself but have taken other Coursera classes which were very good)
For those familiar with Python: I completely understand that Python 2.7 is destined for obsolescence and that Python 3 is the future! I am using 2.7 out of pure laziness, as it is the default Python interpreter installed on Ubuntu Linux. Most of the code should work on Python 3 as well; however, I will not be testing to confirm this, and there are some compatibility issues between the two versions (Python2vs3). If you are learning Python for the first time, I strongly encourage you to start with Python 3.
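To make the compatibility point concrete, here is a minimal sketch of code written to behave the same on both versions. The `mean` function is a made-up example and not part of the blog's repository. The two `__future__` imports cover the most common differences (integer division and the print statement); they do not cover every incompatibility.

```python
# Runs unchanged on Python 2.7 and Python 3.
from __future__ import division, print_function


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # With the division import, / is true division on both versions;
    # without it, Python 2 would truncate sum([1, 2]) / 2 to 1.
    return sum(values) / len(values)


# print is a function on both versions thanks to the __future__ import.
print("mean of [1, 2]:", mean([1, 2]))
```

On Python 3 the `__future__` line has no effect, so it is harmless to leave in.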
Irrespective of your operating system, you can install all of the required packages from the command line using the pip package manager, provided you already have it installed:
pip install scipy numpy matplotlib jupyter pandas scikit-learn
During the Python installation process on Windows there is a checkbox to tick if you would like pip to be installed. If, however, you would prefer to simply use a Windows installer, some of the packages make one available. See:
On Debian-based Linux distributions you can install pip by running (as root):
apt-get install python-pip
(There is a similar command using yum on Fedora systems.)
Alternatively, if you prefer the apt-get package manager, you can install the packages with:
apt-get install python-scipy python-numpy python-matplotlib python-pandas python-scikits-learn
pip install jupyter
Note, though, that pip tends to carry more recent versions of these packages than apt-get.
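Whichever route you took, a quick way to confirm the installation is to try importing each package from Python. The script below is only a sketch: the `check_packages` helper and the `REQUIRED` list are names I made up for illustration. Jupyter is left out because launching it is the simplest check.

```python
from __future__ import print_function

import importlib

# Import names for the packages installed above; note that scikit-learn
# is imported as "sklearn".
REQUIRED = ["scipy", "numpy", "matplotlib", "pandas", "sklearn"]


def check_packages(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to a label.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(check_packages(REQUIRED).items()):
        print(name, "->", version if version is not None else "NOT INSTALLED")
```

Any package reported as NOT INSTALLED can be installed again on its own with pip.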
Assuming each of the above requirements installed correctly, you should be able to launch Jupyter using the command:
jupyter notebook
This command opens a web page served locally from your machine. Jupyter is a very handy tool for data analysis and provides an easy way to run only specific segments of code. If the Jupyter page launched correctly, you should see something similar to:
You can start a new notebook by clicking on New and selecting Python notebook.
If you have got this far, you are good to go! So let’s jump in. If you are having issues, feel free to post comments and I will help where I can. Otherwise, Stack Overflow is a great resource for technical help!
See you soon