Setting Up Your Data Science Work Bench

Category: IT Technology · Published: 4 years ago

Photo by ThisisEngineering RAEng on Unsplash

Get your computer ready for learning data science

In my last post, I covered the core tools required for data science work. In this article, I am going to give a step-by-step guide to getting your computer set up to perform typical data science and machine learning tasks.

I work on a Mac, so most of the setup instructions are written for macOS.

Install Python

As discussed in my last post, Python is now the most popular programming language among data science practitioners. The first step in configuring your computer is therefore to install Python.

To install and configure Python on your computer you will need to use the terminal. If you have not already set this up, you will need to download and install Xcode (Apple’s Integrated Development Environment).

macOS has traditionally shipped with Python 2.7 already installed, although recent releases no longer include it. In any case, for many data science projects you will need to be able to work with a variety of different Python versions.

There are a number of tools that enable the installation and management of different Python versions; pyenv is probably one of the simplest to use. Pyenv supports the management of Python versions at both the user and project level.

To install pyenv you will first need to install Homebrew, which is a package manager for Mac. Once you have this you can install pyenv with this command (for Windows installation see these instructions).

brew install pyenv

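You will then need to add the pyenv initializer to your shell startup script and reload it by running the following. The commands below assume bash and ~/.bash_profile; if your shell is zsh, the default on recent versions of macOS, use ~/.zshrc instead.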

echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
source ~/.bash_profile

To view the versions of Python that are available to install, run the following command (pyenv versions lists the versions already installed on your system).

pyenv install --list

To install a new version of Python simply run the following.

pyenv install <python-version>
# e.g. pyenv install 3.7.9
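Once a version has been installed and selected, it can be worth confirming from Python itself which interpreter is actually running. The following is a minimal sketch; the meets_minimum helper and the 3.7 threshold are illustrative assumptions, not part of pyenv.

```python
import sys


def meets_minimum(minimum=(3, 7)):
    """Return True if the running interpreter is at least `minimum`.

    `minimum` is a hypothetical (major, minor) threshold chosen for illustration.
    """
    return sys.version_info[:2] >= minimum


# Report the version of the interpreter that is executing this script.
print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Meets minimum:", meets_minimum())
```

If the version printed is not the one you expected, the pyenv initializer has probably not been loaded in the current shell.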

Install Python packages

Pip is the preferred tool for installing Python packages and is included by default with Python 3.4 and above. You will need it to install any open-source Python libraries.

To install a package using pip simply run the following.

pip install <package>
# e.g. pip install pandas
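To check whether a package is already installed, and at which version, something like the following can be used. This is a sketch that assumes Python 3.8 or later (where importlib.metadata is in the standard library); installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if pandas is installed in this environment, else None.
print("pandas:", installed_version("pandas"))
```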

Virtual environments

Different python projects will require different dependencies and versions of python. It is therefore important to have a way to create isolated and reproducible environments for each project. Virtual environments accomplish this.

There are a number of tools for creating Python virtual environments, but I personally use pipenv.

Pipenv can be installed with homebrew.

brew install pipenv

To create a new environment using a specific version of Python, make a new directory and then run the following command from inside it.

mkdir pip-test
cd pip-test
pipenv --python 3.7

To activate the environment run pipenv shell. You will now be in a new environment called ‘pip-test’.

If you inspect the contents of the directory you will see that pipenv has created a new file called Pipfile. This is the pipenv equivalent of a requirements file and lists the packages, and any version constraints, used in the environment.

To install packages into the pipenv environment simply run the following.

pipenv install pandas

Any packages installed will be reflected in the Pipfile, which means that the environment can easily be recreated from this file.

Jupyter Notebooks

Jupyter Notebooks are a web-based application for writing code. They are particularly suited to data science tasks because they enable you to render documentation, diagrams, tables and charts directly in line with your code. This creates a highly interactive and shareable platform for developing data science projects.

To install Jupyter Notebooks simply run pip install notebook or, if you are working in a pipenv environment, pipenv install notebook.

As this is a web-based application you need to start the notebook server to begin writing your code. You do this by running this command.

jupyter notebook

This will open the application in your web browser. The default URL is http://127.0.0.1:8888.

The Notebook application running in the browser

Jupyter Notebooks can work with virtual environments, so you can run the notebooks for a project in the correct project environment. To make the pipenv environment available in the web application, run the following from within the pipenv shell (this relies on the ipykernel package, which is normally installed along with notebook).

python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
# e.g. python -m ipykernel install --user --name jup-test --display-name "Python (jup-test)"

If you now restart the web application and go to New, you will see your pipenv environment listed. Selecting it will start a new notebook that runs with all the dependencies you have set up in your pipenv environment.

A notebook running in the Python(jup-test) environment
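To confirm that a notebook really is using the project environment, a cell like the following can be run. It is a minimal sketch; interpreter_info is a hypothetical helper, and the check relies on sys.prefix differing from sys.base_prefix inside a virtual environment.

```python
import sys


def interpreter_info():
    """Describe the interpreter running the current kernel or script."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Inside a virtual environment sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


for key, value in interpreter_info().items():
    print(f"{key}: {value}")
```

If the executable path points at the system Python rather than the environment created by pipenv, the kernel was probably registered from outside the pipenv shell.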

Python IDE

Jupyter Notebooks are very good for exploratory data science projects and for writing code that you will only use once. However, for efficiency, it is a good idea to write commonly used pieces of code into functions within modules that can be imported and used across projects (this is known as modularising your code).
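As an illustration of modularising code, the sketch below shows what a small reusable module might contain. The file name stats_utils.py and the summarise function are hypothetical examples, not something from a specific project.

```python
# Contents of a hypothetical module, e.g. stats_utils.py
from statistics import mean


def summarise(values):
    """Return basic summary statistics for a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }
```

Any notebook or script that can see the file could then reuse it with from stats_utils import summarise, rather than copying the function between projects.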

Notebooks are not particularly well suited to writing modules. For this task, it is better to use an IDE (Integrated Development Environment). There are many available but I personally use PyCharm. The benefit of using an IDE is that tools such as GitHub integration and unit testing are built in.
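For a sense of what unit testing looks like in practice, here is a minimal sketch using the standard library's unittest module. The normalise function and the test names are hypothetical; an IDE test runner would typically discover and run a test class like this one.

```python
import unittest


def normalise(values):
    """Scale numbers to the range 0..1 (hypothetical function under test)."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


class NormaliseTests(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalise([0, 5, 10]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalise([3, 3]), [0.0, 0.0])
```

Saved to a file, the tests can also be run from the terminal with python -m unittest followed by the file name.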

PyCharm has both a paid professional version and a free community edition. To download and install PyCharm visit this website and follow the installation instructions.

Version control

One final tool you will want to use for your data science projects is GitHub, a hosting service built on Git, which is the most commonly used tool for version control. Version control essentially involves storing a version of your project online. Development is then performed locally on branches. A branch is essentially a copy of the project where changes can be made without affecting the master version.

Once changes have been made locally you can push them to GitHub, where they can be merged into the master branch in a controlled process known as a pull request.

Using GitHub will enable you to track changes to your project. You can also make changes and test their impact before integrating them into the final version. GitHub also enables collaboration, as others can safely make changes without affecting the master branch.

To use GitHub you first need to install Git, which can be done by following these instructions. You will then need to visit the GitHub website and create an account.

Once you have an account you can create a new repository.

Creating a new repository

To work on the project locally you will need to clone the repository.

cd my-directory
git clone https://github.com/rebeccavickery/my-repository.git

This article is a guide to setting up your computer ready to work on data science projects. Many of the tools I have listed are my personally preferred tools. However, in most cases, there are several alternatives. It is worth exploring the different options to find those best suited to your working style and projects.

Thanks for reading!

I send out a monthly newsletter. If you would like to join, please sign up via this link. Looking forward to being part of your learning journey!

