Setting Up Your Data Science Work Bench

Category: IT Technology · Published: 4 years ago

Photo by ThisisEngineering RAEng on Unsplash

Get your computer ready for learning data science

In my last post, I covered the core tools required for data science work. In this article, I am going to give a step-by-step guide to getting your computer set up to perform typical data science and machine learning tasks.

I personally work on a Mac, so most of the setup instructions are written for this operating system.

Install Python

As discussed in my last post, Python is now the most popular programming language for data science practitioners. Therefore, the first step in configuring your computer is to install Python.

To install and configure Python on your computer you will need to use the terminal. If you have not already set this up, you will need to download and install Xcode (Apple’s Integrated Development Environment).

At the time of writing, Mac OS X comes with Python 2.7 already installed. However, for many data science projects you will need to be able to work with a variety of different Python versions.

There are a number of tools that enable the installation and management of different Python versions; pyenv is probably one of the simplest to use. Pyenv supports the management of Python versions at both the user and project level.

To install pyenv you will first need to install Homebrew, which is a package manager for Mac. Once you have this, you can install pyenv with this command (for Windows installation see these instructions).

brew install pyenv

You will then need to add the pyenv initializer to your shell startup scripts and reload your bash_profile by running the following.

echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
source ~/.bash_profile

To view the versions of Python that are available to install, run the following command.

pyenv install --list

To install a new version of Python, run the following.

pyenv install <python-version>
# e.g. pyenv install 3.7.9
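Once a version is installed, it can help to confirm which interpreter is actually running when you type python. The following is a minimal sketch rather than anything provided by pyenv itself; the helper name describe_interpreter is my own.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's version string and executable path."""
    info = sys.version_info
    version = f"{info.major}.{info.minor}.{info.micro}"
    return version, sys.executable


if __name__ == "__main__":
    version, path = describe_interpreter()
    print(f"Python {version} at {path}")
```

If pyenv is managing the interpreter, the printed path will typically sit somewhere under your pyenv directory rather than in a system location.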

Install Python packages

Pip is the preferred installer for Python packages and is included by default with Python 3.4 and above. You will need this to install any open-source Python libraries.

To install a package using pip, run the following.

pip install <package>
# e.g. pip install pandas
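To check from Python whether a package has been installed, and at which version, something like the sketch below may be useful. It assumes Python 3.8 or later (where importlib.metadata is part of the standard library), and installed_version is a hypothetical helper name, not part of pip.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("pandas", "pip"):
        version = installed_version(name)
        print(f"{name}: {version or 'not installed'}")
```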

Virtual environments

Different Python projects will require different dependencies and versions of Python. It is therefore important to have a way to create isolated and reproducible environments for each project. Virtual environments accomplish this.

There are a number of tools to create Python virtual environments but I personally use pipenv.

Pipenv can be installed with homebrew.

brew install pipenv

To create a new environment using a specific version of Python, make a new directory and then run the following command from inside it.

mkdir pip-test
cd pip-test
pipenv --python 3.7

To activate the environment, run pipenv shell. You will now be in a new environment called ‘pip-test’.
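If you are ever unsure whether the environment is active, you can ask the interpreter itself. This is a small sketch using only the standard library; in_virtual_environment is a name I have made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a virtual environment."""
    # Older releases of the virtualenv tool set sys.real_prefix instead.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    print("Environment location:", sys.prefix)
```

Run inside the activated shell, this should report True, and the location should point at the directory pipenv created for the project.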

If you inspect the contents of the directory, you will see that pipenv has created a new file called Pipfile. This is the pipenv equivalent of a requirements file and contains the packages and versions that are used in the environment.

To install packages into the pipenv environment simply run the following.

pipenv install pandas

Any packages installed will be reflected in the Pipfile, which means that the environment can be recreated easily using this file.

Jupyter Notebooks

Jupyter Notebooks are a web-based application for writing code. They are particularly suited to data science tasks because they enable you to render documentation, diagrams, tables and charts directly in line with your code. This creates a highly interactive and shareable platform for developing data science projects.

To install Jupyter Notebooks, run pip install notebook or, if you are working in a pipenv shell, pipenv install notebook.

As this is a web-based application you need to start the notebook server to begin writing your code. You do this by running this command.

jupyter notebook

This will open the application in your web browser. The default URL is http://127.0.0.1:8888.

The Notebook application running in the browser

Jupyter Notebooks are able to work with virtual environments so that you are able to run the notebooks for a project in the correct project environment. To make the pipenv environment available in the web application you need to run the following.

python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
# e.g. python -m ipykernel install --user --name jup-test --display-name "Python (jup-test)"

If you now restart the web application and go to New, you will see your pipenv environment available. Selecting this will start a new notebook that will run with all the dependencies you have set up in your pipenv shell.
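For context, registering a kernel writes a small kernel.json file that tells Jupyter which interpreter to launch for that environment. The sketch below builds a dictionary with the key fields as I understand them; the real file is created for you by the ipykernel command and may contain additional entries. The helper name build_kernel_spec is hypothetical.

```python
import json
import sys


def build_kernel_spec(display_name, python_executable=sys.executable):
    """Build a dictionary shaped like the key fields of a Jupyter kernel.json."""
    return {
        # Command Jupyter runs to start the kernel; the {connection_file}
        # placeholder is filled in by Jupyter when the kernel is launched.
        "argv": [
            python_executable,
            "-m",
            "ipykernel_launcher",
            "-f",
            "{connection_file}",
        ],
        # Name shown in the notebook's New menu.
        "display_name": display_name,
        "language": "python",
    }


if __name__ == "__main__":
    print(json.dumps(build_kernel_spec("Python (jup-test)"), indent=2))
```

The point to notice is that the first argv entry is the environment's own Python executable, which is why a notebook started from that kernel sees the packages installed in the environment.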

A notebook running in the Python (jup-test) environment

Python IDE

Jupyter Notebooks are very good for exploratory data science projects and for writing code that you will only use once. However, for efficiency, it is a good idea to write commonly used pieces of code into functions within modules that can be imported and used across projects (this is known as modularising your code).
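As an illustration of the kind of function worth moving into a module, here is a small sketch of a reusable helper. Both the function name min_max_scale and the idea of saving it in a file called preprocessing.py are my own examples, not a prescribed structure.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers to the range 0 to 1."""
    values = list(values)
    if not values:
        return []
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


if __name__ == "__main__":
    print(min_max_scale([2, 4, 6]))
```

Saved in preprocessing.py, this could then be reused in any notebook or project with from preprocessing import min_max_scale.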

Notebooks are not particularly well suited to writing modules. For this task, it is better to use an IDE (Integrated Development Environment). There are many available but I personally use PyCharm. The benefit of using an IDE is that tools such as GitHub integration and unit testing are built in.
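To give a feel for what unit testing looks like, here is a minimal sketch using the unittest module from the Python standard library. The mean function and the test names are invented for illustration; an IDE would normally discover and run tests like these for you.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class MeanTests(unittest.TestCase):
    def test_mean_of_known_values(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            mean([])


def run_tests():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```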

PyCharm has both a paid professional version and a free community edition. To download and install PyCharm, visit this website and follow the installation instructions.

Version control

One final tool you will want to use for your data science projects is GitHub. This is the most commonly used platform for hosting projects that are under version control with Git. Version control essentially involves storing a version of your project online. Development is then performed locally on branches. A branch is essentially a copy of the project where changes can be made that will not affect the master version.

Once changes have been made locally, you can push them to GitHub, where they can be merged into the master branch through a controlled process known as a pull request.

Using GitHub will enable you to track changes to your project. You can also make changes and test the impact they have before integrating them into the final version. GitHub also enables collaboration with others, as they can safely make changes without impacting the master branch.

To use GitHub you first need to install Git, which can be done by following these instructions. You will then need to visit the GitHub website and create an account.

Once you have an account you can create a new repository.

Creating a new repository

To work on the project locally you will need to clone the repository.

cd my-directory
git clone https://github.com/rebeccavickery/my-repository.git

This article is a guide to setting up your computer ready to work on data science projects. Many of the tools I have listed are my personally preferred tools. However, in most cases, there are several alternatives. It is worth exploring the different options to find those best suited to your working style and projects.

Thanks for reading!

I send out a monthly newsletter. If you would like to join, please sign up via this link. Looking forward to being part of your learning journey!

