GIT Essentials for Every Data Scientist

Category: IT Technology · Published: 4 years ago

Image by RegalShave from Pixabay

Introduction to Version Control

Version control is all about managing changes to files and directories by one or many contributors. Git is an incredibly popular system for version control and the one we will be running through for this course.

There are many benefits to version control, and to Git specifically. These include a view of historical changes made to your project; automatic notification of conflicting work, where two individuals write conflicting lines of code; and support for collaboration across many individuals, which allows teams to grow.

Version control is a staple of software engineering and something that is slowly being adopted across data science teams, where data scientists often work in silos with their own technologies and workflows. While a degree of autonomy is critical, close collaboration among data scientists is very hard to scale without some process around version control.

What is a Repository?

You’ve likely heard the term many times… repository.

All of your data science projects managed with Git will have two main components. The first is all of the work you’re doing in files and directories: your scripts, your models, and where and how they’re stored. The second is the information that Git holds onto to maintain a record of all of the changes that have been made to your project over time.

When you add those pieces together, you have yourself a repository, or as the cool kids call it… a repo ;)
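To make the two components concrete, here is a minimal sketch that builds a throwaway repository. The directory comes from mktemp and the file name train.py is a placeholder, not something from a real project.

```shell
# Work in a temporary directory so nothing on your machine is touched.
repo_dir="$(mktemp -d)"
cd "$repo_dir"

# Component one: your own work (train.py is a placeholder script).
echo 'print("training model")' > train.py

# Component two: Git's record of changes, kept in a hidden .git directory.
git init -q

# List everything in the folder, including the hidden .git directory.
ls -A
```

Everything outside .git is your project; everything inside it is Git's bookkeeping. Together they make up the repo.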

Basic Commands For You to Know

Git status

Git status lets you know what is in the “staging area”, along with files that have changed but are not yet staged and files that Git is not tracking at all.

The staging area is where you put the changes you intend to include in your next commit. It’s effectively you prepping a variety of letters and putting them in a box ready to send. Whether you want to remove things from the box or add more is up to you, but the moment you hand the box to the mailman (that is, the moment you commit), those changes are recorded in your project’s history. Git status will give you information about whatever file(s) are in the box ready to go.

Git add

If you ran git status and found there was nothing in your staging area, not to worry! You first need to add files to the staging area. You can do so with git add filename. Whatever file you name here will be moved to the staging area. That means all of the changes in that file are ready to be committed, that is, recorded in the repo.
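A short sketch of how git status changes once a file is staged. The file name clean.py is a placeholder, and the --short flag is used here only to keep the output compact.

```shell
# Throwaway repository in a temporary directory.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q

# A new file that Git has never seen (clean.py is a placeholder name).
echo 'print("clean data")' > clean.py

# Before staging, the file is reported as untracked.
before="$(git status --short)"
echo "$before"   # → ?? clean.py

# Put the file in the box.
git add clean.py

# After staging, the file is reported as added and ready to commit.
after="$(git status --short)"
echo "$after"    # → A  clean.py
```

Running plain git status instead of git status --short gives the same information in a longer, more descriptive format.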

Git diff

Now we can see which files are in the staging area with git status, but what about when you want to see what has actually changed? You can use what's called git diff. Git diff shows the differences line by line. By default it compares your working files against the staging area, so it shows changes you have not yet staged; git diff --staged shows what is already staged. In the output, the old and new versions of each file are labeled a and b respectively.

When running git diff, you might also run git diff HEAD. HEAD refers to the most recent commit, so this compares your current files, staged or not, against that commit. Some tutorials write this as git diff -r HEAD; the -r is not needed for the comparison. If you want to see the changes to one file in particular, you can include the file path after HEAD, something to the effect of git diff HEAD filepath.
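The sketch below walks through the diff variants on a throwaway repository. The file params.py, its contents, and the user name and email are all placeholder values chosen for illustration.

```shell
# Throwaway repository with a placeholder identity so commits work.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Commit a first version of a placeholder file.
echo 'learning_rate = 0.1' > params.py
git add params.py
git commit -q -m "Add initial parameters"

# Change the file without staging it.
echo 'learning_rate = 0.01' > params.py

# Unstaged changes: working files compared with the staging area.
unstaged="$(git diff)"
echo "$unstaged"

# Stage the change. Plain git diff now has nothing left to show,
# while git diff --staged shows what the next commit would contain.
git add params.py
after_add="$(git diff)"
staged="$(git diff --staged)"

# Everything that differs from the last commit, limited to one file.
since_head="$(git diff HEAD -- params.py)"
echo "$since_head"
```

Lines starting with - come from the old version (a) and lines starting with + come from the new version (b).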

Git commit

Once you’ve added files to your staging area, you can put them in the mailbox with git commit . Keep in mind that anything in the 'box' gets shipped together as one unit. So if you want to undo anything about a given commit, you would have to roll back the entire commit.

A good practice is to commit frequently, in small, self-contained steps.

One thing to keep in mind is that you usually won’t run a bare git commit, which opens a text editor for the message. Your command will typically look like this: git commit -m "model updates". The -m flag supplies your log message. A best practice here is to be specific and descriptive about the changes you've made to your project. You'll thank yourself later!
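Here is a small sketch of the "box ships as one unit" idea: only the staged files end up in the commit. The file names, the commit message, and the user name and email are placeholders.

```shell
# Throwaway repository with a placeholder identity so commits work.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Three placeholder files; only two of them go in the box.
echo 'features = ["age", "income"]' > features.py
echo 'model = "random_forest"' > model.py
echo 'scratch notes' > notes.txt
git add features.py model.py

# Ship the box with a specific, descriptive message.
git commit -q -m "Add feature list and random forest model config"

# Files recorded in the commit that was just made.
committed="$(git ls-tree -r --name-only HEAD)"
echo "$committed"

# notes.txt was never staged, so it is still untracked.
leftover="$(git status --short)"
echo "$leftover"   # → ?? notes.txt
```

Because features.py and model.py were committed together, undoing that commit later would undo both changes at once, which is one reason to keep commits small.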

Git log

Now the last command I’ll talk about for now is git log.

git log is where you can pull up your repository's history of commits. It provides a handful of pieces of information like the author, commit date, and log message.
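As a quick illustration, the sketch below makes two commits and then reads them back. The file name, commit messages, and identity are placeholder values.

```shell
# Throwaway repository with a placeholder identity so commits work.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Two commits to give the history something to show.
echo 'v1' > model.txt
git add model.txt
git commit -q -m "Add baseline model"

echo 'v2' > model.txt
git add model.txt
git commit -q -m "Tune model hyperparameters"

# Full history, newest first: commit hash, author, date, and log message.
git log

# A compact view with one line per commit.
git log --oneline

# Only the log messages, handy for scripting.
subjects="$(git log --format=%s)"
echo "$subjects"
```

The descriptive messages passed to git commit -m are exactly what shows up here, which is why specific messages pay off later.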

Conclusion

I hope this proves a useful crash-course on git! Git your hands dirty with those commands to get yourself and your data science teams using Git more effectively!

Happy data science-ing!

