Introduction to Version Control
Version control is all about managing changes to files and directories by one or many contributors. Git is an incredibly popular system for version control and the one we will be running through for this course.
Version control, and Git specifically, has many benefits. It gives you a view of the historical changes made to your project. It automatically notifies you of conflicting work, where two individuals write conflicting versions of the same lines of code. It also allows many individuals to collaborate, which allows teams to grow.
Version control is a staple of software engineering, and it is slowly being adopted by data science teams, where data scientists often work in silos with their own technologies and workflows. While a degree of autonomy is critical, close collaboration between data scientists is tremendously difficult to scale without some process around version control.
What is a Repository?
You’ve likely heard the term many times… repository.
All of your data science projects managed with Git will have two main components. The first is all of the work you’re doing with files and directories: your scripts, your models, and where and how they're stored. The second is the information Git holds onto, in a hidden .git directory at the root of your project, to maintain a record of all of the changes that have been made to your project over time.
When you add those pieces together, you have yourself a repository, or as the cool kids call it… a repo ;)
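To make that second component a little more concrete: Git identifies each version of a file's contents by a hash. The Python sketch below reproduces how Git computes the ID of a file's contents (a "blob"), assuming the default SHA-1 object format: a SHA-1 digest over a short header, "blob", a space, the size in bytes, and a null byte, followed by the raw contents. It covers only this one detail; the function name blob_id is hypothetical, and real Git also compresses and stores these objects inside the .git directory.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the ID Git would assign to a file with this content (SHA-1 object format)."""
    # Git prefixes the raw bytes with a header of the form b"blob <size>\0"
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Identical contents always produce the same ID, so Git only needs to store them once
    print(blob_id(b"test content\n"))
    print(blob_id(b""))  # the well-known ID of an empty file
```

You can check the output against Git itself: git hash-object filename prints the ID Git computes for a file without storing anything.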
Basic Commands For You to Know
Git status
Git status lets you know what is in the “staging area”, and also which files have changes that haven't been staged yet and which new files Git isn't tracking at all.
The staging area is where you put the changes you want to include in your next commit. It’s effectively you prepping a variety of letters and putting them in a box ready to send. Whether you want to remove things from the box or add more is up to you, but the moment you hand the box to the mailman (that is, the moment you commit), there’s no getting the letters back: those changes become part of your project's recorded history. Git status will give you information about whatever file(s) are in the box ready to go out in the mail.
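One way to picture what git status compares is to model three snapshots of your files: the last commit, the staging area (the "box"), and the files currently on disk. The Python sketch below is a toy model under that assumption, with each snapshot as a dictionary from file path to contents. The names status, head, index, and working are illustrative only, and real Git compares file hashes and metadata rather than full contents.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # hypothetical model: file path -> file contents


def status(head: Snapshot, index: Snapshot, working: Snapshot) -> Dict[str, List[str]]:
    """Toy model of the three groups git status reports (not Git's real implementation)."""
    # "Changes to be committed": the staging area differs from the last commit
    staged = sorted(
        path for path in set(head) | set(index)
        if head.get(path) != index.get(path)
    )
    # "Changes not staged for commit": staged files edited or deleted on disk since staging
    unstaged = sorted(path for path in index if working.get(path) != index[path])
    # "Untracked files": on disk but not in the staging area
    untracked = sorted(path for path in working if path not in index)
    return {"staged": staged, "unstaged": unstaged, "untracked": untracked}


if __name__ == "__main__":
    head = {"train.py": "v1"}
    index = {"train.py": "v2", "model.py": "m1"}
    working = {"train.py": "v3", "model.py": "m1", "notes.txt": "todo"}
    print(status(head, index, working))
```

With this example the sketch reports model.py and train.py as staged, train.py as also having unstaged edits, and notes.txt as untracked. This mirrors how git status can list the same file under both "Changes to be committed" and "Changes not staged for commit".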
Git add
If you ran git status and found there was nothing in your staging area, not to worry! You first need to add files to the staging area. You can do so with git add filename. Whatever file you name here has its current changes placed in the staging area, which means all of the changes residing in that file are ready to be committed to the repo. Note that git add captures the file as it is at that moment; if you edit the file again afterwards, you need to run git add again to stage the newer changes.
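The "captures the file as it is at that moment" point is the part that most often surprises people. The toy Python model below, assuming the staging area and the files on disk are each a dictionary from path to contents, shows that staging records the contents at the time you run the command. The add function and its arguments are hypothetical names for illustration, and the sketch ignores details such as staging deleted files.

```python
from typing import Dict

Snapshot = Dict[str, str]  # hypothetical model: file path -> file contents


def add(index: Snapshot, working: Snapshot, path: str) -> None:
    """Toy version of git add: record a file's current contents in the staging area."""
    if path not in working:
        raise FileNotFoundError(f"{path} does not exist in the working directory")
    # Record the contents as they are right now; later edits on disk won't change this entry
    index[path] = working[path]


if __name__ == "__main__":
    working = {"model.py": "alpha = 0.1"}
    index: Snapshot = {}

    add(index, working, "model.py")
    working["model.py"] = "alpha = 0.5"  # edit the file again after staging it

    print(index["model.py"])    # prints: alpha = 0.1  (the staged version)
    print(working["model.py"])  # prints: alpha = 0.5  (not staged until you add again)
```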
Git diff
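Now we can see which files are in the staging area with git status, but what about when you want to see what has actually changed? You can use what's called git diff. Run on its own, git diff shows the changes in your files that have not been staged yet: it compares the original version of each file with your edited version, denoting them as a and b respectively. To see the changes that are already in the staging area, run git diff --staged (git diff --cached does the same thing).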
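Git's diff output follows the unified diff format, and Python's standard difflib module can produce the same format, which makes it an easy way to see what the a and b labels mean. This is only an illustration of the output format, not of how Git computes its diffs (Git has its own diff algorithms); the show_diff helper and the config.py file name are made up for the example.

```python
import difflib
from typing import List


def show_diff(original: List[str], edited: List[str], path: str) -> str:
    """Render a unified diff, labeling the old version a/ and the new version b/ as git diff does."""
    lines = difflib.unified_diff(original, edited, fromfile=f"a/{path}", tofile=f"b/{path}")
    return "".join(lines)


if __name__ == "__main__":
    original = ["learning_rate = 0.01\n", "epochs = 10\n", "batch_size = 32\n"]
    edited = ["learning_rate = 0.001\n", "epochs = 10\n", "batch_size = 64\n"]
    print(show_diff(original, edited, "config.py"), end="")
```

The output starts with a "--- a/config.py" line and a "+++ b/config.py" line, then marks lines removed from the a version with a leading "-" and lines added in the b version with a leading "+", leaving unchanged context lines prefixed with a space.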
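You will often see this run as git diff -r HEAD. HEAD refers to the most recent commit, so this compares your files, including both staged and unstaged changes, against that commit. It is the HEAD argument that picks which version to compare against; the -r flag tells Git to recurse into subdirectories, which git diff already does by default. If you want to see the changes to one file in particular, you can include the file path after HEAD, something to the effect of git diff -r HEAD filepath.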
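The difference between these comparisons is easiest to see side by side. The toy Python model below again treats the last commit, the staging area, and the files on disk as dictionaries from path to contents, and lists which files each form of the command would report. It is an illustrative sketch under those assumptions: it only tracks which files differ, it ignores untracked files entirely, and the names changed and diff_targets are hypothetical.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # hypothetical model: tracked file path -> file contents


def changed(old: Snapshot, new: Snapshot) -> List[str]:
    """Paths whose contents differ between two snapshots."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


def diff_targets(head: Snapshot, index: Snapshot, working: Snapshot) -> Dict[str, List[str]]:
    """Which files each form of git diff would report, in this toy model."""
    return {
        "git diff": changed(index, working),          # changes not yet staged
        "git diff --staged": changed(head, index),    # changes already staged
        "git diff -r HEAD": changed(head, working),   # everything since the last commit
    }


if __name__ == "__main__":
    head = {"train.py": "v1", "model.py": "m1"}
    index = {"train.py": "v2", "model.py": "m1"}    # train.py was edited and staged
    working = {"train.py": "v2", "model.py": "m2"}  # model.py was edited but not staged
    for command, paths in diff_targets(head, index, working).items():
        print(command, "->", paths)
```

Here plain git diff reports only model.py, git diff --staged reports only train.py, and the comparison against HEAD reports both.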
Git commit
Once you’ve added files to your staging area, you can put them in the mailbox with git commit. Keep in mind that anything in the 'box' gets shipped together as one unit, so if you want to undo anything about a given commit, you would have to roll back the entire commit.
A good practice is therefore to commit frequently, in small, focused units.
One thing to keep in mind is that you usually won’t run git commit on its own, since that opens a text editor for you to write a message. Instead, your command will typically look like this: git commit -m "model updates". The -m flag lets you supply your log message directly on the command line. A best practice here is to be specific and descriptive about the changes you've made to your project. You'll thank yourself later!
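The "one unit" idea can be sketched as a toy model in which each commit stores a copy of the entire staging area, your log message, and a link to the previous commit. The Python below illustrates that under those assumptions and is not Git's storage format (real commits point to tree objects and also record the author and date, for example). The Commit class and commit function are hypothetical names, and "rolling back" here simply means reading the parent's snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Snapshot = Dict[str, str]  # hypothetical model: file path -> file contents


@dataclass
class Commit:
    """Toy commit: a snapshot of the whole staging area, a log message, and its parent."""
    message: str
    snapshot: Snapshot
    parent: Optional["Commit"] = None


def commit(index: Snapshot, message: str, head: Optional[Commit]) -> Commit:
    """Toy version of git commit -m MESSAGE: ship everything in the box as one unit."""
    if not message.strip():
        raise ValueError("a commit needs a log message")
    # dict(index) copies the staging area, so later changes to it don't alter this commit
    return Commit(message=message, snapshot=dict(index), parent=head)


if __name__ == "__main__":
    index = {"train.py": "v1"}
    first = commit(index, "Add baseline training script", head=None)

    index["train.py"] = "v2"     # change one file...
    index["features.py"] = "f1"  # ...and add another, then commit both together
    second = commit(index, "Tune learning rate and add feature pipeline", head=first)

    # Rolling back `second` means returning to its parent's snapshot,
    # which undoes both changes at once
    print(second.parent.snapshot)  # prints: {'train.py': 'v1'}
```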
Git log
The last command I’ll talk about for now is git log. Git log is where you can pull up your repository's history of commits, most recent first. For each commit it provides a handful of pieces of information, like the commit's hash, the author, the commit date, and the log message.
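Each commit records a link to the commit before it, so for a simple, linear history the log is a walk from the newest commit back to the first one. The Python sketch below models that walk. The Commit class, the log function, and the sample hashes, names, and dates are all made up for illustration, and the printed layout only loosely imitates git log's default output (real Git also shows the author's email, for example).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Optional


@dataclass
class Commit:
    """Toy commit carrying the fields git log displays."""
    sha: str
    author: str
    date: datetime
    message: str
    parent: Optional["Commit"] = None


def log(head: Optional[Commit]) -> Iterator[Commit]:
    """Toy git log: start at the newest commit and follow parent links backwards."""
    current = head
    while current is not None:
        yield current
        current = current.parent


if __name__ == "__main__":
    # Hypothetical history with made-up hashes, author, and dates
    c1 = Commit("a1b2c3d", "Ada", datetime(2024, 1, 8, 9, 30), "Add baseline model")
    c2 = Commit("e4f5a6b", "Ada", datetime(2024, 1, 9, 14, 0), "Tune learning rate", parent=c1)
    for c in log(c2):
        print(f"commit {c.sha}\nAuthor: {c.author}\nDate:   {c.date:%Y-%m-%d %H:%M}\n\n    {c.message}\n")
```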
Conclusion
I hope this proves a useful crash course on Git! Git your hands dirty with those commands to get yourself and your data science teams using Git more effectively!
Happy data science-ing!
This article, “GIT Essentials for Every Data Scientist”, was republished on 码农网.