A crucial part in becoming a researcher consists in building on top of research that others did. As the saying goes, ‘No Man is an Island’, and science typically advances by critiquing, extending, or, in some cases, tearing down the work of others. We are all standing on the shoulders of giants to some extent. When writing a publication, we acknowledge the work of others by adding appropriate citations. The entirety of these citations is typically referred to as a bibliography, but I will be using this term in a rather haphazard manner, extending its meaning to include ‘publications you read over the course of your Ph.D. research and beyond’.
Why?
Keeping a list of all the publications that you read (or just collected) about a certain topic is the cornerstone of scholarship. Of course, writing papers is important, but showing that you know how these papers fit into the grand scheme of things is crucial in your journey to become a researcher. Plus, acknowledging the work of others is polite and humble: it demonstrates that, despite all the strides we make in research, we can only be successful if we work together and share our knowledge.
On the pragmatic level, establishing a bibliography will make it easier for you to summarise papers and, ultimately, generate ideas. It will also provide you with a sense of accomplishment; after all, reading papers is also a large part of research and should be acknowledged as such. At the end of your Ph.D., when writing your thesis, your main bibliography will contain dozens of papers, which you should have at least skimmed at some point. Moreover, there will also be a handful of papers that you refer to so often that you know them like the back of your hand. It is quite likely that, barring major changes in your research directions, your bibliography will keep on growing and benefiting you for many years to come. Hence, it pays off to do this the right way from the beginning.
How?
The tried-and-true way of bibliography management in mathematics and computer science (and, much to my delight, an increasing number of other disciplines) involves BibTeX, a companion tool to LaTeX. The basic idea is that you keep all your bibliographic entries in text files that follow the BibTeX format. In contrast to other methods of storage, this format has certain advantages:
- It can be read and edited by humans (with some training, but that is what you are here for!).
- It can be put under version control.
- It can be extended to contain more information, such as direct links to papers.
- It can be easily converted to other formats, making it highly versatile and flexible.
The great thing about this format is that there is a difference between the way you store your items (where the idea is that you add as much information as you can), and how they look in a document. For example, certain journals have their own styles for formatting a bibliography. BibTeX can easily accommodate them and leave out information or format it in a different way.
To get you started with writing BibTeX, grab your favourite text editor or a special BibTeX editor and create a new file. Every entry in the file follows the same format: you specify a type of the item that you want to add (such as an article in a journal or a book), followed by a description of its properties (such as authors or a title). An example entry could look like this:

    @article{Edelsbrunner02,
      author    = {Edelsbrunner, Herbert and Letscher, David and Zomorodian, Afra},
      title     = {Topological Persistence and Simplification},
      journal   = {Discrete {\&} Computational Geometry},
      year      = {2002},
      volume    = {28},
      number    = {4},
      month     = nov,
      publisher = {Springer},
      pages     = {511--533},
      doi       = {10.1007/s00454-002-2885-2},
    }

Let us decompose this entry:
- `Edelsbrunner02` is an internal identifier. You use it whenever referring to that specific publication. For example, if you are citing the paper, you would write `\cite{Edelsbrunner02}`.
- The `author` field is the list of authors of the paper, separated by `and`. Notice that I specified them based on their surnames first. This makes it easier for BibTeX to detect how an author name should be formatted. It pays off when you have names that are more complex, such as Laurens van der Maaten of t-SNE fame. Formatting his name as `van der Maaten, Laurens` will tell BibTeX that everything before the comma is a surname. If your style abbreviates first names, Laurens will now be abbreviated as `van der Maaten, L.`, instead of some monstrosity like `Maaten, L.V.D.` or some such nonsense.
- The `title` field refers to the title of the paper. I kept the capitalisation of the original paper intact, but whether upper-case or lower-case letters are used is at the discretion of the bibliography style that you use in practice. Hence, to ensure that proper nouns are capitalised correctly, you need to enclose them in curly braces. For example, a paper entitled ‘Everything you wanted to know about Gaussian elimination’ should be formatted as `title = {Everything you wanted to know about {G}aussian elimination}`. You can also include whole words in curly braces in order to prevent BibTeX from changing them. This is great for things like t-SNE, which you can provide as `{t-SNE}` in the title.
- The `journal` field contains the name of the journal. I had to encode the ampersand ‘&’ in the journal name as `{\&}` because LaTeX would complain otherwise. Again, the capitalisation of journal titles might be changed depending on the style, but it is good practice to use the original capitalisation of the journal.
- `year`, `volume`, and `number` are all self-explanatory and contain, for once, no pitfalls. The `volume` field goes back to the time when journals would come in different volumes to collect articles within a certain period. The `number` field refers to a specific issue within such a volume. It is thus more specific and should only be used in conjunction with the `volume` field, but never alone. As an example of this organisational style, the aforementioned journal Discrete & Computational Geometry assigns ‘Volume 63’ to all articles published from January to April 2020. The first issue of Volume 63 appeared in January.
- `month` is a dangerous field. If you specify the month by a three-letter abbreviation like `nov` (and without the curly braces), BibTeX can automatically provide the proper names in all kinds of languages. Ostensibly, this is a great feature if you are writing in multiple languages, but while I was initially very much in favour of always adding a month to my entries, it started losing its relevance over the past few years. I can in good conscience say that I never used the `month` field to look up information about a paper. There is one saving grace for it, though: the field is used internally for sorting! Thus, if you have multiple articles by the same authors over the same year, the `month` field helps in establishing a consistent sorting order.
- `publisher` is another one of these self-explanatory but ultimately dangerous fields. The `publisher` string is almost never formatted directly by BibTeX, so make sure you are consistent with adding content there. Moreover, most bibliography styles ignore the publisher for an article anyway. I only included it here to describe it briefly.
- `pages` is probably the most misused field. The idea is to specify a range of pages for the respective article. Hence, you need to use an ‘en-dash’, i.e. ‘–’, which is written as `--` in LaTeX. You are not supposed to use spaces here or any other kind of dash. While page ranges can be seen as a charming remnant of the past, there are still some articles that are only available in the real world and have not been digitised yet. Thus, I keep on using this field even though I never used it to find an article (I did use it to find chapters in books, though, which is why I added the field to this example; there are situations in which the field is useful).
- `doi` is one of my favourite fields. It refers to the Digital Object Identifier System and makes it possible, finally, to locate a specific bibliographic item with a single click by providing a persistent URL under which it can be reached. I always add DOIs to my bibliographies whenever I can get away with it (many venues use bibliography styles that discard them, though). For your own dissertation, I would definitely recommend them: modern BibTeX and BibLaTeX styles support them nicely and will format them as URLs that can be clicked. Make sure to just use the plain DOI in the field; there is no need for providing a URL directly.
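To make the advice on names and titles concrete, here is a sketch of how the t-SNE paper mentioned above could be stored. The key `vanDerMaaten08` is merely an illustrative choice, and optional fields such as `number` and `doi` are left out:

```bibtex
% Illustrative key; the surname-first form keeps `van der` attached to
% the surname, and the braces protect `t-SNE` from being lower-cased.
@article{vanDerMaaten08,
  author  = {van der Maaten, Laurens and Hinton, Geoffrey},
  title   = {Visualizing Data using {t-SNE}},
  journal = {Journal of Machine Learning Research},
  year    = {2008},
  volume  = {9},
  pages   = {2579--2605},
}
```

With a style that abbreviates first names, the surname of the first author should stay intact as ‘van der Maaten’, and ‘t-SNE’ should keep its capitalisation even if the style converts titles to sentence case.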
If that seems like a lot of information for a single entry, do not despair—many other item types exist and they share the same set of elements. Let us discuss a few of them before providing more examples.
Common entry types
- `article`: you already encountered this entry type above. It is meant for research articles that have been published in journals. The required fields are `author`, `title`, `journal`, `year`, and `volume` (the standard BibTeX styles only insist on the first four, but you should supply the volume whenever one exists).
- `book`: use this to cite a book that was published somewhere. The required fields are `author`/`editor` (it is sufficient to specify one of them), `title`, `publisher`, and `year`.
- `incollection`: use this to cite a part of a book that has its own title and authors. For example, my article ‘Agreement Analysis of Quality Measures for Dimensionality Reduction’ was published in the book ‘Topological Methods in Data Analysis and Visualization IV’. If you do not want to cite the book as a whole but only my individual contribution, you should use `incollection`. The required fields are `author`, `title`, `booktitle`, `publisher`, and `year`. The `booktitle` field should contain the title of the book, i.e. of the collection itself. As a rule, if a chapter has its own author, you probably want to use `incollection`.
- `inproceedings`: use this to cite an article that was published in conference proceedings. For example, papers published at ICML should generally be added as this type. The required fields are `author`, `title`, `booktitle`, and `year`. The field `booktitle` is confusing. Here, it refers to the proceedings themselves. For example, if you cite something from ICML 2019, you should use ‘Proceedings of the 36th International Conference on Machine Learning’ as its content.
- `mastersthesis` and `phdthesis`: use these to refer to a thesis. The required fields are `author`, `title`, `school`, and `year`. The field `school` is a free-form field in the sense that you can provide the name of the institution in whatever form you prefer. For example, to cite my Ph.D. thesis, you would use `school = {Ruprecht-Karls-Universit{\"a}t Heidelberg}`, since this is the German name of my university.
- `techreport`: use this to cite a technical report, i.e. a report published by some institution that did not necessarily undergo peer review (except for maybe an internal review). The required fields are `author`, `title`, `institution`, and `year`. The field `institution` refers to the university or other entity that published this report. You can also use this type to refer to other forms of grey literature, i.e. publications that do not fall under the traditional categories of academic publishing. A research report, for example, could also be considered a `techreport`.
- `misc`: use this as a last resort to add bibliographic information in almost free form. This type has no required fields, but a few optional ones, including `author`, `title`, and `note`. You can use this to refer to companies or software projects, for example.
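As a rough sketch of how some of these types look in a file, here are three entries that contain only the required fields discussed above. Everything about them is hypothetical (keys, authors, titles, and the year of the thesis); only the `booktitle` and `school` strings are taken from the examples in the list:

```bibtex
% Hypothetical entries: keys, authors, titles, and years are placeholders.
@inproceedings{Doe19,
  author    = {Doe, Jane},
  title     = {A Placeholder Title for a Conference Paper},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  year      = {2019},
}

@phdthesis{Doe20,
  author = {Doe, Jane},
  title  = {A Placeholder Title for a Thesis},
  school = {Ruprecht-Karls-Universit{\"a}t Heidelberg},
  year   = {2020},
}

% The double braces keep BibTeX from parsing a group or company name
% as if it were a person with first and last names.
@misc{ExampleSoftware,
  author = {{Example Developers}},
  title  = {A Placeholder Software Project},
  note   = {Hypothetical example of a software citation},
}
```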
Optional fields
Having seen the most common entry types, you should be aware that most of them support numerous optional fields. For example, `article` supports the optional `pages` field. Wikipedia has a great breakdown of optional and required fields for different entry types.

Whether to use all of them or only the required ones is at your discretion. I tend to take a pragmatic view here: you should add all information that is necessary to identify the work that was added to your bibliography, as well as provide some context about it. For example, I prefer adding `editor` fields to all entries whenever appropriate; I see this as a professional courtesy towards the people who edited a certain work. Whether I can use all of these fields in a bibliography for a paper is a different matter. Remember that curating a bibliography and using it in practice are two different things; for most of your academic publications, a publisher or conference will dictate how entries are formatted and which fields are being included in them.
How to cite conference papers
After these theoretical examples, here are some practical considerations when adding machine learning articles to your bibliography.
- ICML papers: download the appropriate BibTeX file from http://proceedings.mlr.press (for some reason, you have to click on ‘abs’ to get the abstract of a paper before links to BibTeX files are shown). Some adjustments are needed, though: remove the `address` field and the `month` field. Make sure that all names follow the `Last, First` format.
- NeurIPS papers: download the appropriate BibTeX file from http://papers.neurips.cc. Change the entry type to `inproceedings`. The remainder of the file is fine, but be sure to specify `editor` entries correctly; for some reason, the exported entries do not follow the `Last, First` format. Moreover, for the 2019 proceedings, one of the editors is formatted incorrectly. Her name should be specified as `d'Alch{\'e}-Buc, F.`; if you are using BibLaTeX, you can also directly specify the accent, since it supports UTF-8.
- ICLR papers: download the appropriate BibTeX file from https://openreview.net. The format works well out of the box, and, since the entries only have a few fields, there is nothing you have to fix. Be mindful of the capitalisation rules, though!

For each of these conferences, consider adding their respective abbreviation in the `booktitle` field. Other than that, there is not much you can do here. By the way: the instructions for ICML also apply to a number of other venues, including AISTATS, COLT, and MLHC! It is great that PMLR is providing this service.
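To illustrate the clean-up steps for a NeurIPS export, here is a sketch of what an entry from the 2019 proceedings might look like afterwards. The paper itself is hypothetical (key, authors, and title are placeholders); the editor list is that of the 2019 proceedings, written in `Last, First` form, and the exact wording of the `booktitle` with its abbreviation is only one possible choice:

```bibtex
% Hypothetical paper: key, authors, and title are placeholders.
% Entry type changed to `inproceedings`; editors reformatted as
% `Last, First`; conference abbreviation appended to the booktitle.
@inproceedings{Doe19a,
  author    = {Doe, Jane and Roe, Richard},
  title     = {A Placeholder Title},
  booktitle = {Advances in Neural Information Processing Systems~32 (NeurIPS)},
  editor    = {Wallach, H. and Larochelle, H. and Beygelzimer, A. and
               d'Alch{\'e}-Buc, F. and Fox, E. and Garnett, R.},
  year      = {2019},
}
```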
How to cite preprints
I have skirted around the problem of citing arXiv preprints because there is no formal standard. There are, however, certain scenarios:
Scenario 1: BibLaTeX and your own bibliography style
This is the nicest scenario: you get to use your own bibliography style and you are allowed to use BibLaTeX. In this case, use the `misc` type and the additional fields `eprint`, `archiveprefix`, and `primaryclass` to format the entry. As an example, suppose you want to cite the arXiv preprint ‘PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures’. I would format it as follows:

    @misc{Carriere19,
      author        = {Carri{\`e}re, Mathieu and Chazal, Fr{\'e}d{\'e}ric and Ike, Yuichi and Lacombe, Th{\'e}o and Royer, Martin and Umeda, Yuhei},
      title         = {{P}ers{L}ay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures},
      year          = {2019},
      eprint        = {1904.09378},
      archiveprefix = {arXiv},
      primaryclass  = {stat.ML},
    }

You can see that `eprint` contains the internal ID assigned by arXiv, and `archiveprefix` specifies that it is an arXiv article. The `primaryclass` field is helpful in declaring the main subject assignment of the preprint, but it is not necessary.
Scenario 2: BibTeX and a pre-defined bibliography style
In this case, to be on the safe side with most styles, I tend to use the `article` type (which is wrong because the `journal` field is required, so please consider this a workaround only). Hence, the aforementioned preprint would be formatted like this:

    @article{Carriere19,
      author        = {Carri{\`e}re, Mathieu and Chazal, Fr{\'e}d{\'e}ric and Ike, Yuichi and Lacombe, Th{\'e}o and Royer, Martin and Umeda, Yuhei},
      title         = {{P}ers{L}ay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures},
      year          = {2019},
      eprint        = {1904.09378},
      archiveprefix = {arXiv},
      primaryclass  = {stat.ML},
    }

Some bibliography styles, such as the one used by ICML, are incapable of formatting such an entry correctly. In this case, I add the field `pages` with incorrect information:

    pages = {arXiv:1904.09378}

You should only do this if you are forced to use a bibliography style that does not support arXiv preprints otherwise! In all other cases, consider using `misc` and providing information about the `eprint` etc.

As a ‘milder’ form of formatting the entry, you could also add a ‘fake’ journal by setting `journal = {arXiv e-prints}` or `journal = {arXiv preprint}`. This is sometimes suggested when you export a citation from arXiv. I cannot say that I love this practice, but it works reasonably well.
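For completeness, here is a sketch of what the ‘fake journal’ variant could look like for the same preprint. Appending the identifier to the `journal` string is an assumption on my part about what is most useful with styles that ignore `eprint`; it is not something arXiv or BibTeX prescribes:

```bibtex
% Workaround only: `journal` does not name a real journal here.
% Appending the arXiv identifier is an illustrative choice so that it
% still shows up in styles that ignore the `eprint` field.
% Keep only one entry with this key in your file.
@article{Carriere19,
  author        = {Carri{\`e}re, Mathieu and Chazal, Fr{\'e}d{\'e}ric and
                   Ike, Yuichi and Lacombe, Th{\'e}o and Royer, Martin and
                   Umeda, Yuhei},
  title         = {{P}ers{L}ay: A Neural Network Layer for Persistence
                   Diagrams and New Graph Topological Signatures},
  journal       = {arXiv preprint arXiv:1904.09378},
  year          = {2019},
  eprint        = {1904.09378},
  archiveprefix = {arXiv},
  primaryclass  = {stat.ML},
}
```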
Scenario 3: BibTeX and your own bibliography style
First of all, consider using BibLaTeX as a package in your documents; it will make formatting your bibliography much easier. If you do not want to make the switch, I would first stick with the `misc` type as described above. If that does not work, use `article` or, in the worst case, the `unpublished` type.
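If you do end up with `unpublished`, keep in mind that the standard styles expect a `note` field for this type. A minimal sketch for the same preprint, under the assumption that you want the arXiv identifier to appear in the note, might look like this:

```bibtex
% Last-resort variant: `author`, `title`, and `note` are the fields the
% standard styles require for `unpublished`; `year` is optional.
% Placing the arXiv identifier in `note` is an illustrative choice.
% Keep only one entry with this key in your file.
@unpublished{Carriere19,
  author = {Carri{\`e}re, Mathieu and Chazal, Fr{\'e}d{\'e}ric and
            Ike, Yuichi and Lacombe, Th{\'e}o and Royer, Martin and
            Umeda, Yuhei},
  title  = {{P}ers{L}ay: A Neural Network Layer for Persistence
            Diagrams and New Graph Topological Signatures},
  year   = {2019},
  note   = {arXiv:1904.09378},
}
```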
Common pitfalls
Having now discussed at length how to keep entries in a bibliography, I want to close this post with a list of common pitfalls and how to avoid them:
- Double-check all `.bib` files that you download. Publishers are notorious for incorrectly formatted files. While they might work, you might introduce problems in your bibliography that are hard to find later on.
- Always check author names and reformat them, if necessary. A full name with an initial is best stored as `Riker, William T.` as it permits BibTeX to abbreviate it as `W.T. Riker`. It is a common mistake to provide abbreviated names already in the file, such as `Riker, WT`. This will be formatted as `W. Riker`. Hence, if you only have initials available, it is best to separate them by periods and spaces, as in `Riker, W. T.`.
- Check for duplicated entries, in particular for files downloaded from somewhere else.
- Choose the right entry type as outlined above. People often use `inbook` when they actually mean `incollection`. The former is almost always unnecessary (at least in machine learning, where we only tend to cite publications that can be assigned to one or more individuals).
- Remove superfluous information from all items. Only keep the things that are required to uniquely identify a publication and put it into context. For many modern publications, there is no need to keep an ISSN, for example.
- Use DOIs whenever you can. Remember that not every field in a bibliographic item needs to be shown, but having a DOI makes it easier for you to track down an article later on.
- Be consistent with journal titles, abbreviations, and the like.
- Check the capitalisation of your entries. Do not fiddle too much with curly braces (some people suggest putting the whole title in curly braces, but this essentially removes all options for reformatting later on).
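The points about names and braces can be summarised in a short sketch. The entries below are entirely hypothetical (keys and titles are placeholders) and only serve to contrast the recommended forms with the problematic ones, which are mentioned in the comments:

```bibtex
% Hypothetical entries: keys and titles are placeholders.

% Preferred: full given name plus initial, surname first.
@misc{RikerFullName,
  author = {Riker, William T.},
  title  = {A Placeholder Title about {G}aussian Elimination},
}

% If only initials are known, separate them with periods and spaces.
% Writing `Riker, WT` instead would lose the second initial.
@misc{RikerInitials,
  author = {Riker, W. T.},
  title  = {Another Placeholder Title about {t-SNE}},
}

% Not recommended: bracing the whole title, as in
%   title = {{A Placeholder Title about Gaussian Elimination}},
% prevents the style from changing the capitalisation at all.
```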
If that list has not worn you out, there is also a great discussion of more common mistakes, courtesy of TeX StackExchange.
The next steps
By now you should be familiar with the basic rules of keeping a bibliography using BibTeX. When you start curating your own entries, strive for consistency and correctness. This will make your life much easier and permit you to use bibliographic entries efficiently.
If you want to learn more, I would suggest reading Tame the BeaST, which discusses many details and provides the rationale behind certain choices in BibTeX. Moreover, you should consider using BibLaTeX whenever you can; it makes formatting your bibliography so much easier. Finally, if you want to see BibLaTeX in action, you might want to take a look at latex-mimosis, my document class providing a minimal and modern LaTeX template for all your thesis needs.
Happy bibliography management, until next time!