Keeping a Bibliography

A crucial part of becoming a researcher consists in building on top of research that others did. As the saying goes, ‘No Man is an Island’, and science typically advances by critiquing, extending, or, in some cases, tearing down the work of others. We are all standing on the shoulders of giants to some extent. When writing a publication, we acknowledge the work of others by adding appropriate citations. The entirety of these citations is typically referred to as a bibliography, but I will be using this term in a rather haphazard manner, extending its meaning to include ‘publications you read over the course of your Ph.D. research and beyond’.

Why?

Keeping a list of all the publications that you read (or just collected) about a certain topic is the cornerstone of scholarship. Of course, writing papers is important, but showing that you know how these papers fit into the grand scheme of things is crucial in your journey to become a researcher. Plus, acknowledging the work of others is polite and humble: it demonstrates that, despite all the strides we make in research, we can only be successful if we work together and share our knowledge.

On the pragmatic level, establishing a bibliography will make it easier for you to summarise papers and, ultimately, generate ideas. It will also provide you with a sense of accomplishment; after all, reading papers is also a large part of research and should be acknowledged as such. At the end of your Ph.D., when writing your thesis, your main bibliography will contain dozens of papers, which you should have at least skimmed at some point. Moreover, there will also be a handful of papers that you refer to so often that you know them like the back of your hand. It is quite likely that, barring major changes in your research directions, your bibliography will keep on growing and benefiting you for many years to come. Hence, it pays off to do this the right way from the beginning.

How?

The tried-and-true way of bibliography management in mathematics and computer science (and, much to my delight, an increasing number of other disciplines) involves BibTeX, a companion tool to LaTeX. The basic idea is that you keep all your bibliographic entries in text files that follow the BibTeX format. In contrast to other methods of storage, this format has certain advantages:

  1. It can be read and edited by humans (with some training, but that is what you are here for!).
  2. It can be put under version control.
  3. It can be extended to contain more information, such as direct links to papers.
  4. It can be easily converted to other formats, making it highly versatile and flexible.

The great thing about this format is that there is a difference between the way you store your items (where the idea is that you add as much information as you can), and how they look in a document. For example, certain journals have their own styles for formatting a bibliography. BibTeX can easily accommodate them and leave out information or format it in a different way.

To get you started with writing BibTeX, grab your favourite text editor or a special BibTeX editor and create a new file. Every entry in the file follows the same format: you specify the type of the item that you want to add (such as an article in a journal or a book), followed by a description of its properties (such as authors or a title). An example entry could look like this:

@article{Edelsbrunner02,
  author    = {Edelsbrunner, Herbert and Letscher, David and Zomorodian, Afra},
  title     = {Topological Persistence and Simplification},
  journal   = {Discrete {\&} Computational Geometry},
  year      = {2002},
  volume    = {28},
  number    = {4},
  month     = nov,
  publisher = {Springer},
  pages     = {511--533},
  doi       = {10.1007/s00454-002-2885-2},
}

Let us decompose this entry:

  • Edelsbrunner02 is an internal identifier. You use it whenever referring to that specific publication. For example, if you are citing the paper, you would write \cite{Edelsbrunner02}.

  • The author field is the list of authors of the paper. Notice that I specified each of them with the surname first. This makes it easier for BibTeX to detect how an author name should be formatted. It pays off when you have names that are more complex, such as Laurens van der Maaten of t-SNE fame. Formatting his name as van der Maaten, Laurens will tell BibTeX that everything before the comma is a surname. If your style abbreviates first names, his name will now be abbreviated as van der Maaten, L., instead of some monstrosity like Maaten, L.V.D. or some such nonsense.

  • The title field refers to the title of the paper. I kept the capitalisation of the original paper intact, but whether upper-case or lower-case letters are used is at the discretion of the bibliography style that you use in practice. Hence, to ensure that proper nouns are capitalised correctly, you need to enclose them in curly braces. For example, a paper entitled ‘Everything you wanted to know about Gaussian elimination’ should be formatted as title = {Everything you wanted to know about {G}aussian elimination}. You can also enclose whole words in curly braces in order to prevent BibTeX from changing them. This is great for things like t-SNE, which you can provide as {t-SNE} in the title.

  • The journal field contains the name of the journal. I had to encode the ampersand ‘&’ in the journal name as {\&} because LaTeX would complain otherwise. Again, the capitalisation of journal titles might be changed depending on the style, but it is good practice to use the original capitalisation of the journal.

  • year, volume, and number are all self-explanatory and contain, for once, no pitfalls. The volume field goes back to the time when journals would come in different volumes, each collecting the articles of a certain period. The number field refers to a specific issue within such a volume. It should thus only be used in conjunction with the volume field, never alone. As an example of this organisational style, the aforementioned journal Discrete & Computational Geometry assigns ‘Volume 63’ to all articles published from January to April 2020. The first issue of Volume 63 appeared in January.

  • month is a dangerous field. If you specify the month by a three-letter abbreviation like this (and without the curly braces), BibTeX can automatically provide the proper names in all kinds of languages. Ostensibly, this is a great feature if you are writing in multiple languages, but while I was initially very much in favour of always adding a month to my entries, it started losing its relevance over the past few years. I can in good conscience say that I never used the month field to look up information about a paper. There is one saving grace for it, though: the field is used internally for sorting! Thus, if you have multiple articles by the same authors over the same year, the month field helps in establishing a consistent sorting order.

  • publisher is another one of these self-explanatory but ultimately dangerous fields. The publisher string is almost never formatted directly by BibTeX, so make sure you are consistent with adding content there. Moreover, most bibliography styles ignore the publisher for an article anyway. I only included it here to describe it briefly.

  • pages is probably the most misused field. The idea is to specify a range of pages for the respective article. Hence, you need to use an ‘en-dash’, i.e. ‘–’, or ‘--’ in LaTeX. You are not supposed to use spaces here or any other kind of dash. While page ranges can be seen as a charming remnant of the past, there are still some articles that are only available in the real world and have not been digitised yet. Thus, I keep on using this field even though I never used it to find an article (I did use it to find chapters in books, though, which is why I added the field to this example; there are situations in which the field is useful).

  • doi is one of my favourite fields. It refers to the Digital Object Identifier System and makes it possible to—finally—locate a specific bibliographic item with a single click by providing a persistent URL under which it can be reached. I always add DOIs to my bibliographies whenever I can get away with it (many venues use bibliography styles that discard them, though). For your own dissertation, I would definitely recommend them—modern BibTeX and BibLaTeX styles support them nicely and will format them as URLs that can be clicked. Make sure to just use the plain DOI in the field; there is no need for providing a URL directly.
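To see the name and title rules from the list above in one place, here is a second entry as a minimal sketch. It uses the t-SNE paper mentioned earlier; the key vanderMaaten08 is an arbitrary choice for illustration, and the field values are given to the best of my knowledge, so check them against the journal's page before reusing the entry.

```bibtex
% Text outside of an entry is ignored by BibTeX, so notes can go here.
% Surname first: everything before the comma is treated as the surname.
% {t-SNE} is wrapped in braces so that a style cannot change its case.
% The page range uses a double hyphen without spaces.
@article{vanderMaaten08,
  author  = {van der Maaten, Laurens and Hinton, Geoffrey},
  title   = {Visualizing Data using {t-SNE}},
  journal = {Journal of Machine Learning Research},
  year    = {2008},
  volume  = {9},
  pages   = {2579--2605},
}
```

Citing it works exactly as before, i.e. with \cite{vanderMaaten08}.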

If that seems like a lot of information for a single entry, do not despair—many other item types exist and they share the same set of elements. Let us discuss a few of them before providing more examples.

Common entry types

  • article: you already encountered this entry type above. It is meant for research articles that have been published in journals. The required fields are author, title, journal, year. Some references also list volume as required, although the standard BibTeX styles treat it as optional.

  • book: use this to cite a book that was published somewhere. The required fields are author/editor (it is sufficient to specify one of them), title, publisher, year.

  • incollection: use this to cite a part of a book that has its own title and authors. For example, my article ‘Agreement Analysis of Quality Measures for Dimensionality Reduction’ was published in the book ‘Topological Methods in Data Analysis and Visualization IV’. If you do not want to cite the book as a whole but only my individual contribution, you should use incollection. The required fields are author, title, booktitle, publisher, year. The booktitle field should contain the title of the book, i.e. of the collection itself. As a rule, if a chapter has its own author, you probably want to use incollection.

  • inproceedings: use this to cite an article that was published in conference proceedings. For example, papers published at ICML should generally be added as this type. The required fields are author, title, booktitle, year. The field booktitle is confusing. Here, it refers to the proceedings themselves. For example, if you cite something from ICML 2019, you should use ‘Proceedings of the 36th International Conference on Machine Learning’ as its content.

  • mastersthesis and phdthesis: use these to refer to a thesis. The required fields are author, title, school, year. The field school is a free-form field in the sense that you can provide the name of the institution. For example, to cite my Ph.D. thesis, you would use school = {Ruprecht-Karls-Universit{\"a}t Heidelberg}, since this is the German name of my university.

  • techreport: use this to cite a technical report, i.e. a report published by some institution that did not necessarily undergo peer review (except for maybe an internal review). The required fields are author, title, institution, year. The field institution refers to the university or other entity that published this report. You can also use this type to refer to other forms of grey literature, i.e. publications that do not fall under the traditional categories of academic publishing. A research report, for example, could also be considered a techreport.

  • misc: use this as a last resort to add bibliographic information in almost free form. This type has no required fields, but a few optional ones, including author, title, and note. You can use this to refer to companies or software projects, for example.
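As a minimal sketch of two of the types above, the following entries contain only the required fields. The keys, authors, titles, and the thesis year are hypothetical placeholders; only the booktitle and school values are taken from the descriptions above.

```bibtex
% Hypothetical placeholder entries: keys, authors, and titles are made up.
% For inproceedings, booktitle holds the name of the proceedings.
@inproceedings{Doe19,
  author    = {Doe, Jane and Roe, Richard},
  title     = {A Placeholder Title for an {ICML} Paper},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  year      = {2019},
}

% For a thesis, school is a free-form field with the institution name.
@phdthesis{Doe20,
  author = {Doe, Jane},
  title  = {A Placeholder Title for a Thesis},
  school = {Ruprecht-Karls-Universit{\"a}t Heidelberg},
  year   = {2020},
}
```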

Optional fields

Having seen the most common entry types, you should be aware that most of them support numerous optional fields. For example, article supports the optional pages field. Wikipedia has a great breakdown of optional and required fields for different entry types.

Whether to use all of them or only the required ones is at your discretion. I tend to take a pragmatic view here: you should add all information that is necessary to identify the work that was added to your bibliography, as well as provide some context about it. For example, I prefer adding editor fields to all entries whenever appropriate; I see this as a professional courtesy towards the people who edited a certain work. Whether I can use all of these fields in a bibliography for a paper is a different matter—remember that curating a bibliography and using it in practice are two different things; for most of your academic publications, a publisher or conference will dictate how entries are formatted and which fields are being included in them.

How to cite conference papers

After these theoretical examples, here are some practical considerations when adding machine learning articles to your bibliography.

  • ICML papers: download the appropriate BibTeX file from http://proceedings.mlr.press (for some reason, you have to click on ‘abs’ to get the abstract of a paper before links to BibTeX files are shown). Some adjustments are needed, though: remove the address field and the month field. Make sure that all names follow the Last, First format.

  • NeurIPS papers: download the appropriate BibTeX file from http://papers.neurips.cc. Change the entry type to inproceedings. The remainder of the file is fine, but be sure to specify editor entries correctly; for some reason, the exported entries do not follow the Last, First format. Moreover, for the 2019 proceedings, one of the editors is formatted incorrectly. Her name should be specified as d'Alch{\'e}-Buc, F.; if you are using BibLaTeX, you can also directly specify the accent, since it supports UTF-8.

  • ICLR papers: download the appropriate BibTeX file from https://openreview.net. The format works well out of the box, and, with only a few fields per entry, there is nothing you have to fix. Be mindful of the capitalisation rules, though!

For each of these conferences, consider adding their respective abbreviation in the booktitle field. Other than that, there is not much you can do here. By the way: the instructions for ICML also apply to a number of other venues, including AISTATS, COLT, and MLHC! It is great that PMLR is providing this service.
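To illustrate the clean-up steps, here is a sketch of what a NeurIPS 2019 entry might look like afterwards. The key, author, and title are hypothetical placeholders; the editor list and the volume number are given to the best of my knowledge and should be checked against the exported file.

```bibtex
% Hypothetical paper: key, author, and title are placeholders.
% All editors follow the Last, First format, including the accented name.
% The conference abbreviation is appended to the booktitle.
@inproceedings{Doe19a,
  author    = {Doe, Jane},
  title     = {A Placeholder Title},
  booktitle = {Advances in Neural Information Processing Systems 32 (NeurIPS)},
  editor    = {Wallach, H. and Larochelle, H. and Beygelzimer, A. and d'Alch{\'e}-Buc, F. and Fox, E. and Garnett, R.},
  year      = {2019},
}
```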

How to cite preprints

I have skirted around the problem of citing arXiv preprints because there is no formal standard. There are, however, certain scenarios:

Scenario 1: BibLaTeX and your own bibliography style

This is the nicest scenario: you get to use your own bibliography style and you are allowed to use BibLaTeX. In this case, use the misc type and the additional fields eprint, archiveprefix, and primaryclass to format the entry. As an example, suppose you want to cite the arXiv preprint ‘PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures’. I would format it as follows:

@misc{Carriere19,
  author        = {Carri{\`e}re, Mathieu and Chazal, Fr{\'e}d{\'e}ric and Ike, Yuichi and Lacombe, Th{\'e}o and Royer, Martin and Umeda, Yuhei},
  title         = {{P}ers{L}ay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures},
  year          = {2019},
  eprint        = {1904.09378},
  archiveprefix = {arXiv},
  primaryclass  = {stat.ML},
}

You can see that eprint contains the internal ID assigned by arXiv, and archiveprefix specifies that it is an arXiv article. The primaryclass field is helpful in declaring the main subject assignment of the preprint but it is not necessary.

Scenario 2: BibTeX and a pre-defined bibliography style

In this case, to be on the safe side with most styles, I tend to use the article type (which is wrong because the journal field is required, so please consider this a workaround only). Hence, the aforementioned preprint would be formatted like this:

@article{Carriere19,
  author        = {Carri{\`e}re, Mathieu and Chazal, Fr{\'e}d{\'e}ric and Ike, Yuichi and Lacombe, Th{\'e}o and Royer, Martin and Umeda, Yuhei},
  title         = {{P}ers{L}ay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures},
  year          = {2019},
  eprint        = {1904.09378},
  archiveprefix = {arXiv},
  primaryclass  = {stat.ML},
}

Some bibliography styles, such as the one used by ICML, are incapable of formatting such an entry correctly. In this case, I add the field pages with incorrect information:

pages = {arXiv:1904.09378}

You should only do this if you are forced to use a bibliography style that does not support arXiv preprints otherwise! In all other cases, consider using misc and providing information about the eprint etc. As a ‘milder’ form of formatting the entry, you could also add a ‘fake’ journal by setting journal = {arXiv e-prints} or journal = {arXiv preprint}. This is sometimes suggested when you export a citation from arXiv. I cannot say that I love this practice, but it works reasonably well.
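For reference, the ‘milder’ variant amounts to the same entry with one extra field. This is only a sketch of the workaround described above; whether the result looks acceptable depends on the bibliography style.

```bibtex
% Workaround only: 'arXiv preprint' is not a real journal.
% The eprint fields are kept so that capable styles can still use them.
@article{Carriere19,
  author        = {Carri{\`e}re, Mathieu and Chazal, Fr{\'e}d{\'e}ric and Ike, Yuichi and Lacombe, Th{\'e}o and Royer, Martin and Umeda, Yuhei},
  title         = {{P}ers{L}ay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures},
  journal       = {arXiv preprint},
  year          = {2019},
  eprint        = {1904.09378},
  archiveprefix = {arXiv},
  primaryclass  = {stat.ML},
}
```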

Scenario 3: BibTeX and your own bibliography style

First of all, consider using BibLaTeX as a package in your documents; it will make formatting your bibliography much easier. If you do not want to make the switch, I would first stick with the misc type as described above. If that does not work, use article or, in the worst case, the unpublished type.
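A sketch of the last-resort variant could look as follows. The unpublished type requires author, title, and note in the standard styles, so the arXiv identifier is placed in the note field here; that placement is merely one option, not an established convention.

```bibtex
% Last resort: 'unpublished' requires author, title, and note.
% Putting the arXiv identifier into note is an assumption of this sketch.
@unpublished{Carriere19,
  author = {Carri{\`e}re, Mathieu and Chazal, Fr{\'e}d{\'e}ric and Ike, Yuichi and Lacombe, Th{\'e}o and Royer, Martin and Umeda, Yuhei},
  title  = {{P}ers{L}ay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures},
  note   = {arXiv:1904.09378},
  year   = {2019},
}
```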

Common pitfalls

Having now discussed at length how to keep entries in a bibliography, I want to close this post with a list of common pitfalls and how to avoid them:

  1. Double-check all .bib files that you download. Publishers are notorious for incorrectly formatted files. While they might work, you might introduce problems in your bibliography that are hard to find later on.

  2. Always check author names and reformat them, if necessary. A full name with an initial is best stored as Riker, William T. because this permits BibTeX to abbreviate it as W.T. Riker. It is a common mistake to provide abbreviated names already in the file, such as Riker, WT. This will be formatted as W. Riker. Hence, if you only have initials available, it is best to separate them by periods and spaces, as in Riker, W. T. (BibTeX splits first names at whitespace, so each initial needs to be a separate word).

  3. Check for duplicated entries, in particular for files downloaded from somewhere else.

  4. Choose the right entry type as outlined above. People often use inbook when they actually mean incollection. The former is almost always unnecessary (at least in machine learning, where we only tend to cite publications that can be assigned to one or more individuals).

  5. Remove superfluous information from all items. Only keep the things that are required to uniquely identify a publication and put it into context. For many modern publications, there is no need to keep an ISSN, for example.

  6. Use DOIs whenever you can. Remember that not every field in a bibliographic item needs to be shown—but having a DOI makes it easier for you to track down an article later on.

  7. Be consistent with journal titles, abbreviations, and the like.

  8. Check the capitalisation of your entries. Do not fiddle too much with curly braces (some people suggest putting the whole title in curly braces, but this essentially removes all options for reformatting later on).

If that list has not worn you out, there is also a great discussion of more common mistakes, courtesy of TeX StackExchange.
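For the seventh pitfall, one option is to define each journal name exactly once with BibTeX's @string command and to refer to it by a macro afterwards. The following is a minimal sketch based on the first example entry; the macro name dcg is an arbitrary choice.

```bibtex
% Define the journal name once; 'dcg' is an arbitrary macro name.
@string{dcg = {Discrete {\&} Computational Geometry}}

% The macro is used without curly braces, like the month abbreviations.
@article{Edelsbrunner02,
  author  = {Edelsbrunner, Herbert and Letscher, David and Zomorodian, Afra},
  title   = {Topological Persistence and Simplification},
  journal = dcg,
  year    = {2002},
  volume  = {28},
  number  = {4},
  pages   = {511--533},
  doi     = {10.1007/s00454-002-2885-2},
}
```

Every entry that uses dcg then shows the same spelling, and switching to an abbreviated journal name later on only requires changing a single line.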

The next steps

By now you should be familiar with the basic rules for keeping a bibliography using BibTeX. When you start curating your own entries, strive for consistency and correctness. This will make your life much easier and permit you to use bibliographic entries efficiently.

If you want to learn more, I would suggest reading Tame the BeaST, which discusses many details and provides the rationale behind certain choices in BibTeX. Moreover, you should consider using BibLaTeX whenever you can, since it makes formatting your bibliography so much easier. Finally, if you want to see BibLaTeX in action, you might want to take a look at latex-mimosis, my document class providing a minimal and modern LaTeX template for all your thesis needs.

Happy bibliography management, until next time!

