So You Want to Write Your Own CSV Code? (2014)

Fields separated by commas and rows separated by newlines. Easy, right? You can write the code yourself in just a few lines.

Hold on a second…

What if there are commas inside the fields?

You need to enclose the field in quotes ( " ). Easy, right?

But can some fields be quoted and others not?

What if there are quotes in the fields?

You need to double each quote inside the field, and god forbid you forget to enclose the field in quotes.

Also make sure not to mistake a quoted empty field ( ...,"",... ) for an escaped (doubled) quote.
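To see how these quoting rules interact, here is a minimal sketch using Python's standard `csv` module. It relies on that module's default dialect, which is only one of many interpretations out there; the sample values are made up for illustration.

```python
import csv
import io

# Write a row whose fields contain a quote, a comma, and nothing at all.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(['say "hi"', "a,b", ""])
written = buf.getvalue()
# written == '"say ""hi""","a,b",\r\n'  (only the fields that need quotes get them)

# A quoted empty field versus a quoted field holding a single quote.
empty_field = next(csv.reader(['a,"",b']))    # ['a', '', 'b']
quote_field = next(csv.reader(['a,"""",b']))  # ['a', '"', 'b']

# Reading back what was written restores the original values.
round_trip = next(csv.reader(io.StringIO(written, newline="")))
print(written, empty_field, quote_field, round_trip)
```

Note that the writer quotes only some fields and leaves the empty one bare, so a reader has to accept both forms on the same line.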

What if there is a newline inside a field?

Of course you must enclose the field using quotes.
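This is exactly where "just split on newlines" falls over. A small sketch, again assuming Python's `csv` module with its default settings and invented sample data:

```python
import csv
import io

# One record, two fields; the first field contains a line break.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(["line one\nline two", "x"])
text = buf.getvalue()

# Naive approach: treat every line as a record.
naive_lines = text.splitlines()

# A CSV-aware reader keeps the quoted line break inside the field.
records = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive_lines))  # 2 "records", both of them broken
print(records)           # [['line one\nline two', 'x']]
```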

What are the accepted newline characters?

CRLF? CR? LF? What if there are multiple newlines?

What if the newline characters change?

E.g.: newlines within a field can be different from the newlines at the end of a line.
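As a rough illustration, the following sketch feeds Python's `csv` reader a made-up sample that mixes CRLF, CR and LF as record terminators and has an LF inside a quoted field. Passing `newline=""` matters here: it stops the I/O layer from translating line endings before the parser sees them. Other libraries may well behave differently.

```python
import csv
import io

# CRLF after the header, LF inside a quoted field, CR and LF as record ends.
data = 'id,note\r\n1,"first\nsecond"\r2,plain\n'

rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)
# [['id', 'note'], ['1', 'first\nsecond'], ['2', 'plain']]
```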

Still with me?

What if there is an extra comma at the end of a line?

Is there an empty field at the end or is that just a superfluous comma?

What if there is a variable number of fields per line?

What if there is an empty line?

Is that an EOF, a single empty field or no field at all?
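There is no universal answer to the last few questions, so it helps to know what your library decided for you. The sketch below shows the choices Python's `csv` reader makes (a trailing comma is an empty field, an empty line is a record with no fields, ragged rows are passed through) plus a simple width check. The width check and the sample lines are illustrative assumptions, not part of any standard.

```python
import csv

lines = ["a,b,c", "1,2,", "3,4", "", "5,6,7,8"]
rows = list(csv.reader(lines))
# rows == [['a', 'b', 'c'], ['1', '2', ''], ['3', '4'], [], ['5', '6', '7', '8']]

# Flag every record whose field count differs from the header's.
expected_width = len(rows[0])
problems = [
    (line_no, row)
    for line_no, row in enumerate(rows[1:], start=2)
    if len(row) != expected_width
]
for line_no, row in problems:
    print(f"line {line_no}: expected {expected_width} fields, got {len(row)}")
```

Whether those flagged lines are errors or perfectly fine depends entirely on who produced the file.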

What about whitespace?

What if there is leading/trailing whitespace in the fields?

What if the CSV you get always has a space after a comma but it’s not part of the data?
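The space-after-comma case is nastier than it looks, because it also changes how quotes are read. A minimal sketch with Python's `csv` module and its `skipinitialspace` option (the sample line is invented):

```python
import csv

line = 'a, b, "c, d"'

# Default: the space is data, so the quote no longer starts the field
# and the comma inside the "quoted" part splits it in two.
default_fields = next(csv.reader([line]))
# ['a', ' b', ' "c', ' d"']

# skipinitialspace=True drops whitespace right after each delimiter.
trimmed_fields = next(csv.reader([line], skipinitialspace=True))
# ['a', 'b', 'c, d']

print(default_fields, trimmed_fields)
```

This option only handles whitespace after the delimiter. Trailing whitespace would still need an explicit `strip()`, and only if you are sure it is not part of the data.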

What if the character separating fields is not a comma?

Not kidding.

Some countries use a comma as the decimal separator instead of a period. In those countries Excel will generate CSVs with a semicolon as the separator. Some files use tabs instead of commas to avoid this specific issue. Some even use non-displayable ASCII characters.

Don’t forget to account for it when reading an arbitrary CSV file. And no, there’s no indication of which delimiter a file uses.
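Since the file will not tell you, the best a program can do is guess. Python ships a heuristic for this, `csv.Sniffer`. The sketch below runs it on a made-up semicolon-separated sample with decimal commas; the list of candidate delimiters is an assumption on my part, and the guess can be wrong or fail outright (`csv.Error`) on less regular data.

```python
import csv
import io

# Hypothetical export from a locale that uses decimal commas.
sample = (
    "name;price;stock\n"
    "apple;1,50;10\n"
    "pear;2,25;4\n"
    "plum;0,80;25\n"
)

# Restrict the guess to a few plausible candidates.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
rows = list(csv.reader(io.StringIO(sample, newline=""), dialect))

print(repr(dialect.delimiter))
print(rows[1])
```

Treat the result as a suggestion to confirm, not as a fact about the file.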

What if the program reading the CSV uses multiple delimiters?

Some programs (including Excel) will assume different delimiters when reading a file from the disk and when reading it from the web. Make sure to give it the right one!

What if there is non-ASCII data?

Just use UTF-8, right? But wait…

What if the program reading the CSV uses an encoding that depends on the locale?

A program can’t magically know what encoding a file is using. Some will use an encoding depending on the locale of the machine.

Meaning if you save a CSV on one machine and open it on another, it may silently corrupt the data.
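"Silently" is the important word. In this small sketch the same bytes are decoded twice, once as UTF-8 and once as Windows-1252 (picked here only as an example of a locale-dependent default). Neither call raises an error.

```python
import csv
import io

raw = "café\r\n".encode("utf-8")

as_utf8 = raw.decode("utf-8")      # what the writer meant
as_cp1252 = raw.decode("cp1252")   # what a reader with another default sees

good = next(csv.reader(io.StringIO(as_utf8, newline="")))
bad = next(csv.reader(io.StringIO(as_cp1252, newline="")))

print(good)  # ['café']
print(bad)   # ['cafÃ©']
```

Passing the encoding explicitly whenever you open a file, rather than relying on the platform default, at least removes one of the two guesses.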

What if I put a BOM in my file?

After all, Byte Order Marks can determine the Unicode encoding used, that’s what they are for, right? (Actually they are used to determine the endianness, but I won’t get into that.)

If you include a BOM, Excel will interpret the file as a plain text file, not a CSV. This means line breaks within fields are not handled.
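The BOM causes trouble on the reading side as well. The sketch below is about Python, not Excel: if a UTF-8 file starts with a BOM and you decode it as plain `utf-8`, the mark ends up glued to the first field. Python's `utf-8-sig` codec strips it when present. The sample content is invented.

```python
import codecs
import csv
import io

raw = codecs.BOM_UTF8 + "name,qty\r\npear,3\r\n".encode("utf-8")

# Plain UTF-8 keeps the BOM as an invisible character in the first field.
with_bom = next(csv.reader(io.StringIO(raw.decode("utf-8"), newline="")))

# utf-8-sig removes a leading BOM if there is one.
without_bom = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"), newline="")))

print(with_bom[0] == "name")     # False: the field is '\ufeffname'
print(without_bom[0] == "name")  # True
```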

Do you really still want to roll your own code to handle CSV?

CSV is not a well-defined file format. RFC 4180 does not represent reality. It seems as if every program handles CSV in subtly different ways. Please do not inflict another one onto this world. Use a solid library.

If you have full control over the CSV provider and supplier, and over the data they emit, you’ll be able to build a reliable automated system.
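In that controlled situation the trick is to leave nothing to defaults. Here is a minimal sketch, assuming you own both the code that writes the file and the code that reads it: every format decision (delimiter, quoting, line terminator, encoding) is spelled out once and shared. The names `FORMAT`, `dump` and `load` are hypothetical, as are the particular choices.

```python
import csv
import io

# One shared, explicit description of the format (illustrative choices).
FORMAT = dict(
    delimiter=";",
    quotechar='"',
    doublequote=True,
    quoting=csv.QUOTE_ALL,
    lineterminator="\r\n",
)
ENCODING = "utf-8"

def dump(rows):
    """Serialize rows to bytes using the agreed format."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **FORMAT).writerows(rows)
    return buf.getvalue().encode(ENCODING)

def load(payload):
    """Parse bytes produced by dump() back into rows."""
    text = payload.decode(ENCODING)
    return list(csv.reader(io.StringIO(text, newline=""), **FORMAT))

rows = [["id", "comment"], ["1", 'He said "hi"; then left\nthe room']]
round_tripped = load(dump(rows))
print(round_tripped == rows)  # True
```

This only works because both ends agree in advance; it says nothing about files that arrive from somewhere else.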

If a supplied CSV is arbitrary, the only real way to make sure the data is correct is for a user to check it and, if necessary, specify the delimiter, quoting rules, and so on. Barring that, you may end up with an error or, worse, silently corrupted data.

Writing CSV code that works with files out there in the real world is a difficult task. The rabbit hole goes deep. Ruby’s CSV library is 2321 lines.

Discussion on Hacker News and Reddit.

