So You Want to Write Your Own CSV Code? (2014)

栏目: IT技术 · 发布时间: 4年前

内容简介:So You Want To Write Your Own CSV code? Fields separated by commas and rows separated by newline. Easy right? You can write the code yourself in just a few lines.Hold on a second…You need to enclose the field with quotes (

So You Want To Write Your Own CSV code? Fields separated by commas and rows separated by newline. Easy right? You can write the code yourself in just a few lines.

Hold on a second…

What if there are commas inside the fields?

You need to enclose the field with quotes ( ). Easy right?

But can only some fields but not all be quoted?

What if there are quotes in the fields

You need to double each instance of quote in the field and god forbid you forget to enclose the field in quotes.

Also make sure not to mistake a quoted empty field ( ...,"",... ) for a double quote.

What if there is a newline inside a field?

Of course you must enclose the field using quotes.

What are the accepted newline characters?

CRLF? CR? LF? What if there are multiple newlines?

What if the newline characters change?

E.g.: newlines within a fields are different from newlines at the end of a line.

Still with me?

What if there is an extra comma at the end of a line?

Is there an empty field at the end or is that just a superfluous comma?

What if there is a variable amount of field per line?

What if there is an empty line?

Is that an EOF, a single empty field or no field at all?

What about whitespace?

What if there is heading/trailing whitespaces in the fields?

What if the CSV you get always has a space after a comma but it’s not part of the data?

What if the character separating fields is not a comma?

Not kidding.

Some countries use a comma as decimal separator instead of a colon. In those countries Excel will generate CSVs with semicolon as separator. Some files use tabs instead of comma to avoid this specific issue. Some even use non displayable ASCII characters .

Don’t forget to account for it when reading an arbitrary CSV file. No there’s no indication which delimiter a file uses.

What if the program reading CSV use multiple delimiters?

Some program (including Excel) will assume different delimiters when reading a file from the disk and reading it from the web. Make sure to give it the right one!

What if there is non ASCII data?

Just use utf8 right? But wait…

What if the program reading the CSV use an encoding depending on the locale?

A program can’t magically know what encoding a file is using. Some will use an encoding depending on the locale of the machine.

Meaning if you save a CSV on a machine and open it it another it may silently corrupt the data.

What if I put a BOM in my file?

After all Byte Order Masks can determine the unicode encoding used, that’s what they are for right? (actually they are used to determine the endianness but I won’t get into that).

If you include a BOM Excel will interpret the csv as a text file, not a CSV. This means breaks within lines are not handled.

Do you really still want to roll your own code to handle CSV?

CSV is not a well defined file-format. The RFC4180 does not represent reality. It seems as every program handles CSV in subtly different ways. Please do not inflict another one onto this world. Use a solid library.

If you have full control over the CSV provider and supplier and the data they emit you’ll be able to build a reliable automated system.

If a supplied CSV is arbitrary, the only real way to make sure the data is correct is for an user to check it and eventually specify the delimiter, quoting rule,… Barring that you may end up with a error or worse silently corrupted data.

Writing CSV code that works with files out there in the real world is a difficult task. The rabbit hole goes deep. Ruby CSV library is 2321 lines.

Discussion on Hacker News and Reddit .


以上所述就是小编给大家介绍的《So You Want to Write Your Own CSV Code? (2014)》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

激荡十年,水大鱼大

激荡十年,水大鱼大

吴晓波 / 中信出版社 / 2017-11-1 / CNY 58.00

【编辑推荐】 知名财经作者吴晓波新作,畅销十年、销量超过两百万册的《激荡三十年》续篇,至此完成改革开放四十年企业史完整记录。 作为时代记录者,吴晓波有意识地从1978年中国改革开放伊始,记录中国翻天覆地的变化和对我们影响至深的人物与事件,串成一部我们每个人的时代激荡史。而最新的这十年,无疑更壮观,也更扑朔迷离。 很多事情,在当时并未有很深很透的感受,回过头来再看,可能命运的轨迹就......一起来看看 《激荡十年,水大鱼大》 这本书的介绍吧!

HTML 压缩/解压工具
HTML 压缩/解压工具

在线压缩/解压 HTML 代码

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

RGB转16进制工具
RGB转16进制工具

RGB HEX 互转工具