So You Want to Write Your Own CSV Code? (2014)



So You Want To Write Your Own CSV code? Fields separated by commas and rows separated by newlines. Easy, right? You can write the code yourself in just a few lines.
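For concreteness, here is roughly what that "few lines" version looks like as a Python sketch (`naive_parse` is a hypothetical name, not from any library). It handles the happy path and nothing else:

```python
def naive_parse(text):
    # Rows end at "\n", fields end at ",". No quoting, no escaping,
    # no CR handling. Empty strings are skipped so that a trailing
    # newline does not produce a bogus extra row.
    return [line.split(",") for line in text.split("\n") if line]

print(naive_parse("name,age\nAlice,30\n"))  # [['name', 'age'], ['Alice', '30']]
```

Feed it `"Doe, John",42` and the quoted name already comes back split into `'"Doe'` and `' John"'`.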

Hold on a second…

What if there are commas inside the fields?

You need to enclose the field in double quotes ("). Easy, right?

But can some fields be quoted and others not?
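To show what a library does here, this sketch uses Python's standard `csv` module as one example of a solid library (the article itself doesn't name one). Quoting is decided per field, so a reader must accept quoted and unquoted fields side by side in the same row:

```python
import csv

# Only the second field is quoted, because it contains a comma.
line = '42,"Doe, John",engineer'

naive = line.split(",")            # splits inside the quoted field
parsed = next(csv.reader([line]))  # honours the quotes

print(naive)   # ['42', '"Doe', ' John"', 'engineer']
print(parsed)  # ['42', 'Doe, John', 'engineer']
```

So yes: RFC 4180 says each field may or may not be enclosed in double quotes, and fields containing commas, double quotes or line breaks should be.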

What if there are quotes in the fields?

You need to double each quote inside the field, and God forbid you forget to enclose the field itself in quotes.

Also make sure not to mistake a quoted empty field ( ...,"",... ) for an escaped double quote.
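A minimal sketch of the escaping rule in Python. `quote_field` is a hypothetical helper that always quotes (real writers usually quote only when needed); the two reads use Python's `csv` module to show the same two characters `""` meaning different things depending on position:

```python
import csv

def quote_field(value):
    # Hypothetical helper: RFC 4180-style escaping. Double every quote,
    # then wrap the whole field in quotes (this version always quotes).
    return '"' + value.replace('"', '""') + '"'

escaped = quote_field('say "hi"')
print(escaped)  # "say ""hi"""

# Reading: "" is an empty field at a field boundary, but an escaped
# quote when it appears inside a quoted field.
empty_field = next(csv.reader(['a,"",b']))  # quoted empty field
lone_quote = next(csv.reader(['""""']))     # one field holding a single quote
print(empty_field)  # ['a', '', 'b']
print(lone_quote)   # ['"']
```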

What if there is a newline inside a field?

Of course, you must enclose the field in quotes.
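With Python's `csv` module (again just one example library), a field containing a newline gets quoted on write, which means a single record can span several physical lines on read. Python's documentation asks for CSV files to be opened with `newline=""` for exactly this reason:

```python
import csv
import io

rows = [["1", "line one\nline two"]]

# Writing: the embedded newline forces the field to be quoted.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
text = buf.getvalue()
print(repr(text))  # '1,"line one\nline two"\r\n'

# Reading: one record spans two physical lines, so splitting on "\n"
# first would cut it in half. newline="" leaves line endings to the reader.
back = list(csv.reader(io.StringIO(text, newline="")))
print(back == rows)  # True
```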

What are the accepted newline characters?

CRLF? CR? LF? What if there are multiple newlines?

What if the newline characters change?

E.g., newlines within a field are different from newlines at the end of a line.
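As one data point, Python's `csv` reader accepts `\r`, `\n` and `\r\n` as row terminators and keeps a newline inside a quoted field verbatim, whichever kind it is. Whether the producer intended such a mix is something no parser can tell you:

```python
import csv
import io

# CRLF, CR and LF row terminators in one input, plus a bare LF
# inside a quoted field.
data = 'a,"x\ny"\r\nb,c\re,f\n'

rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)  # [['a', 'x\ny'], ['b', 'c'], ['e', 'f']]
```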

Still with me?

What if there is an extra comma at the end of a line?

Is there an empty field at the end or is that just a superfluous comma?
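A parser has to pick one answer. Python's `csv` module, for instance, reads a trailing comma as one more empty field. If your producer means it as a superfluous comma, you need your own policy on top; `drop_trailing_empty` below is a hypothetical example of one:

```python
import csv

row = next(csv.reader(["a,b,"]))
print(row)  # ['a', 'b', '']

def drop_trailing_empty(fields):
    # Hypothetical policy: treat a final empty field as a superfluous
    # comma. Only safe if the last column can never legitimately be empty.
    return fields[:-1] if fields and fields[-1] == "" else fields

print(drop_trailing_empty(row))  # ['a', 'b']
```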

What if the number of fields varies from line to line?
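Python's `csv` reader does not enforce a fixed width; it returns each row with however many fields it has. A minimal sketch of a check that reports offending rows instead of guessing what missing or extra fields mean (the `ragged_rows` name and the "first row sets the width" default are assumptions):

```python
import csv
import io

def ragged_rows(text, expected=None):
    # Hypothetical check: return (line_num, field_count) for every record
    # whose width differs from `expected` (default: the first record's width).
    # reader.line_num is the physical line on which the record ends.
    reader = csv.reader(io.StringIO(text, newline=""))
    problems = []
    for row in reader:
        if expected is None:
            expected = len(row)
        elif len(row) != expected:
            problems.append((reader.line_num, len(row)))
    return problems

print(ragged_rows("a,b,c\n1,2,3\n4,5\n6,7,8,9\n"))  # [(3, 2), (4, 4)]
```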

What if there is an empty line?

Is that an EOF, a single empty field or no field at all?
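Libraries have to commit to an answer. Python's `csv.reader` returns a blank line as an empty list (no fields), which is distinct from a line holding one quoted empty field, and `csv.DictReader` skips blank rows entirely:

```python
import csv
import io

data = 'a\n\n""\nb\n'
rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)  # [['a'], [], [''], ['b']]

# DictReader drops the blank line between the two records.
records = list(csv.DictReader(io.StringIO("h\nx\n\ny\n", newline="")))
print(records)  # [{'h': 'x'}, {'h': 'y'}]
```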

What about whitespace?

What if there is leading/trailing whitespace in the fields?

What if the CSV you get always has a space after a comma but it’s not part of the data?
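Python's `csv` module has a flag aimed at the space-after-comma case, `skipinitialspace`. Note that without it, the space also prevents the opening quote from being recognised:

```python
import csv

line = 'a, b, "c, d"'

without_flag = next(csv.reader([line]))
with_flag = next(csv.reader([line], skipinitialspace=True))
print(without_flag)  # ['a', ' b', ' "c', ' d"']
print(with_flag)     # ['a', 'b', 'c, d']
```

It only drops spaces right after a delimiter: trailing whitespace is left alone, and a value whose leading space is real data loses it.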

What if the character separating fields is not a comma?

Not kidding.

Some countries use a comma as the decimal separator instead of a period. In those countries, Excel generates CSVs with a semicolon as the separator. Some files use tabs instead of commas to avoid this specific issue. Some even use non-displayable ASCII characters.

Don’t forget to account for this when reading an arbitrary CSV file. And no, there’s no indication of which delimiter a file uses.
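Python's `csv.Sniffer` can guess, but only heuristically, from how consistently each candidate character occurs per line; on short or irregular samples it can guess wrong or raise `csv.Error`. A sketch with a semicolon-separated, decimal-comma sample like the ones described above (restricting the candidates via `delimiters` is a choice made here, not a requirement):

```python
import csv

sample = "name;price\nwidget;1,50\ngadget;2,75\n"

# ";" occurs once on every line, "," only on some lines, so ";" wins.
# This is a statistical guess, not something the file declares.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
print(dialect.delimiter)  # ;

rows = list(csv.reader(sample.splitlines(), dialect))
print(rows)  # [['name', 'price'], ['widget', '1,50'], ['gadget', '2,75']]
```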

What if the program reading the CSV uses multiple delimiters?

Some programs (including Excel) assume different delimiters when reading a file from disk than when reading it from the web. Make sure to give them the right one!

What if there is non-ASCII data?

Just use UTF-8, right? But wait…

What if the program reading the CSV uses an encoding that depends on the locale?

A program can’t magically know what encoding a file is using. Some will use an encoding depending on the locale of the machine.

This means that if you save a CSV on one machine and open it on another, the data may be silently corrupted.
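A small Python illustration of the silent part: UTF-8 bytes decoded as Windows-1252 (`cp1252`, chosen here only as an example of a locale-dependent default) raise no error, they just produce different text:

```python
text = "café"
raw = text.encode("utf-8")      # b'caf\xc3\xa9' on the writing machine
misread = raw.decode("cp1252")  # the reading machine assumes Windows-1252
print(misread)                  # cafÃ©  (no exception, just wrong text)
```

In Python, passing `encoding=` explicitly to `open()` avoids depending on the locale default.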

What if I put a BOM in my file?

After all, Byte Order Marks can identify the Unicode encoding used; that’s what they are for, right? (Actually, they are used to determine endianness, but I won’t get into that.)

If you include a BOM, Excel will interpret the CSV as a text file, not a CSV. This means line breaks within fields are not handled.
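The BOM also leaks into the data on the reading side if the reader doesn't expect it. In Python, decoding with plain `utf-8` leaves U+FEFF glued to the first header name, while the `utf-8-sig` codec strips a leading BOM if one is present:

```python
import csv
import io

raw = b"\xef\xbb\xbfname,age\r\nAlice,30\r\n"  # UTF-8 with a BOM

plain = next(csv.reader(io.StringIO(raw.decode("utf-8"), newline="")))
print(plain)  # ['\ufeffname', 'age']  (BOM stuck to the first header)

sig = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"), newline="")))
print(sig)    # ['name', 'age']
```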

Do you really still want to roll your own code to handle CSV?

CSV is not a well-defined file format. RFC 4180 does not reflect reality. It seems as if every program handles CSV in subtly different ways. Please do not inflict another one on this world. Use a solid library.
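What "use a solid library" buys you, sketched with Python's standard `csv` module (one option among many): a writer and reader from the same library agree with each other on quoting, escaped quotes, embedded newlines and whitespace. That guarantee does not extend to files written by some other program.

```python
import csv
import io

# Fields that hit several of the traps above.
rows = [
    ["id", "comment"],
    ["1", 'He said "hi", then left'],
    ["2", "multi\nline"],
    ["3", ""],
    ["4", " padded "],
]

buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)

back = list(csv.reader(io.StringIO(buf.getvalue(), newline="")))
print(back == rows)  # True
```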

If you have full control over both the producer and the consumer of the CSV, and over the data they exchange, you’ll be able to build a reliable automated system.

If a supplied CSV is arbitrary, the only real way to make sure the data is correct is for a user to check it and, if needed, specify the delimiter, quoting rule,… Barring that, you may end up with an error or, worse, silently corrupted data.
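Python's `csv` module makes the "error versus silent corruption" choice explicit: by default it tolerates malformed quoting and does its best, while `strict=True` raises `csv.Error` instead. A sketch of a defensive read, assuming the user has told you the delimiter and quote character:

```python
import csv

bad = 'a,"b"c,d'  # stray character after a closing quote

# Default: the reader keeps going and glues the characters together.
lenient = next(csv.reader([bad], delimiter=",", quotechar='"'))
print(lenient)  # ['a', 'bc', 'd']

# strict=True turns the same input into an error you can show the user.
try:
    next(csv.reader([bad], delimiter=",", quotechar='"', strict=True))
    error = None
except csv.Error as exc:
    error = exc
print("rejected:", error)
```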

Writing CSV code that works with files out there in the real world is a difficult task. The rabbit hole goes deep: Ruby’s CSV library is 2321 lines long.

Discussion on Hacker News and Reddit.

