If you think CSV is easy; think again!
Posted by jpluimers on 2012/12/05
Lots of people think CSV is easy: it’s just a bunch of values separated by commas. In practice it is not. Several things can make CSV very hard, especially since “CSV” is not a single, well-defined format. As always, importing is harder than exporting. A few reasons that make it hard:
- Comma is often not the separator
- The separator can be inside a field as well, so you need some form of quoting the separator
- If quotes are in the fields, you need some form of escaping these quotes (usually done by doubling the quote character, whether that is a double or a single quote; fields containing commas or newlines get quoted as well)
- What kind of quotes do you want (especially when you want to embed the CSV into an XML or HTML attribute)?
- Do you allow newlines inside fields, and if so, which newline representation: CRLF, LF, CR, LFCR? Some solve this by replacing newlines with spaces, but that is not always a good idea.
- Encoding? What encoding: everything is EBCDIC, right?
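To make a few of these points concrete, here is a minimal sketch. It uses Python's standard `csv` module purely for illustration (the links below are about C#), and the sample record is made up. It shows how a naive split on commas and newlines breaks a record that a quoting-aware parser reads back intact, and that the same record survives a switch to a semicolon separator.

```python
import csv
import io

# One made-up record whose fields contain the separator, a double quote, and a newline.
record = ["Smith, John", 'He said "hi"', "line one\nline two"]

# Write with the csv module: it quotes fields where needed and doubles embedded quotes.
buffer = io.StringIO(newline="")
csv.writer(buffer, lineterminator="\r\n").writerow(record)
text = buffer.getvalue()

# Naive parsing splits on every comma and every newline, so the record falls apart.
naive_rows = [line.split(",") for line in text.splitlines()]

# A quoting-aware parser returns the original three fields as a single row.
parsed_rows = list(csv.reader(io.StringIO(text, newline="")))

# The same record with a semicolon separator instead of a comma.
semi = io.StringIO(newline="")
csv.writer(semi, delimiter=";", lineterminator="\r\n").writerow(record)
semi_rows = list(csv.reader(io.StringIO(semi.getvalue(), newline=""), delimiter=";"))

print(len(naive_rows), "naive rows;", len(parsed_rows), "parsed row")
```

The reader only recovers the record because it is told the right separator and quote character; guessing those from an unknown file is where importing gets hard.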
A few links that helped me a lot getting input and output of CSV right in C#:
- c# – Creating a DataTable from CSV File – Stack Overflow.
- JoshClose/CsvHelper · GitHub.
- CSV TextfieldParser parsing through lines in CSV File with Single and Double Quotes, while skipping lines of unwanted data.
  Though be careful with TextFieldParser problem with Double Quotes with Quotes.
- c# – Creating a comma separated list from IList or IEnumerable – Stack Overflow.
  (simple solution when you know no strange characters are involved).
- c# – How to split csv whose columns may contain , – Stack Overflow.
- A Fast CSV Reader – CodeProject.
Thanks to Jabulaza:
- julian m bucknall >> Writing a parser for CSV data.
- PCPlus 258: Parsing comma-separated values : Algorithms for the masses – julian m bucknall.
–jeroen
via: Comma-separated values – Wikipedia, the free encyclopedia.
Jørn E. Angeltveit said
And of course the format of the various data types. Decimal/comma separator and date/time format.
jabulaza said
Another two csv resources I found very insightful would be: http://www.boyet.com/articles/csvparser.html
http://blog.boyet.com/blog/pcplus/pcplus-258-parsing-comma-separated-values/
both from Julian M. Bucknall.
jpluimers said
Thanks for that info. Really insightful indeed!