When working with Microsoft Excel 2011 on Mac OS X, it will export spreadsheets to comma separated .csv files or
tab delimited .txt plain text files using the old Mac OS 9 line endings of a carriage return (\r
).
When viewed with less
or vi
, these characters appear as ^M
, and the text file appears as one massively long line.
These end of line characters can be handled when importing data using R or Python's pandas library, but it's still a
nuisance for other common Unix utilities like GNU's head
, tail
, less
, wc -l
, etc.
There are a few ways to convert these line endings using GNU utilities. You can use tr
or dos2unix
.
Using tr
Mac (\r
) to Unix (\n
)
To replace those ^M
's with Unix line endings, use the tr
utility:
tr '\r' '\n' < DC_Metrorail_Ridership.mac.csv > DC_Metrorail_Ridership.unix.csv
This will replace all carriage returns with line feeds.
Windows (\r\n
) to Unix (\n
)
Windows line endings use carriage return followed by line feed, so to convert Microsoft text files line endings to Unix use the following:
tr -d '\r' < DC_Metrorail_Ridership.windows.csv > DC_Metrorail_Ridership.unix.csv
Using dos2unix or mac2unix
Another option is to use the dos2unix
utility. To install on Mac OS X using Homebrew:
brew update
brew install dos2unix
Mac (\r
) to Unix (\n
)
mac2unix -n DC_Metrorail_Ridership.mac.csv DC_Metrorail_Ridership.unix.csv
Windows (\r\n
) to Unix (\n
)
dos2unix -n DC_Metrorail_Ridership.windows.csv DC_Metrorail_Ridership.unix.csv
I made a GitHub gist where you can test all of this out. Clone the gist with:
git clone https://gist.github.com/7509ae7cfeaf1072822a.git