Skip to content Skip to sidebar Skip to footer

Python - How To Sort Csv Data By Date In The Dd-mmm-yy Format?

I have the following type of .csv data in huge quantity: Red 11.1 12170612 137186880.5 01-Apr-14 pink 52.4 35139204 1902951951 01-Aug-11 yellow 18.45 1507480

Solution 1:

Assuming the date always starts in column 40 and runs to the end of the line, this will sort your data:

with open('foo.in') as input_file:
    lines = input_file.readlines()
lines.sort(key=lambda x:datetime.datetime.strptime(x[40:-1], '%d-%b-%y'))

If you have some other way of isolating the date (always 5th field, always last 9 chars of a line, whatever), adjust the lambda accordingly.

Solution 2:

You can use pandas:

import pandas as pd
data = pd.read_csv('date.csv', sep='\t', header=None)
data[4] = pd.to_datetime(data[4])
data.sort(4)

Out[2]:

    012341   pink    52.40351392041.902952e+092011-08-012   yellow  18.45150748082.814446e+082012-08-010   Red     11.10121706121.371869e+082014-04-01

where 'date.csv' is the data you posted in your question:

Red 11.1    12170612    137186880.5 01-Apr-14
pink    52.4    35139204    1902951951  01-Aug-11
yellow  18.45   15074808    281444556.8 01-Aug-12

Solution 3:

For your updated data:

import numpy as np
import pandas as pd

your_data = [['pink', 9644921,'01-APR-2010','0'], ['yellow', 2243817504.85, '01-APR-2011','0'], ['pink', 223468, '01-APR-2013', '23891'], ['orange', 137186880.5, '01-APR-2014', '1'], ['yellow', 51950.8, '01-AUG-2011', '87674']]
aa = np.array(your_data)
data = pd.DataFrame(aa)
data[2] = pd.to_datetime(data[2])
data = data.sort(2)
print data

It will give:

01230pink96449212010-04-01      01yellow2243817504.852011-04-01      04yellow51950.82011-08-01  876742pink2234682013-04-01  238913orange137186880.52014-04-01      1

Post a Comment for "Python - How To Sort Csv Data By Date In The Dd-mmm-yy Format?"