
Trying To Sort Through A .csv File In Python (create List And Remove Duplicates)

I have a .csv file with 9 columns. I need to get a list of the values in the fifth column with no duplicates, without using pandas. The values in the column are product IDs, so things like 'H007

Solution 1:

You should take a look at the object that fits this perfectly: set()

A set removes duplicates and lets you check whether a value is in it in O(1) time on average.

So your code should look like:

import csv

without_duplicates = set()
with open('myfile.csv', 'r') as f_the_file:
    reader = csv.reader(f_the_file)
    for row in reader:
        # row[4] is the fifth column (indexes start at 0)
        without_duplicates.add(row[4])
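
To see the set behaviour described above without needing a file on disk, here is a minimal sketch. The in-memory sample data and the IDs 'P001' and 'P002' are made up for illustration; they are not from the question.

```python
import csv
import io

# Hypothetical sample standing in for myfile.csv: 9 columns, ID in the fifth.
sample = io.StringIO(
    "a,b,c,d,P001,f,g,h,i\n"
    "a,b,c,d,P002,f,g,h,i\n"
    "a,b,c,d,P001,f,g,h,i\n"
)

without_duplicates = set()
for row in csv.reader(sample):
    # Adding a value that is already present leaves the set unchanged.
    without_duplicates.add(row[4])

print(len(without_duplicates))        # 2
print('P001' in without_duplicates)   # True
print('P999' in without_duplicates)   # False
```

Note that a set has no defined order, so the values will not necessarily come back in the order they appear in the file.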

Solution 2:

Initialize an empty set and then add elements to it. Because a set ignores values it already contains, you only ever keep unique elements. After you finish reading the file, you can convert the set to a list if you need one.

import csv

productID = set()
with open('myfile.csv', 'r') as f_the_file:
    reader = csv.reader(f_the_file)
    for row in reader:
        productID.add(row[4])

productID_list = list(productID)
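
Since list() on a set gives the elements in an arbitrary order, and the question title mentions sorting, one possible follow-up is to use sorted() instead. This is only a sketch; the IDs below are hypothetical placeholders rather than real data.

```python
# Hypothetical set of product IDs, as it might look after reading the file.
product_ids = {'P003', 'P001', 'P002'}

# sorted() accepts any iterable and always returns a new list,
# so no separate list() call is needed.
product_id_list = sorted(product_ids)

print(product_id_list)  # ['P001', 'P002', 'P003']
```

For strings this sorts lexicographically, which matches numeric order only if the IDs are zero-padded to the same length.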

Solution 3:

You can just use a set comprehension for this:

import csv

with open('myfile.csv') as f:
    product_ids = {row[4] for row in csv.reader(f)}

If you absolutely need a list, just call product_ids = list(product_ids) afterwards.
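
If the file has a header row or contains blank or short lines, row[4] would either pick up the header text or raise an IndexError. A hedged sketch of one way to guard against that follows; the helper name unique_column, its parameters, and the sample data are all assumptions made for illustration.

```python
import csv
import io

def unique_column(f, index=4, has_header=True):
    """Return the set of values found in column `index` of an open CSV file."""
    reader = csv.reader(f)
    if has_header:
        # Discard the header row; the None default avoids an error on an empty file.
        next(reader, None)
    # Skip rows that are too short to have the requested column (e.g. blank lines).
    return {row[index] for row in reader if len(row) > index}

# Hypothetical sample: a header, a blank line and a repeated ID.
sample = io.StringIO(
    "c1,c2,c3,c4,product_id,c6,c7,c8,c9\n"
    "a,b,c,d,P001,f,g,h,i\n"
    "\n"
    "a,b,c,d,P002,f,g,h,i\n"
    "a,b,c,d,P001,f,g,h,i\n"
)

product_ids = unique_column(sample)
print(sorted(product_ids))  # ['P001', 'P002']
```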


If you need to preserve the original order (keeping each value where it first appeared), you can use the itertools recipe unique_everseen (it might require a lot of memory, because it remembers every value it has seen):

from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

import csv

with open('myfile.csv') as f:
    product_ids = list(unique_everseen(row[4] for row in csv.reader(f)))
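
As an alternative to the recipe (not part of the original answer), on Python 3.7 or later you could rely on the fact that dictionaries keep insertion order and use dict.fromkeys. This is a minimal sketch with made-up sample data; it has the same memory cost, since every distinct value is kept.

```python
import csv
import io

# Hypothetical sample standing in for myfile.csv.
sample = io.StringIO(
    "a,b,c,d,P002,f,g,h,i\n"
    "a,b,c,d,P001,f,g,h,i\n"
    "a,b,c,d,P002,f,g,h,i\n"
    "a,b,c,d,P003,f,g,h,i\n"
)

# dict keys are unique and (since Python 3.7) remember insertion order,
# so each ID stays at the position where it first appeared.
product_ids = list(dict.fromkeys(row[4] for row in csv.reader(sample)))

print(product_ids)  # ['P002', 'P001', 'P003']
```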
