Trying To Sort Through A .csv File In Python (create List And Remove Duplicates)
I have a .csv file with 9 columns. I need to get a list of the values in the fifth column with no duplicates, without using pandas. The values in the column are product IDs, so things like 'H007
Solution 1:
You should take a look at set(), which is a good fit for this.
A set removes duplicates and lets you check whether a value is in it in O(1) time on average.
So your code could look like:
import csv

without_duplicates = set()
with open('myfile.csv', 'r') as f_the_file:
    reader = csv.reader(f_the_file)
    for row in reader:
        # row[4] is the fifth column (indexes start at 0)
        without_duplicates.add(row[4])
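To illustrate the two set properties mentioned above (dropping duplicates and fast membership checks), here is a minimal sketch. The values in the list are made-up sample IDs, not data from the actual file.

```python
# Hypothetical sample values standing in for the fifth column of the file.
product_ids = ['H007', 'B112', 'H007', 'C300', 'B112']

unique_ids = set(product_ids)      # each duplicate collapses into one entry

print(len(unique_ids))             # 3
print('H007' in unique_ids)        # True
print('Z999' in unique_ids)        # False
```

Note that a set does not keep the order in which the values were added.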
Solution 2:
Initialize an empty set and add each value to it. Because a set only keeps unique elements, duplicates are dropped as you go. After you finish reading the file, you can convert the set to a list if you need one.
import csv

productID = set()
with open('myfile.csv', 'r') as f_the_file:
    reader = csv.reader(f_the_file)
    for row in reader:
        productID.add(row[4])
productID_list = list(productID)
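If the file has a header row or some blank or short lines, row[4] would either pick up the column name or raise an IndexError. The question does not say whether that is the case, so the following is only a sketch under that assumption. The helper name unique_column, its parameters, and the sample data are all hypothetical; an in-memory io.StringIO stands in for the real file.

```python
import csv
import io

def unique_column(csv_file, index=4, has_header=False):
    """Return the set of distinct values found at position `index` in each row."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)          # skip the header row, if there is one
    values = set()
    for row in reader:
        if len(row) > index:        # ignore blank or truncated rows
            values.add(row[index])
    return values

# Hypothetical 9-column sample; a real file would be passed in via open(...).
sample = io.StringIO(
    "a,b,c,d,product_id,f,g,h,i\n"
    "1,2,3,4,H007,6,7,8,9\n"
    "1,2,3,4,B112,6,7,8,9\n"
    "\n"
    "1,2,3,4,H007,6,7,8,9\n"
)
product_ids = sorted(unique_column(sample, has_header=True))
print(product_ids)   # ['B112', 'H007']
```

Calling sorted() on the set returns a list directly, which is handy if you want the IDs in a predictable order.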
Solution 3:
You can just use a set comprehension for this:
import csv

with open('myfile.csv') as f:
    product_ids = {row[4] for row in csv.reader(f)}
If you absolutely need a list, just call product_ids = list(product_ids) afterwards.
If you need to preserve the original order (keeping each value where it first appeared), you can use the itertools recipe unique_everseen (it might require a lot of memory, since it remembers every element it has seen):
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
import csv

with open('myfile.csv') as f:
    product_ids = list(unique_everseen(row[4] for row in csv.reader(f)))
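As an alternative to the recipe (not part of the original answer), dict.fromkeys can also drop duplicates while keeping first-seen order, since dicts preserve insertion order in Python 3.7 and later. A minimal sketch, with made-up sample rows held in an io.StringIO in place of the real file:

```python
import csv
import io

# Hypothetical 9-column sample rows; a real file would be opened with open(...).
sample = io.StringIO(
    "1,2,3,4,H007,6,7,8,9\n"
    "1,2,3,4,B112,6,7,8,9\n"
    "1,2,3,4,H007,6,7,8,9\n"
    "1,2,3,4,A001,6,7,8,9\n"
)

# Dict keys are unique and keep insertion order, so later repeats are ignored.
product_ids = list(dict.fromkeys(row[4] for row in csv.reader(sample)))
print(product_ids)   # ['H007', 'B112', 'A001']
```

Like unique_everseen, this keeps every distinct value in memory, so it does not help with the memory cost mentioned above.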