Case Insensitive For Sets In Python

I have a list that is generated from multiple lists. This combined list contains names that are generated by end users, so it contains similar names, but with different upper/lo

Solution 1:

You can track the .lower() version of the values using a set and then append the original values to a new list if their .lower() version isn't already in the set:

s = set()
L = []
for x in L0:
    if x.lower() not in s:
        s.add(x.lower())
        L.append(x)

print(L)
# ['A_B Cdef', 'GG_ooo', 'a1-23456']
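The snippet above relies on a list L0 from the question. As a minimal runnable sketch of the same idea, the loop can be wrapped in a function; the function name and the sample list here are made up for illustration and are not from the original post.

```python
def unique_ignoring_case(names):
    """Drop case-insensitive duplicates, keeping the first spelling seen."""
    seen = set()      # lowercased versions of names already kept
    result = []       # original spellings, in first-seen order
    for name in names:
        key = name.lower()
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result


# Hypothetical sample data, assumed for this example only
sample = ['A_B Cdef', 'a_b cdef', 'GG_ooo', 'gg_OOO', 'a1-23456']
deduped = unique_ignoring_case(sample)
print(deduped)
# ['A_B Cdef', 'GG_ooo', 'a1-23456']
```

Because the set only stores the lowercased keys, the original spellings in the result are left untouched.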

Solution 2:

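Use a dict (a hash map) instead; I don't think you can accomplish that easily with sets. Reversing the list first means that the first occurrence of each name is the one that is kept, because later assignments to the same key overwrite earlier ones.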

L0 = {value.lower(): value for value in L0[::-1]}.values()
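One thing to be aware of with the one-liner above: it produces a dict view rather than a list, and the order of the result follows the reversed input, not the original order. The sketch below compares it with a forward pass using dict.setdefault, which also keeps the first spelling but preserves first-seen order. It assumes Python 3.7+ (where dicts keep insertion order); the function name and sample data are illustrative only.

```python
def dedupe_keep_first(names):
    """Keep the first spelling of each name, in first-seen order."""
    first_seen = {}
    for name in names:
        # setdefault only stores the value if the key is not present yet
        first_seen.setdefault(name.lower(), name)
    return list(first_seen.values())


# Hypothetical sample data, assumed for this example only
sample = ['GG_ooo', 'A_B Cdef', 'a_b CDEF']

# The reversed dict comprehension: first spelling wins, order is changed
reversed_result = list({v.lower(): v for v in sample[::-1]}.values())
print(reversed_result)
# ['A_B Cdef', 'GG_ooo']

# The forward setdefault pass: first spelling wins, order is preserved
forward_result = dedupe_keep_first(sample)
print(forward_result)
# ['GG_ooo', 'A_B Cdef']
```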

Solution 3:

You already have several good answers, and the code below is probably overkill for your use-case, but just for fun I created a simple case-insensitive mutable set class. Note that it keeps the first string that it finds rather than letting it get clobbered by later entries.

import collections.abc

class CasefoldSet(collections.abc.MutableSet):
    def __init__(self, iterable=None):
        # Maps each casefolded key to the first original string seen for it
        self.elements = {}
        if iterable is not None:
            for v in iterable:
                self.add(v)

    def __contains__(self, value):
        return value.casefold() in self.elements

    def add(self, value):
        key = value.casefold()
        if key not in self.elements:
            self.elements[key] = value

    def discard(self, value):
        key = value.casefold()
        if key in self.elements:
            del self.elements[key]

    def __len__(self):
        return len(self.elements)

    def __iter__(self):
        return iter(self.elements.values())

    def __repr__(self):
        return '{' + ', '.join(map(repr, self)) + '}'

# test
l0 = [
    'GG_ooo', 'A_B Cdef', 'A_B Cdef', 'A_B Cdef', 
    'A_B CdEF', 'A_B CDEF', 'a_B CdEF', 'A_b CDEF', 'a1-23456',
]

l1 = CasefoldSet(l0[:4])
print(l1)
l1 |= l0[4:]
print(l1)
l2 = {'a', 'b', 'A_B Cdef'} | l1
print(l2)
l3 = l2 & {'a', 'GG_ooo', 'a_B CdEF'}
print(l3)

output

{'GG_ooo', 'A_B Cdef'}
{'GG_ooo', 'A_B Cdef', 'a1-23456'}
{'GG_ooo', 'A_B Cdef', 'a1-23456', 'b', 'a'}
{'a_B CdEF', 'a', 'GG_ooo'}

This class inherits various useful methods from collections.abc.MutableSet, but to make it a full replacement for set it would need a few more methods. Note that it will raise AttributeError if you try to pass it non-string items, because they have no casefold() method.
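The class keys on str.casefold() where the other answers use str.lower(). For plain ASCII names the two behave the same, but casefold() is the more aggressive normalization for non-ASCII text. A small sketch of the difference, using German sample words chosen for illustration:

```python
# Hypothetical sample data: three spellings of the same German word
words = ['Straße', 'STRASSE', 'strasse']

# lower() leaves 'ß' as it is, so two distinct keys remain
lowered = {w.lower() for w in words}
print(len(lowered))
# 2

# casefold() maps 'ß' to 'ss', so all three collapse to one key
folded = {w.casefold() for w in words}
print(len(folded))
# 1
```

If the names are known to be ASCII-only, either method should give the same result.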

Solution 4:

If you want to play by the rules, the best solution I can think of is a bit messy, using a set to track which words have already appeared:

seen_words = set()
L1 = []
for word in L0:
    if word.lower() not in seen_words:
        L1.append(word)
        seen_words.add(word.lower())

If you want to get a little hackier, there is a more elegant solution: you can use a dictionary to track which words have already been seen, and it's almost a one-liner:

seen_words = {}
L1 = [seen_words.setdefault(word.lower(), word)
      for word in L0 if word.lower() not in seen_words]
print(L1)

Both solutions output the same result:

['A_B Cdef', 'GG_ooo', 'a1-23456']