Getting The Lesser N Elements Of A List In Python

October 27, 2023 Post a Comment

I need to get the lesser n numbers of a list in Python. I need this to be really fast because it's in a critical part for performance and it needs to be repeated a lot of times. n

Solution 1:

You actually want a sorted sequence of mins.

mins = items[:n]
mins.sort()
for i in items[n:]:
    if i < mins[-1]: 
        mins.append(i)
        mins.sort()
        mins= mins[:n]

This runs much faster because you aren't even looking at mins unless it's provably got a value larger than the given item. About 1/10th the time of the original algorithm.

This ran in zero time on my Dell. I had to run it 10 times to get a measurable run time.

mins(items, n): 0.297000169754sorted(items)[:n]: 0.109999895096mins2(items)[:n]: 0.0309998989105

Using bisect.insort instead of append and sort may speed this up a hair further.

Solution 2:

importheapqnlesser_items= heapq.nsmallest(n, items)

Here's a correct version of S.Lott's algorithm:

from bisect    import insort
from itertools import islice

defnsmallest_slott_bisect(n, iterable, insort=insort):
    it   = iter(iterable)
    mins = sorted(islice(it, n))
    for el in it:
        if el <= mins[-1]: #NOTE: equal sign is to preserve duplicates
            insort(mins, el)
            mins.pop()

    return mins

Performance:

$ python -mtimeit -s "import marshal; from nsmallest import nsmallest$label as nsmallest; items = marshal.load(open('items.marshal','rb')); n = 10"\
 "nsmallest(n, items)"

nsmallest_heapq
100 loops, best of 3: 12.9 msec per loop
nsmallest_slott_list
100 loops, best of 3: 4.37 msec per loop
nsmallest_slott_bisect
100 loops, best of 3: 3.95 msec per loop

nsmallest_slott_bisect is 3 times faster than heapq's nsmallest (for n=10, len(items)=20000). nsmallest_slott_list is only marginally slower. It is unclear why heapq's nsmallest is so slow; its algorithm is almost identical to the presented above (for small n).

Solution 3:

A possibility is to use the bisect module:

import bisect

defmins(items, n):
    mins = [float('inf')]*n
    for item in items:
        bisect.insort(mins, item)
        mins.pop()
    return mins

However, it's just a bit faster for me:

mins(items, n): 0.0892250537872
sorted(items)[:n]: 0.0990262031555

Using psyco does speed it up a bit more:

import bisect
import psyco
psyco.full()

defmins(items, n):
    mins = [float('inf')]*n
    for item in items:
        bisect.insort(mins, item)
        mins.pop()
    return mins

Result:

mins(items, n): 0.0431621074677
sorted(items)[:n]: 0.0859830379486

Solution 4:

If speed is of utmost concern, the fastest method is going to be with c. Psyco has an upfront cost, but may prove to be pretty fast. I would recommend Cython for python -> c compilation (a more up to date for pf Pyrex).

Hand coding it in c would be the best, and allow you to use data structures specific to your problem domain.

But note:

"Compiling the wrong algorithm in C may not be any faster than the right algorithm in Python" @S.Lott

I wanted to add S.Lott's comment so it gets noticed. Python make an excellent prototype language, where you can iron out an algorithm that you intend to later translate to a lower level language.

Solution 5:

why not just call the select_n_th element in O(N) time and then divide the array into two parts by the n_th element, this should be the fastest one.

ps: This O(N) algorithm works if you don't specify the order of the n-smallest elements The link below seems to do the selection algorithm. http://code.activestate.com/recipes/269554-select-the-nth-smallest-element/

Assuming the array doesn't have duplicate elements, the code works for me. The efficiency still depends on the problem scale, if n<10, probably an O(logn*N) algorithm is enough.

import random
import numpy as np
defselect(data, n):
    "Find the nth rank ordered element (the least value has rank 0)."
    data = list(data)
    ifnot0 <= n < len(data):
        raise ValueError('not enough elements for the given rank')
    whileTrue:
        pivot = random.choice(data)
        pcount = 0
        under, over = [], []
        uappend, oappend = under.append, over.append
        for elem in data:
            if elem < pivot:
                uappend(elem)
            elif elem > pivot:
                oappend(elem)
            else:
                pcount += 1if n < len(under):
            data = under
        elif n < len(under) + pcount:
            return pivot
        else:
            data = over
            n -= len(under) + pcount


defn_lesser(data,n):
    data_nth = select(data,n)
    ind = np.where(data<data_nth)
    return data[ind]

alezinhacris