
Python Script Taking Long Time To Run

I am writing a script in Python to parse LDAP logs and then get the number of searches/binds by each user. I was testing my code on sample files, and for smaller files till size o

Solution 1:

You are doing several scans on entire file on the line

count = re.findall('SEARCH REQ.*'+conid,fh1)

Avoid this; it is your major problem. Instead, collect all conids into a list in one pass, then iterate over the file a second time with the conid checks in the inner loop. That way you only scan the file twice in total.
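A minimal sketch of the two-pass idea, using made-up log lines and regexes (the real format comes from the asker's LDAP access log, which is not shown here):

```python
import re
from collections import defaultdict

# Hypothetical sample lines; the actual log format is an assumption.
log_lines = [
    'conn=101 op=0 BIND REQ dn="uid=alice"',
    'conn=101 op=1 SEARCH REQ base="ou=people"',
    'conn=101 op=2 SEARCH REQ base="ou=groups"',
    'conn=202 op=0 BIND REQ dn="uid=bob"',
]

# Pass 1: collect every connection id that issued a BIND.
conids = set()
for line in log_lines:
    m = re.search(r'conn=(\d+).*BIND REQ', line)
    if m:
        conids.add(m.group(1))

# Pass 2: count SEARCH requests, but only for known connection ids.
search_count = defaultdict(int)
for line in log_lines:
    m = re.search(r'conn=(\d+).*SEARCH REQ', line)
    if m and m.group(1) in conids:
        search_count[m.group(1)] += 1

print(dict(search_count))  # {'101': 2}
```

The key point is that the file (here, the list) is read exactly twice, instead of once per conid.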

Also, since it is plain Python, run it with PyPy for faster runs.

You can do this better with an FSM, at the cost of a bit more RAM. This is a hint; you will have to write the FSM yourself.

Edit 1: This is the version of the script I wrote after seeing the log file. Please correct me if there is any mistake:

#!/usr/bin/env python
import sys
import re


def parse(filepath):
    d = {}
    regex1 = re.compile(r'(.*)?BIND\sREQ(.*)uid=(\w+)')
    regex2 = re.compile(r'(.*)?SEARCH\sREQ(.*)uid=(\w+)')
    with open(filepath, 'r') as f:
        for l in f:
            m = regex1.search(l)
            if m:
                uid = m.group(3)
                if uid in d:
                    d[uid]['bind_count'] += 1
                else:
                    d[uid] = {'bind_count': 1, 'search_count': 0}
            m = regex2.search(l)
            if m:
                uid = m.group(3)
                if uid in d:
                    d[uid]['search_count'] += 1
                else:
                    d[uid] = {'search_count': 1, 'bind_count': 0}
    for k in d:
        print('user id = ' + k,
              'Bind count = ' + str(d[k]['bind_count']),
              'Search count = ' + str(d[k]['search_count']))


def process_args():
    if len(sys.argv) < 2:
        print('Usage: parse_ldap_log.py log_filepath')
        exit(1)


if __name__ == '__main__':
    process_args()
    parse(sys.argv[1])

Thank the Gods that it was not complicated enough to warrant an FSM.

Solution 2:

Use the itertools library instead of so many loops.
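The answer does not say which itertools functions it has in mind; one reading is to extract all matches in a single pass and then group them per user with `itertools.groupby`. A hedged sketch, with made-up log lines:

```python
import re
from itertools import groupby

# Hypothetical sample lines; the real log format is an assumption.
log_lines = [
    'BIND REQ uid=alice',
    'SEARCH REQ uid=alice',
    'SEARCH REQ uid=alice',
    'BIND REQ uid=bob',
]

# One pass over the log: extract (uid, operation) pairs.
pairs = []
for line in log_lines:
    m = re.search(r'(BIND|SEARCH) REQ uid=(\w+)', line)
    if m:
        pairs.append((m.group(2), m.group(1)))

# groupby requires sorted input; group by uid and count each operation.
counts = {}
for uid, grp in groupby(sorted(pairs), key=lambda p: p[0]):
    ops = [op for _, op in grp]
    counts[uid] = {'bind': ops.count('BIND'), 'search': ops.count('SEARCH')}

print(counts)
```

Note that the sort makes this O(n log n); for plain counting, a `collections.Counter` would do the same job without sorting.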

Solution 3:

Your script has quadratic complexity: for each line in the file, you read the file again to match the log entry. My suggestion is to read the file only once, counting the occurrences of the needed entry (the one matching " BIND REQ ") as you go.
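A minimal single-pass version of this idea, assuming a made-up line format (the real one is in the asker's log file, which is not shown):

```python
import re
from collections import Counter

# Hypothetical sample lines; the actual format is an assumption.
log_lines = [
    'conn=1 BIND REQ dn="uid=alice"',
    'conn=2 BIND REQ dn="uid=bob"',
    'conn=3 BIND REQ dn="uid=alice"',
]

# One pass: count " BIND REQ " entries per uid.
bind_counts = Counter()
for line in log_lines:
    m = re.search(r' BIND REQ .*uid=(\w+)', line)
    if m:
        bind_counts[m.group(1)] += 1

print(dict(bind_counts))  # {'alice': 2, 'bob': 1}
```

This keeps the work at O(n) in the number of log lines, instead of re-scanning the file for every line.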

Solution 4:

I was able to solve my problem with the code below.

import re
import datetime
from collections import defaultdict

start_time = datetime.datetime.now()

bind_count = defaultdict(int)
search_conn = defaultdict(int)
bind_conn = defaultdict(str)
j = defaultdict(int)

total_searches = 0
total_binds = 0

# Single pass over the log: record binds per uid and searches per conn id.
with open("C:\\access", "r") as fh:
    for line in fh:
        reg1 = re.search(r' BIND REQ .*conn=(\d+).*dn=(.*")', line)
        reg2 = re.search(r' SEARCH REQ .*conn=(\d+).*', line)
        if reg1:
            total_binds += 1
            uid, con = reg1.group(2, 1)
            bind_count[uid] += 1
            bind_conn[con] = uid

        if reg2:
            total_searches += 1
            skey = reg2.group(1)
            search_conn[skey] += 1

# Map search counts from conn ids back to the uid that bound on that conn.
for conid in search_conn:
    if conid in bind_conn:
        new_key = bind_conn[conid]
        j[new_key] += search_conn[conid]

for k, v in bind_count.items():
    print(k, " = ", v)

print("*" * 80)

for k, v in j.items():
    print(k, "-->", v)

del search_conn
del bind_conn

end_time = datetime.datetime.now()
print("Total time taken - {}".format(end_time - start_time))
