Python Script Taking Long Time To Run
Solution 1:
You are doing several scans on entire file on the line
count = re.findall('SEARCH REQ.*'+conid,fh1)
Avoid this. This is your major problem. Get all conids in a list and iterate on file again and list while your inner loop should consist of conids. Bring it out of outer loop. You will be doing two scans of file.
Also since it is plain Python run with PyPy for faster runs.
You can do this better with a FSM and by spending a bit more RAM. This is a hint and you will have to do your FSM yourself.
Edit 1: This is the version of script I wrote after seeing the log file. Please correct if there is any mistake:
#!/usr/bin/env pythonimport sys
import re
defparse(filepath):
d = {}
regex1 = re.compile(r'(.*)?BIND\sREQ(.*)uid=(\w+)')
regex2 = re.compile(r'(.*)?SEARCH\sREQ(.*)uid=(\w+)')
withopen(filepath, 'r') as f:
for l in f:
m = re.search(regex1, l)
if m:
# print (m.group(3))
uid = m.group(3)
if uid in d:
d[uid]['bind_count'] += 1else:
d[uid] = {}
d[uid]['bind_count'] = 1
d[uid]['search_count'] = 0
m = re.search(regex2, l)
if m:
# print (m.group(3))
uid = m.group(3)
if uid in d:
d[uid]['search_count'] += 1else:
d[uid] = {}
d[uid]['search_count'] = 1
d[uid]['bind_count'] = 0for k in d:
print('user id = ' + k, 'Bind count = ' + str(d[k]['bind_count']), 'Search count = ' + str(d[k]['search_count']))
defprocess_args():
if sys.argv < 2:
print('Usage: parse_ldap_log.py log_filepath')
exit(1)
if __name__ == '__main__':
process_args()
parse(sys.argv[1])
Thank the Gods that it was not complicated enough to warrant an FSM.
Solution 2:
Use itertools library instead of so many loops.
Solution 3:
Your script has a quadratic complexity: for each line in the file you are making a read again to match the log entry. My suggestion is to read the file only one time and counting the occurrences of the needed entry (the one matching (" BIND REQ ")).
Solution 4:
I was able to solve my problem with below code.
import os,re,datetime
from collections import defaultdict
start_time=datetime.datetime.now()
bind_count=defaultdict(int)
search_conn=defaultdict(int)
bind_conn=defaultdict(str)
j=defaultdict(int)
fh = open("C:\\access","r")
total_searches=0
total_binds=0for line in fh:
reg1=re.search(r' BIND REQ .*conn=(\d+).*dn=(.*")', line)
reg2=re.search(r' SEARCH REQ .*conn=(\d+).*', line)
if reg1:
total_binds+=1
uid,con=reg1.group(2,1)
bind_count[uid]=bind_count[uid]+1
bind_conn[con]=uid
if reg2:
total_searches+=1
skey=reg2.group(1)
search_conn[skey] = search_conn[skey]+1for conid in search_conn:
if conid in bind_conn:
new_key=bind_conn[conid]
j[new_key]=j[new_key]+search_conn[conid]
for k,v in bind_count.items():
print(k," = ",v)
print("*"*80)
for k,v in j.items():
print(k,"-->",v)
fh.close()
del search_conn
del bind_conn
end_time=datetime.datetime.now()
print("Total time taken - {}".format(end_time-start_time))
Post a Comment for "Python Script Taking Long Time To Run"