Python Multiprocessing Pool Hangs On Map Call
I have a function that parses a file and inserts the data into MySQL using SQLAlchemy. I've been running the function sequentially on the result of os.listdir() and everything works fine, but when I try to parallelize it with a multiprocessing Pool, the map call hangs.
Solution 1:
You need to put all code that uses multiprocessing inside its own function and guard the entry point. This stops the module from recursively launching new pools when multiprocessing re-imports it in each worker process:
import os
import multiprocessing as mp

def parse_file(filename):
    ...

def main():
    pool = mp.Pool(processes=8)
    pool.map(parse_file, ['my_dir/' + filename for filename in os.listdir('my_dir')])

if __name__ == '__main__':
    main()
See the multiprocessing documentation on safe importing of the main module, along with the programming guidelines for Windows, where child processes start by re-importing your module.
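A quick way to see why the guard matters (this sketch is mine, not from the original post) is to force the 'spawn' start method, which is the default on Windows: every worker starts by re-importing the module, and with the guard in place the pool is still created exactly once:

import multiprocessing as mp

def square(x):
    return x * x

def main():
    # Force the 'spawn' start method (the Windows default) so the
    # re-import behaviour is reproducible on any platform.
    mp.set_start_method('spawn')
    with mp.Pool(processes=4) as pool:
        print(pool.map(square, range(8)))

if __name__ == '__main__':
    main()

Without the __main__ guard, each spawned worker would re-execute the module-level pool code on import and launch pools of its own.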
Solution 2:
The problem was a combination of two things:
- my pool code being called multiple times (thanks @Peter Wood)
- my DB code opening too many sessions and/or sharing sessions across processes
I made the following changes and everything works now.
Original File
import os
import multiprocessing as mp

from bs4 import BeautifulSoup

def parse_file(filename):
    with open(filename, 'rb') as f:
        data = f.read()
    soup = BeautifulSoup(data, features='lxml', from_encoding='utf-8')
    # parse file here
    db_record = MyDBRecord(parsed_data)
    session = get_session()  # see below
    session.add(db_record)
    session.commit()
    session.close()  # release the connection so sessions don't pile up

if __name__ == '__main__':
    pool = mp.Pool(processes=8)
    pool.map(parse_file, ['my_dir/' + filename for filename in os.listdir('my_dir')])
DB File
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def get_session():
    # Create the engine inside the worker so no connection or session
    # is ever shared across process boundaries.
    engine = create_engine('mysql://root:root@localhost/my_db')
    Base.metadata.create_all(engine)
    Base.metadata.bind = engine
    db_session = sessionmaker(bind=engine)
    return db_session()
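If you want to avoid building a new engine for every file, one option (a sketch of mine, not from the original post) is to create one engine per worker process via the pool's initializer argument; parse_file is reduced to a stub here, and the connection URL is the same placeholder used above:

import multiprocessing as mp
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Set by init_worker in each worker process, so no engine or session
# is ever shared across process boundaries.
Session = None

def init_worker():
    global Session
    engine = create_engine('mysql://root:root@localhost/my_db')
    Session = sessionmaker(bind=engine)

def parse_file(filename):
    session = Session()
    try:
        # ... parse the file, build db_record, session.add(db_record) ...
        session.commit()
    finally:
        session.close()

def main():
    pool = mp.Pool(processes=8, initializer=init_worker)
    pool.map(parse_file, ['my_dir/' + f for f in os.listdir('my_dir')])

if __name__ == '__main__':
    main()

This keeps the fix's key property (nothing DB-related crosses a process boundary) while paying the engine-creation cost once per worker instead of once per file.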