
Python Multiprocessing Pool Hangs On Map Call

I have a function that parses a file and inserts the data into MySQL using SQLAlchemy. I've been running the function sequentially on the result of os.listdir() and everything works fine, but when I run it in parallel through multiprocessing.Pool, the call to map hangs.

Solution 1:

You need to put all code that uses multiprocessing inside its own function and guard the entry point with `if __name__ == '__main__':`. This stops it recursively launching new pools when multiprocessing re-imports your module in each worker process:

import multiprocessing as mp
import os

def parse_file(filename):
    ...

def main():
    pool = mp.Pool(processes=8)
    pool.map(parse_file, ['my_dir/' + filename for filename in os.listdir("my_dir")])

if __name__ == '__main__':
    main()

See the documentation on making sure your module is safely importable by worker processes, along with the additional advice for running on Windows.
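
The guard pattern can be sketched as a minimal, self-contained example (`square` and the input range here are placeholders, not code from the question):

```python
import multiprocessing as mp

def square(n):
    # Work done in a worker process.
    return n * n

def main():
    # The pool is only created when the script is executed directly,
    # never when a worker process re-imports this module.
    with mp.Pool(processes=2) as pool:
        results = pool.map(square, range(5))
    print(results)  # [0, 1, 4, 9, 16]

if __name__ == '__main__':
    main()
```

Because worker processes import the module but see `__name__` set to the module's name rather than `'__main__'`, the pool-creating code never runs in them.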

Solution 2:

The problem was a combination of two things:

  1. my pool code being called multiple times (thanks @Peter Wood)
  2. my DB code opening too many sessions (and/or) sharing sessions
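
The session-sharing half of the problem can be illustrated with a small sketch: each worker opens its own database connection instead of reusing one created in the parent. Here the stdlib sqlite3 module stands in for MySQL/SQLAlchemy, and `insert_record`, the table name, and the file path are all illustrative:

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile

DB_PATH = os.path.join(tempfile.gettempdir(), 'demo_records.db')

def insert_record(value):
    # Open a fresh connection inside the worker process; connections
    # (like SQLAlchemy sessions) must not be shared across processes.
    # The timeout lets concurrent writers wait out SQLite's file lock.
    conn = sqlite3.connect(DB_PATH, timeout=30)
    conn.execute('CREATE TABLE IF NOT EXISTS records (value INTEGER)')
    conn.execute('INSERT INTO records (value) VALUES (?)', (value,))
    conn.commit()
    conn.close()

if __name__ == '__main__':
    with mp.Pool(processes=4) as pool:
        pool.map(insert_record, range(10))
```

The same principle applies to SQLAlchemy: build the session inside the worker function, as `get_session()` below does.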

I made the following changes and everything works now.

Original File

import multiprocessing as mp
import os

from bs4 import BeautifulSoup

def parse_file(filename):
    with open(filename, 'rb') as f:
        data = f.read()

    soup = BeautifulSoup(data, features="lxml", from_encoding='utf-8')

    # parse file here

    db_record = MyDBRecord(parsed_data)

    session = get_session()  # see below
    session.add(db_record)
    session.commit()

if __name__ == '__main__':
    pool = mp.Pool(processes=8)
    pool.map(parse_file, ['my_dir/' + filename for filename in os.listdir("my_dir")])

DB File

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def get_session():
    # Each call builds its own engine and session, so nothing is
    # shared between processes.
    engine = create_engine('mysql://root:root@localhost/my_db')

    Base.metadata.create_all(engine)
    Base.metadata.bind = engine

    db_session = sessionmaker(bind=engine)

    return db_session()
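
Creating a new engine on every `get_session()` call works, but it is relatively heavy. One common refinement (not from the original post) is to build the expensive resource once per worker process using the pool's `initializer` argument. A minimal stdlib sketch of that pattern, with a plain dict standing in for the engine/sessionmaker:

```python
import multiprocessing as mp
import os

# Per-process global, populated once by the pool initializer.
_resource = None

def init_worker():
    global _resource
    # In the real code this would be create_engine(...) plus
    # sessionmaker(...); a simple dict stands in for the resource.
    _resource = {'pid': os.getpid()}

def handle(item):
    # Every task running in this process reuses the same resource.
    return (_resource['pid'], item * 2)

if __name__ == '__main__':
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(handle, range(4)))
```

Each worker runs `init_worker` exactly once at startup, so per-task work stays cheap while still avoiding any sharing across processes.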
