Function not returning in multiprocess, no errors

Stack Overflow Asked on November 12, 2021

I am running multiple processes with multiprocessing.Pool:

import spacy
import multiprocessing
import logging

# global variable
nlp_bert = spacy.load("en_trf_bertbaseuncased_lg")
logging.basicConfig(level=logging.DEBUG)


def job_pool(data, job_number, job_to_do, groupby=None, split_col=None, **kwargs):
    # distribute `data` across `job_number` worker processes
    with multiprocessing.Pool(processes=job_number) as pool:
        jobs = pool.map(job_to_do, data)
    return jobs


def job(item):
    logging.debug('this shows')
    w1 = nlp_bert('word')  # the worker hangs here and never returns
    w2 = nlp_bert('other')
    logging.debug(w1.similarity(w2))
    logging.debug("this doesn't")


if __name__ == '__main__':
    job_pool([1, 2, 3, 4], 4, job)

The call to nlp_bert inside the worker never returns, and no error is raised. How can I find out what is going wrong? Logging is already set to DEBUG level.
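
One way to see where a silently hung worker is stuck is to have each worker dump its own stack after a timeout. This is a sketch using the standard-library faulthandler module (the 30-second timeout is an arbitrary choice); it reuses nlp_bert and logging from the script above:

import faulthandler

def job(item):
    # if this worker is still running after 30 seconds, dump the
    # traceback of every thread in this process to stderr, but
    # keep the process alive
    faulthandler.dump_traceback_later(30, exit=False)
    w1 = nlp_bert('word')
    w2 = nlp_bert('other')
    faulthandler.cancel_dump_traceback_later()
    logging.debug(w1.similarity(w2))

Calling multiprocessing.log_to_stderr(logging.DEBUG) in the main script can also help, since it shows worker start-up and shutdown events that the default logging configuration does not.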

The function works outside of multiprocessing, i.e. when the same calls are run directly in a script:

import spacy
nlp_bert = spacy.load("en_trf_bertbaseuncased_lg")
w1 = nlp_bert('word')
w2 = nlp_bert('other')
print(w1.similarity(w2))

0.8381155446247196

I’m using:

  • Python 3.8.2
  • spaCy 2.3.2

One Answer

It turns out this is a known issue: PyTorch's multithreading interacts badly with forked child processes, causing a deadlock inside the worker.

https://github.com/explosion/spaCy/issues/4667

A workaround is to add the following:

import torch

torch.set_num_threads(1)
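
Applied to the original script, that looks roughly like the following. This is a sketch, not the exact fix from the issue thread; it assumes the default fork start method on Linux and places the call before any worker processes are created, so the children inherit the single-threaded setting:

import spacy
import torch
import multiprocessing
import logging

# keep PyTorch single-threaded so forked workers don't inherit a
# partially-locked thread pool and deadlock
torch.set_num_threads(1)

nlp_bert = spacy.load("en_trf_bertbaseuncased_lg")
logging.basicConfig(level=logging.DEBUG)

def job(item):
    w1 = nlp_bert('word')
    w2 = nlp_bert('other')
    logging.debug(w1.similarity(w2))

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        pool.map(job, [1, 2, 3, 4])

An alternative sometimes suggested for this class of fork-plus-threads problem is the spawn start method (multiprocessing.get_context('spawn')), at the cost of reloading the model in every worker.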

Answered by forgetso on November 12, 2021
