Is there a 'retry' logic in job manager?

Question

This is a follow-up to my earlier post. I'm trying to apply a measurement-error filter to results obtained from the job manager, and the following code works when my circuit list doesn't contain too many elements:
from qiskit.providers.ibmq.managed import IBMQJobManager

job_manager = IBMQJobManager()
MExperiments = job_manager.run(all_circuits, backend=backend, shots=nshots)
results = MExperiments.results()
cresults = results.combine_results()  # merge the per-job results into one Result
mitigated_results = meas_filter.apply(cresults)

I use combine_results() to merge the per-job results into a single Result object, which is what apply() expects. This only works if all the jobs succeeded. However, when all_circuits gets large, I get this error:
IBMQManagedResultDataNotAvailable: 'Results cannot be combined since some of the jobs failed.'

I checked the job status on IBMQ, and it shows that all the circuits executed successfully. How can I fix this? Is there a way to make the job manager retry the jobs that failed? Thanks!

jyu00 · Accepted Answer

When you call results = MExperiments.results(), it should issue a warning telling you which job results could not be retrieved. The warning is only issued on the first call to .results(), though, since subsequent calls use cached data. You can call results = MExperiments.results(refresh=True) to force it to re-fetch the results from the server (and hence re-issue the warnings). It is possible that a job completed but fetching its result failed (e.g. due to a networking error), so .results(refresh=True) also serves as a retry mechanism.
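If you want to automate that retry, something along these lines could work. It is a minimal sketch: combine_with_retry is a hypothetical helper, not part of Qiskit, and it assumes only what is described above, namely that results(refresh=True) re-fetches from the server and that combine_results() raises when some job results are missing.

```python
import time
from typing import Any, Tuple, Type


def combine_with_retry(job_set: Any,
                       max_attempts: int = 3,
                       delay_s: float = 5.0,
                       retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> Any:
    """Hypothetical helper: re-fetch results until they can be combined.

    Assumes job_set behaves like a ManagedJobSet: job_set.results(refresh=...)
    returns an object whose combine_results() raises when some job results
    are missing.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: BaseException = RuntimeError("unreachable")
    for attempt in range(max_attempts):
        # The first attempt may use cached data; later attempts force a
        # re-fetch from the server (which also re-issues the warnings).
        managed_results = job_set.results(refresh=attempt > 0)
        try:
            return managed_results.combine_results()
        except retry_on as err:
            last_error = err
            if attempt < max_attempts - 1:
                time.sleep(delay_s)  # give a transient network issue time to clear
    raise last_error
```

With qiskit-ibmq-provider you would probably want to narrow retry_on to (IBMQManagedResultDataNotAvailable,), which I believe is importable from qiskit.providers.ibmq.managed.exceptions, so that unrelated errors are not silently retried.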
Another possibility is that the job submission failed, in which case the job won't show up in the IBMQ dashboard. You can use print(MExperiments.report()) to see which sets of circuits don't have a successful job.
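If you would rather locate never-submitted jobs programmatically than read the report, here is a rough sketch. It assumes that MExperiments.jobs() returns one entry per chunk of circuits, in submission order, with None for chunks whose submission failed (that is my understanding of ManagedJobSet.jobs(), but check it against your provider version), and that you know how many circuits went into each chunk. unsubmitted_ranges is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple


def unsubmitted_ranges(jobs: Sequence[Optional[object]],
                       chunk_sizes: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: (job_index, first_exp, last_exp) per missing job.

    Assumes jobs[i] is None when chunk i was never submitted, and that
    chunk_sizes[i] is the number of circuits in chunk i, in submission order.
    """
    if len(jobs) != len(chunk_sizes):
        raise ValueError("need exactly one chunk size per job")
    missing = []
    start = 0
    for i, (job, size) in enumerate(zip(jobs, chunk_sizes)):
        end = start + size - 1  # inclusive end index, as in the answer's slice
        if job is None:
            missing.append((i, start, end))
        start += size
    return missing
```

For example, unsubmitted_ranges(MExperiments.jobs(), sizes) would give the experiment index ranges to resubmit, where sizes is a list you build yourself.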
To answer your last question, the job manager doesn't currently retry failed jobs (there is an open issue for that, though). You can, however, do it manually:

1. Use print(MExperiments.report()) to find out which job failed.
2. Use qobj = job.qobj() to get the Qobj of the failed job, then resubmit it with new_job = backend.run(qobj).
3. Combine the results yourself, replacing the failed job's result with the new job's result. The report from step 1 tells you the indexes of the experiments that belong to the failed job, so you can then do result.results[failed_start_index:failed_end_index+1] = new_job.result().results
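The last step relies on Python slice assignment, which quietly resizes the list if the replacement has a different length, shifting every later experiment out of place. A small guard such as the hypothetical splice_results below makes that mistake fail loudly. It assumes only that result.results is a plain list, as in the snippet above.

```python
from typing import List, TypeVar

T = TypeVar("T")


def splice_results(results: List[T], start: int, end: int, new_results: List[T]) -> None:
    """Hypothetical helper: replace results[start..end] (inclusive) in place."""
    expected = end - start + 1
    if len(new_results) != expected:
        # A bare slice assignment would resize the list and misalign
        # every experiment after the replaced range, so refuse instead.
        raise ValueError(f"expected {expected} results, got {len(new_results)}")
    results[start:end + 1] = new_results
```

Usage would look like splice_results(result.results, failed_start_index, failed_end_index, new_job.result().results).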
