Disabling multithreading for numpy, scikit-learn, etc. in conda

by Matteo Visconti di Oleggio Castello
Tue 25 July 2017

I was running RidgeCV from scikit-learn and realized that it was devouring all my cores. On top of that, I had wrapped everything in a joblib parallel loop, so every worker was spawning its own set of threads and my poor server was hanging there, starving for more power.

It turns out that Anaconda ships NumPy builds linked against Intel's Math Kernel Library (MKL), which uses multithreading for most vectorized operations. This is usually great, but if you use Singularity in a shared HPC environment and you're submitting jobs to a queue, then you'd better curb your processes.
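To make the oversubscription concrete, here is a back-of-the-envelope sketch. The numbers are hypothetical (a 16-core node running 16 joblib workers), `total_threads` is a helper made up for illustration, and the sketch assumes each worker lets MKL start one thread per core, which I haven't measured here.

```python
def total_threads(n_workers, threads_per_worker):
    """Threads competing for CPU time when every worker runs its own thread pool."""
    return n_workers * threads_per_worker


cores = 16      # hypothetical node size
n_workers = 16  # hypothetical joblib n_jobs

# Assumption: each worker lets MKL start one thread per core
uncapped = total_threads(n_workers, threads_per_worker=cores)
# Each worker restricted to a single MKL thread
capped = total_threads(n_workers, threads_per_worker=1)

print(f"uncapped: {uncapped} threads on {cores} cores")
print(f"capped:   {capped} threads on {cores} cores")
```

Under those assumptions the uncapped case asks for far more threads than there are cores, which is roughly what my server was complaining about.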

The solution comes from Stack Overflow: you can either set an environment variable (MKL_NUM_THREADS) before launching Python, or add this to your code:

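# The mkl module is provided by the mkl-service package
import mkl
# Cap the number of threads MKL may use in this process
mkl.set_num_threads(2)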
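For the environment-variable route, here is a minimal sketch of doing it from inside Python. The helper name `cap_mkl_threads` is made up for illustration, and the sketch assumes the variable is set before numpy (or anything else linked against MKL) gets imported.

```python
import os


def cap_mkl_threads(n_threads):
    """Set MKL_NUM_THREADS for this process and anything it spawns.

    Call this before numpy (or anything linked against MKL) is imported.
    """
    os.environ["MKL_NUM_THREADS"] = str(n_threads)
    return os.environ["MKL_NUM_THREADS"]


cap_mkl_threads(2)

# Heavy imports go only after the variable is set, e.g.:
# import numpy as np
```

You can also leave the code alone and export the variable in the job script, for example `MKL_NUM_THREADS=2 python script.py`, where `script.py` stands in for whatever you are submitting to the queue.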

For once, sysadmins won't send me angry emails about my running jobs.

If you have feedback on this post, please let me know! Send me a tweet or an email at mvdoc.grobfuscatemyemail!@dartmouth.edu.
