In the Linux kernel, the following vulnerability has been resolved:
scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler
Inside scsi_eh_wakeup(), scsi_host_busy() is called & checked with host lock every time for deciding if error handler kthread needs to be waken up.
This can be too heavy in case of recovery, such as:
N hardware queues
queue depth is M for each hardware queue
each scsi_host_busy() iterates over (N * M) tag/requests
If recovery is triggered in case that all requests are in-flight, each scsi_eh_wakeup() is strictly serialized, when scsi_eh_wakeup() is called for the last in-flight request, scsi_host_busy() has been run for (N * M -
If both N and M are big enough, hard lockup can be triggered on acquiring host lock, and it is observed on mpi3mr(128 hw queues, queue depth 8169).
Fix the issue by calling scsi_host_busy() outside the host lock. We dont need the host lock for getting busy count because host the lock never covers that.
[mkp: Drop unnecessary busy variables pointed out by Bart]