In the Linux kernel, the following vulnerability has been resolved:
md: Don't ignore suspended array in md_check_recovery()
mddev_suspend() never stops sync_thread, hence it doesn't make sense to ignore a suspended array in md_check_recovery(); doing so can leave sync_thread unable to be unregistered.
After commit f52f5c71f3d4 ("md: fix stopping sync thread"), the following hang can be triggered by test shell/integrity-caching.sh:
1) suspend the array:
raid_postsuspend
 mddev_suspend

2) stop the array:
raid_dtr
 md_stop
  __md_stop_writes
   stop_sync_thread
    set_bit(MD_RECOVERY_INTR, &mddev->recovery);
    md_wakeup_thread_directly(mddev->sync_thread);
    wait_event(…, !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))

3) sync thread done:
md_do_sync
 set_bit(MD_RECOVERY_DONE, &mddev->recovery);
 md_wakeup_thread(mddev->thread);

4) daemon thread can't unregister sync thread:
md_check_recovery
 if (mddev->suspended)
   return; -> returns directly
 md_reap_sync_thread
  clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
  -> MD_RECOVERY_RUNNING can't be cleared, hence step 2 hangs;
This problem is not specific to dm-raid. Fix it by no longer ignoring a suspended array in md_check_recovery(). Follow-up patches will improve dm-raid so that it freezes the sync thread during suspend.
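
The hang and the effect of the fix can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the struct, the RECOVERY_* constants, and every model_* function are hypothetical stand-ins for the mddev state and the call chains named above, and the blocking wait_event() is reduced to a check of whether the RUNNING flag was cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags mirroring MD_RECOVERY_RUNNING/INTR/DONE (not the kernel values). */
enum {
    RECOVERY_RUNNING = 1u << 0,
    RECOVERY_INTR    = 1u << 1,
    RECOVERY_DONE    = 1u << 2,
};

/* Hypothetical, heavily reduced stand-in for struct mddev. */
struct model_mddev {
    bool suspended;
    unsigned int recovery;
};

/* Step 2: the stopper interrupts the sync thread, then waits for RUNNING to clear. */
static void model_stop_sync_thread(struct model_mddev *m)
{
    assert(m != NULL);
    m->recovery |= RECOVERY_INTR;
}

/* Step 3: the sync thread finishes and marks itself done. */
static void model_sync_done(struct model_mddev *m)
{
    assert(m != NULL);
    m->recovery |= RECOVERY_DONE;
}

/*
 * Step 4: the daemon thread reaps a finished sync thread.
 * skip_suspended == true models the old early return on a suspended array;
 * skip_suspended == false models the behaviour after the fix.
 */
static void model_check_recovery(struct model_mddev *m, bool skip_suspended)
{
    assert(m != NULL);
    if (skip_suspended && m->suspended)
        return; /* old behaviour: RUNNING is never cleared */
    if (m->recovery & RECOVERY_DONE)
        m->recovery &= ~(RECOVERY_RUNNING | RECOVERY_INTR | RECOVERY_DONE);
}

/* Replays steps 1-4 and reports whether the waiter from step 2 could proceed. */
static bool model_waiter_can_proceed(bool skip_suspended)
{
    struct model_mddev m = { .suspended = false, .recovery = RECOVERY_RUNNING };

    m.suspended = true;                       /* step 1: suspend the array */
    model_stop_sync_thread(&m);               /* step 2: stop the array */
    model_sync_done(&m);                      /* step 3: sync thread done */
    model_check_recovery(&m, skip_suspended); /* step 4: daemon thread runs */

    return !(m.recovery & RECOVERY_RUNNING);
}
```

In this model, model_waiter_can_proceed(true) returns false, which corresponds to the hang in step 2, while model_waiter_can_proceed(false) returns true because the finished sync thread is reaped even though the array is suspended.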