Date: Wed, 11 May 2022 23:15:09 -1000
From: Tejun Heo <tj@...nel.org>
To: Byungchul Park <byungchul.park@....com>
Cc: torvalds@...ux-foundation.org, holt@....com, mcgrof@...nel.org,
    damien.lemoal@...nsource.wdc.com, linux-ide@...r.kernel.org,
    adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org, mingo@...hat.com,
    linux-kernel@...r.kernel.org, peterz@...radead.org, will@...nel.org,
    tglx@...utronix.de, rostedt@...dmis.org, joel@...lfernandes.org,
    sashal@...nel.org, daniel.vetter@...ll.ch, chris@...is-wilson.co.uk,
    duyuyang@...il.com, johannes.berg@...el.com, tytso@....edu,
    willy@...radead.org, david@...morbit.com, amir73il@...il.com,
    bfields@...ldses.org, gregkh@...uxfoundation.org, kernel-team@....com,
    linux-mm@...ck.org, akpm@...ux-foundation.org, mhocko@...nel.org,
    minchan@...nel.org, hannes@...xchg.org, vdavydov.dev@...il.com,
    sj@...nel.org, jglisse@...hat.com, dennis@...nel.org, cl@...ux.com,
    penberg@...nel.org, rientjes@...gle.com, vbabka@...e.cz,
    ngupta@...are.org, linux-block@...r.kernel.org,
    paolo.valente@...aro.org, josef@...icpanda.com,
    linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk, jack@...e.cz,
    jack@...e.com, jlayton@...nel.org, dan.j.williams@...el.com,
    hch@...radead.org, djwong@...nel.org, dri-devel@...ts.freedesktop.org,
    airlied@...ux.ie, rodrigosiqueiramelo@...il.com, melissa.srw@...il.com,
    hamohammed.sa@...il.com, 42.hyeyoo@...il.com
Subject: Re: [REPORT] syscall reboot + umh + firmware fallback

Hello,

Just took a look out of curiosity.

On Thu, May 12, 2022 at 02:25:57PM +0900, Byungchul Park wrote:
> PROCESS A          PROCESS B          WORKER C
>
> __do_sys_reboot()
>                    __do_sys_reboot()
>  mutex_lock(&system_transition_mutex)
>  ...                mutex_lock(&system_transition_mutex) <- stuck
>                     ...
>                                       request_firmware_work_func()
>                                        _request_firmware()
>                                         firmware_fallback_sysfs()
>                                          usermodehelper_read_lock_wait()
>                                           down_read(&umhelper_sem)
>                                          ...
>                                          fw_load_sysfs_fallback()
>                                           fw_sysfs_wait_timeout()
>                                            wait_for_completion_killable_timeout(&fw_st->completion) <- stuck
>  kernel_halt()
>   __usermodehelper_disable()
>    down_write(&umhelper_sem) <- stuck
>
> --------------------------------------------------------
> All the 3 contexts are stuck at this point.
> --------------------------------------------------------
>
> PROCESS A          PROCESS B          WORKER C
>
>    ...
>    up_write(&umhelper_sem)
>  ...
>  mutex_unlock(&system_transition_mutex) <- cannot wake up B
>
>                     ...
>                     kernel_halt()
>                      notifier_call_chain()
>                       hw_shutdown_notify()
>                        kill_pending_fw_fallback_reqs()
>                         __fw_load_abort()
>                          complete_all(&fw_st->completion) <- cannot wake up C
>
>                                          ...
>                                          usermodehelper_read_unlock()
>                                           up_read(&umhelper_sem) <- cannot wake up A

I'm not sure I'm reading it correctly, but it looks like the "process B"
column is superfluous given that it's waiting on the same lock to do the
same thing that A is already doing (besides, you can't really halt the
machine twice).

What it's reporting seems to be an ABBA deadlock between A waiting on
umhelper_sem and C waiting on fw_st->completion. The report seems
spurious:

1. wait_for_completion_killable_timeout() doesn't need someone to wake it
   up to make forward progress because it will unstick itself after the
   timeout expires.

2. complete_all() from __fw_load_abort() isn't the only source of wakeup.
   The fw loader can be, and mainly should be, woken up by firmware
   loading actually completing instead of being aborted.

I guess the reason why B shows up there is because the operation order is
such that, just between A and C, the complete_all() takes place before
__usermodehelper_disable(), so the whole thing kinda doesn't make sense as
you can't block a past operation by a future one. Inserting process B
introduces the reverse ordering.

Thanks.

-- 
tejun
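
The two numbered points in the reply can be illustrated with a small userspace analogue. This is only a sketch under stated assumptions, not kernel code: Python `threading` primitives stand in for the kernel objects (a plain `Lock` for `umhelper_sem`, which in the kernel is an rwsem, and an `Event` for `fw_st->completion`), and `run_scenario` and its `firmware_loaded_after` parameter are hypothetical names introduced here. It replays the reported interleaving and shows that worker C leaves its wait either because the timeout expires or because the firmware load completes, after which A and then B can proceed.

```python
import threading


def run_scenario(timeout=0.2, firmware_loaded_after=None):
    """Replay the reported interleaving with userspace stand-ins.

    Assumption: threading.Lock/Event only approximate the kernel's
    mutex, rwsem and completion; names mirror the report for readability.
    """
    system_transition_mutex = threading.Lock()
    umhelper_sem = threading.Lock()      # stand-in for the rwsem
    fw_completion = threading.Event()    # stand-in for fw_st->completion
    c_holds_sem = threading.Event()
    a_holds_mutex = threading.Event()
    events = []
    events_lock = threading.Lock()

    def log(msg):
        with events_lock:
            events.append(msg)

    def worker_c():
        with umhelper_sem:               # like down_read(&umhelper_sem)
            c_holds_sem.set()
            # Like wait_for_completion_killable_timeout(): returns when
            # the completion fires OR when the timeout expires.
            done = fw_completion.wait(timeout)
            log("C: completed" if done else "C: timed out")
        # leaving the block releases the lock, like up_read()

    def process_a():
        with system_transition_mutex:
            a_holds_mutex.set()
            with umhelper_sem:           # like down_write(&umhelper_sem)
                log("A: got umhelper_sem")
            log("A: dropping system_transition_mutex")

    def process_b():
        with system_transition_mutex:    # blocked until A is done
            fw_completion.set()          # like complete_all()
            log("B: complete_all")

    threads = []

    def start(target):
        t = threading.Thread(target=target)
        threads.append(t)
        t.start()

    start(worker_c)
    c_holds_sem.wait()                   # C now holds umhelper_sem
    if firmware_loaded_after is not None:
        # Point 2: the firmware load itself can complete the wait.
        timer = threading.Timer(firmware_loaded_after, fw_completion.set)
        threads.append(timer)
        timer.start()
    start(process_a)
    a_holds_mutex.wait()                 # A holds the mutex, waits on the sem
    start(process_b)                     # B queues up behind A
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    print(run_scenario(timeout=0.2))
    print(run_scenario(timeout=5, firmware_loaded_after=0.05))
```

In the first call nothing ever signals the completion before B runs, yet all three contexts finish because C's wait is bounded. In the second call the hypothetical timer plays the role of the firmware load completing normally, so C wakes up well before the timeout.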