Date:   Wed, 22 Jun 2022 22:25:11 -0700
From:   Saravana Kannan <saravanak@...gle.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        Ming Lei <ming.lei@...onical.com>,
        Alex Riesen <raa.lkml@...il.com>,
        Alan Stern <stern@...land.harvard.edu>,
        Jens Axboe <axboe@...nel.dk>,
        USB list <linux-usb@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Rusty Russell <rusty@...tcorp.com.au>,
        Marek Szyprowski <m.szyprowski@...sung.com>
Subject: Re: [PATCH 5/5] async, kmod: warn on synchronous request_module()
 from async workers

On Fri, Jan 18, 2013 at 2:12 PM Tejun Heo <tj@...nel.org> wrote:
>
> From 4983f3b51e18d008956dd113e0ea2f252774cefc Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@...nel.org>
> Date: Fri, 18 Jan 2013 14:05:57 -0800
>
> Synchronous requet_module() from an async worker can lead to deadlock
> because module init path may invoke async_synchronize_full().  The
> async worker waits for request_module() to complete and the module
> loading waits for the async task to finish.  This bug happened in the
> block layer because of default elevator auto-loading.
>
> Block layer has been updated not to do default elevator auto-loading
> and it has been decided to disallow synchronous request_module() from
> async workers.
>
> Trigger WARN_ON_ONCE() on synchronous request_module() from async
> workers.
>
> For more details, please refer to the following thread.
>
>   http://thread.gmane.org/gmane.linux.kernel/1420814
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Reported-by: Alex Riesen <raa.lkml@...il.com>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Arjan van de Ven <arjan@...ux.intel.com>
> ---
>  kernel/kmod.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 1c317e3..ecd42b4 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -38,6 +38,7 @@
>  #include <linux/suspend.h>
>  #include <linux/rwsem.h>
>  #include <linux/ptrace.h>
> +#include <linux/async.h>
>  #include <asm/uaccess.h>
>
>  #include <trace/events/module.h>
> @@ -130,6 +131,14 @@ int __request_module(bool wait, const char *fmt, ...)
>  #define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
>         static int kmod_loop_msg;
>
> +       /*
> +        * We don't allow synchronous module loading from async.  Module
> +        * init may invoke async_synchronize_full() which will end up
> +        * waiting for this task which already is waiting for the module
> +        * loading to complete, leading to a deadlock.
> +        */
> +       WARN_ON_ONCE(wait && current_is_async());
> +

If a built-in driver does async probing before modules can be loaded
at all, this causes a spurious warning splat.

Here's a report by Marek [1]. I took a stab at suppressing the warning,
at least for drivers that do async probing before the initcalls are
done, but I got confused [2] trying to work out the earliest point
during boot at which request_module() can succeed. If someone can
clarify that, I can try to avoid this warning for request_module()
calls made before any module can be loaded. Any other ideas for making
this warning less trigger-happy about false positives?

[1] - https://lore.kernel.org/lkml/d5796286-ec24-511a-5910-5673f8ea8b10@samsung.com/
[2] - https://lore.kernel.org/lkml/CAGETcx-MHwex8tHLB1d71MAP01-3OPDZSNCUBb3iT+BtrugJmQ@mail.gmail.com/

Another question (pardon my ignorance) is whether we need to call
async_synchronize_full() at the end of do_init_module(), or whether we
can limit the sync to a smaller domain. Looking at the history, I see
that this call was added by Linus in commit d6de2c80e9d7 ("async: Fix
module loading async-work regression"). Are we doing the blanket
async_synchronize_full() only because we are not keeping proper track
of the async domains? If so, what if we had an async domain per module
and tied every async_schedule*() use triggered by that module to the
module's async domain? Then we'd only need to sync that module's
domain and we wouldn't hit any deadlock issues.
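
To make the per-module domain idea above concrete, here is a minimal
userspace sketch. It is only a model under stated assumptions: the
model_* names and struct model_domain are hypothetical and are not
kernel API, and "pending" counters stand in for real async work. The
kernel does already provide async_schedule_domain() and
async_synchronize_full_domain(); the sketch only illustrates why
syncing a single domain would avoid waiting on an unrelated worker.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only; none of these names are kernel API.
 * Each domain counts async work scheduled on its behalf that has not
 * finished yet.
 */
struct model_domain {
	const char *owner;	/* module (or subsystem) that owns the work */
	int pending;		/* scheduled but unfinished async items */
};

/* Charge one async item to a single domain. */
void model_schedule(struct model_domain *d)
{
	assert(d != NULL);
	d->pending++;
}

/* Mark one async item in the domain as finished. */
void model_complete(struct model_domain *d)
{
	assert(d != NULL && d->pending > 0);
	d->pending--;
}

/* Blanket sync: has to wait if any domain still has pending work. */
int model_full_sync_would_block(const struct model_domain *all, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (all[i].pending > 0)
			return 1;
	return 0;
}

/* Per-module sync: only the given domain's pending work matters. */
int model_domain_sync_would_block(const struct model_domain *d)
{
	assert(d != NULL);
	return d->pending > 0;
}
```

In this model, an async worker that is blocked in request_module()
keeps its own domain's pending count above zero, so a blanket sync from
the new module's init would wait on it, while a sync limited to the new
module's (empty) domain would not.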

Grepping for async_schedule*() calls, I see only about 30 instances.
At a glance, most of them fall into one of these cases:
1. There is a device/driver from which we can find the related module
and tie the async_schedule*() call to that module's domain.
2. async_schedule*() is called directly from module_init(), so we can
tie it straight to the module's domain.
3. Other?

Is this idea worth pursuing? Or am I going in a completely wrong direction?

Btw, I did see Linus's suggestion in one of the emails in this thread
(?) about just doing a full synchronize on device open. That seems
like it would work too, but I'm wary of touching any file open code
path because I expect it to be a hot path.

-Saravana
