lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 20 Sep 2021 13:52:21 -0700
From:   Dan Williams <>
To:     Luis Chamberlain <>
Cc:     Tejun Heo <>, Greg KH <>,
        Andrew Morton <>,
        Minchan Kim <>,,
        shuah <>, Randy Dunlap <>,
        "Rafael J. Wysocki" <>,
        Masahiro Yamada <>,
        Nick Desaulniers <>,,
        Nathan Chancellor <>,,
        Tetsuo Handa <>,,,
        Jarkko Sakkinen <>,
        Alexander Potapenko <>,,
        Stephen Hemminger <>,
        David Laight <>,,,
        Andy Shevchenko <>,,,
        Jiri Kosina <>,,
        Nitin Gupta <>,
        Sergey Senozhatsky <>,
        Reinette Chatre <>,
        Fenghua Yu <>,
        Borislav Petkov <>, X86 ML <>,
        "H. Peter Anvin" <>,,
        Johannes Weiner <>,
        Daniel Vetter <>,
        Bjorn Helgaas <>,
        Krzysztof WilczyƄski <>,, Christoph Hellwig <>,
        Joe Perches <>,,
        Jens Axboe <>,
        Josh Poimboeuf <>,
        Thomas Gleixner <>,
        Kees Cook <>,
        Steven Rostedt <>,
        Peter Zijlstra <>,,
        Linux Doc Mailing List <>,,
        linux-fsdevel <>,,,
        Linux Kernel Mailing List <>,
Subject: Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal

On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain <> wrote:
> When sysfs attributes use a lock also used on module removal we can
> race to deadlock. This happens when for instance a sysfs file on
> a driver is used, then at the same time we have module removal call
> trigger. The module removal call code holds a lock, and then the sysfs
> file entry waits for the same lock. While holding the lock the module
> removal tries to remove the sysfs entries, but these cannot be removed
> yet as one is waiting for a lock. This won't complete as the lock is
> already held. Likewise module removal cannot complete, and so we deadlock.
> This can now be easily reproducible with our sysfs selftest as follows:
> ./tools/testing/selftests/sysfs/ -t 0027
> To fix this we extend the struct kernfs_node with a module reference and
> use the try_module_get() after kernfs_get_active() is called which
> protects integrity and the existence of the kernfs node during the
> operation.
> So long as the kernfs node is protected with kernfs_get_active() we know
> we can rely on its contents. And, as now just documented in the previous
> patch, we also now know that once kernfs_get_active() is called the module
> is also guarded to exist and cannot be removed.
> If try_module_get() fails we fail the operation on the kernfs node.
> We use a try method as a full lock means we'd then make our sysfs
> attributes busy us out from possible module removal, and so userspace
> could force denying module removal, a silly form of "DOS" against module
> removal. A try lock on the module removal ensures we give priority to
> module removal and interacting with sysfs attributes only comes second.
> Using a full lock could mean for instance that if you don't stop poking
> at sysfs files you cannot remove a module.
> Races between removal of sysfs files and the module are not possible
> given sysfs files are created by the same module, and when a sysfs file
> is being used kernfs prevents removal of the sysfs file. So if module
> removal is actually happening the removal would have to wait until
> the sysfs file operation is complete.
> This deadlock was first reported with the zram driver, however the live
> patching folks have acknowledged they have observed this as well with
> live patching, when a live patch is removed. I was then able to
> reproduce easily by creating a dedicated selftests.
> A sketch of how this can happen follows:
> CPU A                              CPU B
>                                    whatever_store()
> module_unload
>   mutex_lock(foo)
>                                    mutex_lock(foo)
>    del_gendisk(zram->disk);
>      device_del()
>        device_remove_groups()

This flow seems possible to trigger with:

   echo $dev > /sys/bus/$bus/drivers/$driver/unbind

I am missing why module pinning is part of the solution when it's the
device_del() path that is racing? Module removal is just a more coarse
grained way to trigger unbind => device_del(). Isn't the above a bug
in the driver, not missing synchronization in kernfs? Forgive me if
the unbind question was asked and answered elsewhere, this is my first
time taking a look at this series.

Powered by blists - more mailing lists