linux-kernel - Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4i0xEwMQ5kSK-xGroV7aZr3j1YNrGMVLiLMr3U8nFCMKA@mail.gmail.com>
Date:   Mon, 20 Sep 2021 13:52:21 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Luis Chamberlain <mcgrof@...nel.org>
Cc:     Tejun Heo <tj@...nel.org>, Greg KH <gregkh@...uxfoundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Minchan Kim <minchan@...nel.org>, jeyu@...nel.org,
        shuah <shuah@...nel.org>, Randy Dunlap <rdunlap@...radead.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>, yzaikin@...gle.com,
        Nathan Chancellor <nathan@...nel.org>, ojeda@...nel.org,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        vitor@...saru.org, elver@...gle.com,
        Jarkko Sakkinen <jarkko@...nel.org>,
        Alexander Potapenko <glider@...gle.com>,
        rf@...nsource.cirrus.com,
        Stephen Hemminger <stephen@...workplumber.org>,
        David Laight <David.Laight@...lab.com>, bvanassche@....org,
        jolsa@...nel.org,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        trishalfonso@...gle.com, andreyknvl@...il.com,
        Jiri Kosina <jikos@...nel.org>, mbenes@...e.com,
        Nitin Gupta <ngupta@...are.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Borislav Petkov <bp@...en8.de>, X86 ML <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, lizefan.x@...edance.com,
        Johannes Weiner <hannes@...xchg.org>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Krzysztof Wilczyński <kw@...ux.com>,
        senozhatsky@...omium.org, Christoph Hellwig <hch@....de>,
        Joe Perches <joe@...ches.com>, hkallweit1@...il.com,
        Jens Axboe <axboe@...nel.dk>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Kees Cook <keescook@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-spdx@...r.kernel.org,
        Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        linux-block@...r.kernel.org,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-kselftest@...r.kernel.org, cgroups@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        copyleft-next@...ts.fedorahosted.org
Subject: Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal

On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain <mcgrof@...nel.org> wrote:
>
> When sysfs attributes use a lock also used on module removal we can
> race to deadlock. This happens when for instance a sysfs file on
> a driver is used, then at the same time we have module removal call
> trigger. The module removal call code holds a lock, and then the sysfs
> file entry waits for the same lock. While holding the lock the module
> removal tries to remove the sysfs entries, but these cannot be removed
> yet as one is waiting for a lock. This won't complete as the lock is
> already held. Likewise module removal cannot complete, and so we deadlock.
>
> This can now be easily reproducible with our sysfs selftest as follows:
>
> ./tools/testing/selftests/sysfs/sysfs.sh -t 0027
>
> To fix this we extend the struct kernfs_node with a module reference and
> use the try_module_get() after kernfs_get_active() is called which
> protects integrity and the existence of the kernfs node during the
> operation.
>
> So long as the kernfs node is protected with kernfs_get_active() we know
> we can rely on its contents. And, as now just documented in the previous
> patch, we also now know that once kernfs_get_active() is called the module
> is also guarded to exist and cannot be removed.
>
> If try_module_get() fails we fail the operation on the kernfs node.
>
> We use a try method as a full lock means we'd then make our sysfs
> attributes busy us out from possible module removal, and so userspace
> could force denying module removal, a silly form of "DOS" against module
> removal. A try lock on the module removal ensures we give priority to
> module removal and interacting with sysfs attributes only comes second.
> Using a full lock could mean for instance that if you don't stop poking
> at sysfs files you cannot remove a module.
>
> Races between removal of sysfs files and the module are not possible
> given sysfs files are created by the same module, and when a sysfs file
> is being used kernfs prevents removal of the sysfs file. So if module
> removal is actually happening the removal would have to wait until
> the sysfs file operation is complete.
>
> This deadlock was first reported with the zram driver, however the live
> patching folks have acknowledged they have observed this as well with
> live patching, when a live patch is removed. I was then able to
> reproduce easily by creating a dedicated selftests.
>
> A sketch of how this can happen follows:
>
> CPU A                              CPU B
>                                    whatever_store()
> module_unload
>   mutex_lock(foo)
>                                    mutex_lock(foo)
>    del_gendisk(zram->disk);
>      device_del()
>        device_remove_groups()

This flow seems possible to trigger with:

   echo $dev > /sys/bus/$bus/drivers/$driver/unbind

I am missing why module pinning is part of the solution when it's the
device_del() path that is racing? Module removal is just a more coarse
grained way to trigger unbind => device_del(). Isn't the above a bug
in the driver, not missing synchronization in kernfs? Forgive me if
the unbind question was asked and answered elsewhere, this is my first
time taking a look at this series.