linux-kernel - Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4jFaNTfpKq6hQsFrWXTwHz-wG+5K4m-hmsPeteLX4p5AQ@mail.gmail.com>
Date:   Mon, 20 Sep 2021 14:55:10 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Luis Chamberlain <mcgrof@...nel.org>
Cc:     Tejun Heo <tj@...nel.org>, Greg KH <gregkh@...uxfoundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Minchan Kim <minchan@...nel.org>, jeyu@...nel.org,
        shuah <shuah@...nel.org>, Randy Dunlap <rdunlap@...radead.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>, yzaikin@...gle.com,
        Nathan Chancellor <nathan@...nel.org>, ojeda@...nel.org,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        vitor@...saru.org, elver@...gle.com,
        Jarkko Sakkinen <jarkko@...nel.org>,
        Alexander Potapenko <glider@...gle.com>,
        rf@...nsource.cirrus.com,
        Stephen Hemminger <stephen@...workplumber.org>,
        David Laight <David.Laight@...lab.com>, bvanassche@....org,
        jolsa@...nel.org,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        trishalfonso@...gle.com, andreyknvl@...il.com,
        Jiri Kosina <jikos@...nel.org>, mbenes@...e.com,
        Nitin Gupta <ngupta@...are.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Borislav Petkov <bp@...en8.de>, X86 ML <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, lizefan.x@...edance.com,
        Johannes Weiner <hannes@...xchg.org>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Krzysztof Wilczyński <kw@...ux.com>,
        senozhatsky@...omium.org, Christoph Hellwig <hch@....de>,
        Joe Perches <joe@...ches.com>, hkallweit1@...il.com,
        Jens Axboe <axboe@...nel.dk>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Kees Cook <keescook@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-spdx@...r.kernel.org,
        Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        linux-block@...r.kernel.org,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-kselftest@...r.kernel.org, cgroups@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        copyleft-next@...ts.fedorahosted.org
Subject: Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal

On Mon, Sep 20, 2021 at 2:17 PM Luis Chamberlain <mcgrof@...nel.org> wrote:
>
> On Mon, Sep 20, 2021 at 01:52:21PM -0700, Dan Williams wrote:
> > On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain <mcgrof@...nel.org> wrote:
> > > This deadlock was first reported with the zram driver, however the live
> > > patching folks have acknowledged they have observed this as well with
> > > live patching, when a live patch is removed. I was then able to
> > > reproduce easily by creating a dedicated selftests.
> > >
> > > A sketch of how this can happen follows:
> > >
> > > CPU A                              CPU B
> > >                                    whatever_store()
> > > module_unload
> > >   mutex_lock(foo)
> > >                                    mutex_lock(foo)
> > >    del_gendisk(zram->disk);
> > >      device_del()
> > >        device_remove_groups()
> >
> > This flow seems possible to trigger with:
> >
> >    echo $dev > /sys/bus/$bus/drivers/$driver/unbind
> >
> > I am missing why module pinning
>
> The aspect of try_module_get() which comes to value to prevent the
> deadlock is it ensures kernfs ops do not run once exit is on the way.
>
> > is part of the solution when it's the
> > device_del() path that is racing?
>
> But its not, the device_del() path will yield until the kernfs op
> completes. It is fine to wait there.
>
> The deadlock happens if a module exit routine uses a lock which is
> also used on a sysfs op. If the lock was first held by module exit,
> and module exit is waiting for the kernfs op to complete, and the
> kernfs op is waiting to hold the same lock then the exit will wait
> forever.
>
> > Module removal is just a more coarse
> > grained way to trigger unbind => device_del().
>
> Right, but the device_del() path is not sharing a lock with the sysfs op.

The deadlock in the example comes from holding a lock over
device_del() that is also taken in a kernfs op.  For example, the code
above looks like something that runs from driver.remove(), not
exclusively module_exit(). Yes, module_exit() may trigger
driver.remove() via driver_unregister(), but there's other ways to
trigger driver.remove() that do not involve module_exit().

> > Isn't the above a bug
> > in the driver, not missing synchronization in kernfs?
>
> We can certainly take the position as an alternative:
>
>   "thou shalt not use a lock on exit which is also used on a syfs op"
>
> However that seems counter intuitive, specially if we can resolve the
> issue easily with a try_module_get().

Again, I don't see how try_module_get() can affect the ABBA failure
case of holding a lock over device_del() that is also held inside
sysfs op. I agree that the problem is subtle. Does lockdep not
complain about this case? If it's going to be avoided in the core it
seems try_module_get() does not completely cover the hole that
unsuspecting driver writers might fall into.