lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 27 Oct 2021 13:57:40 +0200 (CEST)
From:   Miroslav Benes <mbenes@...e.cz>
To:     Luis Chamberlain <mcgrof@...nel.org>
cc:     Ming Lei <ming.lei@...hat.com>,
        Julia Lawall <julia.lawall@...ia.fr>,
        Petr Mladek <pmladek@...e.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Paul Mackerras <paulus@...ba.org>, tj@...nel.org,
        gregkh@...uxfoundation.org, akpm@...ux-foundation.org,
        minchan@...nel.org, jeyu@...nel.org, shuah@...nel.org,
        bvanassche@....org, dan.j.williams@...el.com, joe@...ches.com,
        tglx@...utronix.de, keescook@...omium.org, rostedt@...dmis.org,
        linux-spdx@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org,
        live-patching@...r.kernel.org
Subject: Re: [PATCH v8 11/12] zram: fix crashes with cpu hotplug multistate

On Tue, 26 Oct 2021, Luis Chamberlain wrote:

> On Tue, Oct 26, 2021 at 11:37:30PM +0800, Ming Lei wrote:
> > On Tue, Oct 26, 2021 at 10:48:18AM +0200, Petr Mladek wrote:
> > > Livepatch code never called kobject_del() under a lock. It would cause
> > > the obvious deadlock.
> 
> Never?

kobject_put() to be precise.

When I started working on the support for module/live patches removal, 
calling kobject_put() under our klp_mutex lock was the obvious first 
choice given how the code was structured, but I ran into problems with 
deadlocks immediately. So it was changed to async approach with the 
workqueue. Thus the mainline code has never suffered from this, but we 
knew about the issues.
 
> > > The historic code only waited in the
> > > module_exit() callback until the sysfs interface was removed.
> > 
> > OK, then Luis shouldn't consider livepatching as one such issue to solve
> > with one generic solution.
> 
> It's not what I was told when the deadlock was found with zram, so I was
> informed quite the contrary.

>From my perspective, it is quite easy to get it wrong due to either a lack 
of generic support, or missing rules/documentation. So if this thread 
leads to "do not share locks between a module removal and a sysfs 
operation" strict rule, it would be at least something. In the same 
manner as Luis proposed to document try_module_get() expectations.
 
> I'm working on a generic coccinelle patch which hunts for actual cases
> using iteration (a feature of coccinelle for complex searches). The
> search is pretty involved, so I don't think I'll have an answer to this
> soon.
> 
> Since the question of how generic this deadlock is remains questionable,
> I think it makes sense to put the generic deadlock fix off the table for
> now, and we address this once we have a more concrete search with
> coccinelle.
> 
> But to say we *don't* have drivers which can cause this is obviously
> wrong as well, from a cursory search so far. But let's wait and see how
> big this list actually is.
> 
> I'll drop the deadlock generic fixes and move on with at least a starter
> kernfs / sysfs tests.

It makes sense to me.

Thanks, Luis, for pursuing it.

Miroslav

Powered by blists - more mailing lists