lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Ysy92qs6Nc9zLqdp@zx2c4.com>
Date:   Tue, 12 Jul 2022 02:18:34 +0200
From:   "Jason A. Donenfeld" <Jason@...c4.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     linux-kernel@...r.kernel.org, gregerwin256@...il.com,
        toke@...hat.com, kvalo@...nel.org, rsalvaterra@...il.com,
        linux-wireless@...r.kernel.org
Subject: Re: [PATCH v5] signal: break out of wait loops on kthread_stop()

Hi Eric,

On Mon, Jul 11, 2022 at 07:00:25PM -0500, Eric W. Biederman wrote:
> "Jason A. Donenfeld" <Jason@...c4.com> writes:
> 
> > I was recently surprised to learn that msleep_interruptible(),
> > wait_for_completion_interruptible_timeout(), and related functions
> > simply hung when I called kthread_stop() on kthreads using them. The
> > solution to fixing the case with msleep_interruptible() was more simply
> > to move to schedule_timeout_interruptible(). Why?
> >
> > The reason is that msleep_interruptible(), and many functions just like
> > it, has a loop like this:
> >
> >         while (timeout && !signal_pending(current))
> >                 timeout = schedule_timeout_interruptible(timeout);
> >
> > The call to kthread_stop() woke up the thread, so schedule_timeout_
> > interruptible() returned early, but because signal_pending() returned
> > true, it went back into another timeout, which was never woken up.
> >
> > This wait loop pattern is common to various pieces of code, and I
> > suspect that the subtle misuse in a kthread that caused a deadlock in
> > the code I looked at last week is also found elsewhere.
> >
> > So this commit causes signal_pending() to return true when
> > kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.
> >
> > The same also probably applies to the similar kthread_park()
> > functionality, but that can be addressed later, as its semantics are
> > slightly different.
> 
> Acked-by: "Eric W. Biederman" <ebiederm@...ssion.com>
> 
> Do I need to pick this up and put it on a topic branch?
> Or does this patch have another patch to get merged?

I think it'd make sense to put this in a topic branch.

I discovered this in the process of developing [1] (which could use an
Ack for the wake_up_state EXPORT_SYMBOL, by the way). That's marked
for stable@ because the breakage there is kind of bad -- people can't put
their laptops to sleep right now. After both this patch and that one
land, I may then revisit the ath9k one and make changes for non-stable@.
That is, of course, if [1] even lands; right now I fear it might be
relegated to scheduler bikeshed purgatory and I don't have the time
right now to deal with that.

Longer term, this patch here will let me rework [1] to get rid of the
set_notify_signal() trick and use a proper condition variable wait with
`wait_for_condition_interruptible_timeout`, since this patch makes that
function work right for both contexts in which hwrng devices are called
(kthread and process). But that will mean reworking the hwrng API, which
means writing patches for every single hwrng driver, so that's work for
another time.

In the mean time, [1] is a good way forward (provided it gets acked).
And then this patch puts things in a good position to improve down the
line. I did a bunch of testing of this patch when trying out different
candidates for [1] before settling on [1] as a good-enough intermediate
solution. 

So, anyway, feel free to put this in a topic branch.

Jason

[1] https://lore.kernel.org/lkml/20220629114240.946411-1-Jason@zx2c4.com/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ