Message-ID: <20191021125101.x7omk7xa2kyc7hue@wittgenstein>
Date:   Mon, 21 Oct 2019 14:51:02 +0200
From:   Christian Brauner <christian.brauner@...ntu.com>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     syzbot <syzbot+492a4acccd8fc75ddfd0@...kaller.appspotmail.com>,
        akpm@...ux-foundation.org, arnd@...db.de, christian@...uner.io,
        deepa.kernel@...il.com, ebiederm@...ssion.com, elver@...gle.com,
        guro@...com, linux-kernel@...r.kernel.org,
        syzkaller-bugs@...glegroups.com, will@...nel.org
Subject: Re: KCSAN: data-race in exit_signals / prepare_signal

On Mon, Oct 21, 2019 at 02:00:30PM +0200, Oleg Nesterov wrote:
> On 10/21, Christian Brauner wrote:
> >
> > This traces back to Oleg fixing a race between a group stop and a thread
> > exiting before it notices that it has a pending signal or is in the middle of
> > do_exit() already, causing group stop to get wacky.
> > The original commit to fix this race is
> > commit d12619b5ff56 ("fix group stop with exit race") which took sighand
> > lock before setting PF_EXITING on the thread.
> 
> Not really... sig_task_ignored() didn't check task->flags until the recent
> 33da8e7c81 ("signal: Allow cifs and drbd to receive their terminating signals").
> But I think this doesn't matter, see below.
> 
> > If the race really matters and given how tsk->flags is currently accessed
> > everywhere the simple fix for now might be:
> >
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index c4da1ef56fdf..cf61e044c4cc 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2819,7 +2819,9 @@ void exit_signals(struct task_struct *tsk)
> >         cgroup_threadgroup_change_begin(tsk);
> >
> >         if (thread_group_empty(tsk) || signal_group_exit(tsk->signal)) {
> > +               spin_lock_irq(&tsk->sighand->siglock);
> >                 tsk->flags |= PF_EXITING;
> > +               spin_unlock_irq(&tsk->sighand->siglock);
> 
> Well, exit_signals() tries to avoid ->siglock in this case....
> 
> But this doesn't matter. IIUC the problem is not that exit_signals() sets
> PF_EXITING, the problem is that it writes to tsk->flags and kasan detects
> the data race.

Right, that's the reason I said "If the race really matters". I thought
that other writers/readers always take the sighand lock, so the easy fix
would have been to take the sighand lock here too.
The alternative is to fiddle with task_struct itself and replace flags
with an atomic_t or something similar, which is more invasive and
probably more controversial.
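
To make the second option concrete, here is a rough userspace sketch of
what "flags as an atomic" would mean. It uses C11 <stdatomic.h> rather
than the kernel's atomic_t API, and struct fake_task and the fake_*()
helpers are made-up names for illustration only, not kernel code or a
proposed patch:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit only; stands in for the kernel's PF_EXITING flag. */
#define PF_EXITING 0x00000004u

/* Hypothetical stand-in for task_struct, with flags made atomic. */
struct fake_task {
	atomic_uint flags;
};

static void fake_task_init(struct fake_task *t, unsigned int initial)
{
	atomic_init(&t->flags, initial);
}

/*
 * Writer side: a single atomic read-modify-write instead of a plain
 * "flags |= PF_EXITING", so concurrent readers never race with a
 * non-atomic store. Relaxed ordering: this gives atomicity only and
 * no ordering against other memory accesses.
 */
static void fake_set_exiting(struct fake_task *t)
{
	atomic_fetch_or_explicit(&t->flags, PF_EXITING, memory_order_relaxed);
}

/* Reader side: an atomic load instead of a plain read of flags. */
static bool fake_is_exiting(struct fake_task *t)
{
	unsigned int f = atomic_load_explicit(&t->flags, memory_order_relaxed);

	return (f & PF_EXITING) != 0;
}
```

The invasive part is that every existing plain access to flags would
have to be converted to helpers like these, whereas the siglock variant
only touches the one write in exit_signals().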

Christian
