lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANpmjNMfCK99DoUuR2qRBTLLhrGsYVcpKdtXW7S559tNJ-MO7A@mail.gmail.com>
Date:   Mon, 21 Oct 2019 14:15:15 +0200
From:   Marco Elver <elver@...gle.com>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     Christian Brauner <christian.brauner@...ntu.com>,
        syzbot <syzbot+492a4acccd8fc75ddfd0@...kaller.appspotmail.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Arnd Bergmann <arnd@...db.de>, christian@...uner.io,
        deepa.kernel@...il.com, ebiederm@...ssion.com, guro@...com,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs@...glegroups.com, Will Deacon <will@...nel.org>
Subject: Re: KCSAN: data-race in exit_signals / prepare_signal

On Mon, 21 Oct 2019 at 14:00, Oleg Nesterov <oleg@...hat.com> wrote:
>
> On 10/21, Christian Brauner wrote:
> >
> > This traces back to Oleg fixing a race between a group stop and a thread
> > exiting before it notices that it has a pending signal or is in the middle of
> > do_exit() already, causing group stop to get wacky.
> > The original commit to fix this race is
> > commit d12619b5ff56 ("fix group stop with exit race") which took sighand
> > lock before setting PF_EXITING on the thread.
>
> Not really... sig_task_ignored() didn't check task->flags until the recent
> 33da8e7c81 ("signal: Allow cifs and drbd to receive their terminating signals").
> But I think this doesn't matter, see below.
>
> > If the race really matters and given how tsk->flags is currently accessed
> > everywhere the simple fix for now might be:
> >
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index c4da1ef56fdf..cf61e044c4cc 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2819,7 +2819,9 @@ void exit_signals(struct task_struct *tsk)
> >         cgroup_threadgroup_change_begin(tsk);
> >
> >         if (thread_group_empty(tsk) || signal_group_exit(tsk->signal)) {
> > +               spin_lock_irq(&tsk->sighand->siglock);
> >                 tsk->flags |= PF_EXITING;
> > +               spin_unlock_irq(&tsk->sighand->siglock);
>
> Well, exit_signals() tries to avoid ->siglock in this case....
>
> But this doesn't matter. IIUC the problem is not that exit_signals() sets
> PF_EXITING, the problem is that it writes to tsk->flags and kasan detects
> the data race.
>
> For example, freezable_schedule() which sets PF_FREEZER_SKIP can equally
> "race" with sig_task_ignored() or with ANY other code which checks this
> task's flags.
>
> I think this is WONTFIX.

If taking the spinlock is unnecessary (which AFAIK it probably is) and
there are no other writers to this flag, you will still need a
WRITE_ONCE(tsk->flags, tsk->flags | PF_EXITING) to avoid the
data-race.

However, if it is possible that there are concurrent writers setting
other bits in flags, you need to ensure that the bits are set
atomically (unless it's ok to lose some bits here). This can only be
done via 1) taking siglock, or 2) via e.g. atomic_or(...), however,
flags is not atomic_t ...

-- Marco

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ