linux-kernel - Re: KCSAN: data-race in exit_signals / prepare

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANpmjNMfCK99DoUuR2qRBTLLhrGsYVcpKdtXW7S559tNJ-MO7A@mail.gmail.com>
Date:   Mon, 21 Oct 2019 14:15:15 +0200
From:   Marco Elver <elver@...gle.com>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     Christian Brauner <christian.brauner@...ntu.com>,
        syzbot <syzbot+492a4acccd8fc75ddfd0@...kaller.appspotmail.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Arnd Bergmann <arnd@...db.de>, christian@...uner.io,
        deepa.kernel@...il.com, ebiederm@...ssion.com, guro@...com,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs@...glegroups.com, Will Deacon <will@...nel.org>
Subject: Re: KCSAN: data-race in exit_signals / prepare_signal

On Mon, 21 Oct 2019 at 14:00, Oleg Nesterov <oleg@...hat.com> wrote:
>
> On 10/21, Christian Brauner wrote:
> >
> > This traces back to Oleg fixing a race between a group stop and a thread
> > exiting before it notices that it has a pending signal or is in the middle of
> > do_exit() already, causing group stop to get wacky.
> > The original commit to fix this race is
> > commit d12619b5ff56 ("fix group stop with exit race") which took sighand
> > lock before setting PF_EXITING on the thread.
>
> Not really... sig_task_ignored() didn't check task->flags until the recent
> 33da8e7c81 ("signal: Allow cifs and drbd to receive their terminating signals").
> But I think this doesn't matter, see below.
>
> > If the race really matters and given how tsk->flags is currently accessed
> > everywhere the simple fix for now might be:
> >
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index c4da1ef56fdf..cf61e044c4cc 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2819,7 +2819,9 @@ void exit_signals(struct task_struct *tsk)
> >         cgroup_threadgroup_change_begin(tsk);
> >
> >         if (thread_group_empty(tsk) || signal_group_exit(tsk->signal)) {
> > +               spin_lock_irq(&tsk->sighand->siglock);
> >                 tsk->flags |= PF_EXITING;
> > +               spin_unlock_irq(&tsk->sighand->siglock);
>
> Well, exit_signals() tries to avoid ->siglock in this case....
>
> But this doesn't matter. IIUC the problem is not that exit_signals() sets
> PF_EXITING, the problem is that it writes to tsk->flags and kasan detects
> the data race.
>
> For example, freezable_schedule() which sets PF_FREEZER_SKIP can equally
> "race" with sig_task_ignored() or with ANY other code which checks this
> task's flags.
>
> I think this is WONTFIX.

If taking the spinlock is unnecessary (which AFAIK it probably is) and
there are no other writers to this flag, you will still need a
WRITE_ONCE(tsk->flags, tsk->flags | PF_EXITING) to avoid the
data-race.

However, if it is possible that there are concurrent writers setting
other bits in flags, you need to ensure that the bits are set
atomically (unless it's ok to lose some bits here). This can only be
done via 1) taking siglock, or 2) via e.g. atomic_or(...), however,
flags is not atomic_t ...

-- Marco