linux-kernel - Re: KCSAN: data-race in taskstats_exit / taskstats


Message-ID: <CANpmjNPUc0nzo87zZJ_GE3+29m+SNt0c-+H7T5xUVskXxaun8Q@mail.gmail.com>
Date:   Wed, 6 Nov 2019 11:23:52 +0100
From:   Marco Elver <elver@...gle.com>
To:     Balbir Singh <bsingharora@...il.com>
Cc:     syzbot <syzbot+c5d03165a1bd1dead0c1@...kaller.appspotmail.com>,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs@...glegroups.com
Subject: Re: KCSAN: data-race in taskstats_exit / taskstats_exit

On Wed, 6 Nov 2019 at 01:10, Balbir Singh <bsingharora@...il.com> wrote:
>
> On Fri, 2019-10-04 at 21:26 -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    b4bd9343 x86, kcsan: Enable KCSAN for x86
> > git tree:       https://github.com/google/ktsan.git kcsan
> > console output: https://syzkaller.appspot.com/x/log.txt?x=125329db600000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=c0906aa620713d80
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c5d03165a1bd1dead0c1
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+c5d03165a1bd1dead0c1@...kaller.appspotmail.com
> >
> > ==================================================================
> > BUG: KCSAN: data-race in taskstats_exit / taskstats_exit
> >
> > write to 0xffff8881157bbe10 of 8 bytes by task 7951 on cpu 0:
> >   taskstats_tgid_alloc kernel/taskstats.c:567 [inline]
> >   taskstats_exit+0x6b7/0x717 kernel/taskstats.c:596
> >   do_exit+0x2c2/0x18e0 kernel/exit.c:864
> >   do_group_exit+0xb4/0x1c0 kernel/exit.c:983
> >   get_signal+0x2a2/0x1320 kernel/signal.c:2734
> >   do_signal+0x3b/0xc00 arch/x86/kernel/signal.c:815
> >   exit_to_usermode_loop+0x250/0x2c0 arch/x86/entry/common.c:159
> >   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
> >   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
> >   do_syscall_64+0x2d7/0x2f0 arch/x86/entry/common.c:299
> >   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > read to 0xffff8881157bbe10 of 8 bytes by task 7949 on cpu 1:
> >   taskstats_tgid_alloc kernel/taskstats.c:559 [inline]
> >   taskstats_exit+0xb2/0x717 kernel/taskstats.c:596
> >   do_exit+0x2c2/0x18e0 kernel/exit.c:864
> >   do_group_exit+0xb4/0x1c0 kernel/exit.c:983
> >   __do_sys_exit_group kernel/exit.c:994 [inline]
> >   __se_sys_exit_group kernel/exit.c:992 [inline]
> >   __x64_sys_exit_group+0x2e/0x30 kernel/exit.c:992
> >   do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
> >   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
>
> Sorry, I've been away and am just catching up with email.
>
> I don't think this is a bug. If I interpret the report correctly, it shows a
> race in:
>
> static struct taskstats *taskstats_tgid_alloc(struct task_struct *tsk)
> {
>         struct signal_struct *sig = tsk->signal;
>         struct taskstats *stats;
>
> #1      if (sig->stats || thread_group_empty(tsk)) <- the check of sig->stats
>                 goto ret;
>
>         /* No problem if kmem_cache_zalloc() fails */
>         stats = kmem_cache_zalloc(taskstats_cache, GFP_KERNEL);
>
>         spin_lock_irq(&tsk->sighand->siglock);
>         if (!sig->stats) {
> #2              sig->stats = stats; <- here in setting sig->stats
>                 stats = NULL;
>         }
>         spin_unlock_irq(&tsk->sighand->siglock);
>
>         if (stats)
>                 kmem_cache_free(taskstats_cache, stats);
> ret:
>         return sig->stats;
> }
>
> The worst-case scenario is that we might see sig->stats as NULL when two
> threads belonging to the same tgid exit concurrently. We do free up stats
> if we got that wrong.
>
> Am I misinterpreting the report?
>
> Balbir Singh.

The plain concurrent reads/writes are a data race, which may manifest
as various kinds of undefined behaviour due to compiler optimizations [1, 2].
Note that a "data race" does not necessarily imply a "race condition":
some data races are race conditions (usually the more interesting
bugs), but not *all* data races are race conditions (those are sometimes
referred to as "benign races"). KCSAN reports data races according to
the LKMM (Linux kernel memory model).
[1] https://lwn.net/Articles/793253/
[2] https://lwn.net/Articles/799218/
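The two notions are independent in the other direction too: marking every access removes the data race, but not necessarily a race condition. A minimal userspace sketch, assuming C11 atomics and POSIX threads as stand-ins for the kernel's primitives (the LKMM is not the C11 model); `owner`, `try_claim_racy()`, `try_claim()` and `run_claim_demo()` are hypothetical names for illustration only:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared slot: 0 means "unclaimed", otherwise the owner id. */
static atomic_int owner;
static atomic_int winners;

/* No data race: every access to `owner` is atomic. Still a race
 * condition: two threads can both load 0 before either one stores,
 * and both return 1. Shown only for contrast; not used below. */
int try_claim_racy(int id)
{
	if (atomic_load_explicit(&owner, memory_order_relaxed) == 0) {
		atomic_store_explicit(&owner, id, memory_order_relaxed);
		return 1;
	}
	return 0;
}

/* Neither a data race nor a race condition: the check and the update
 * happen as one atomic compare-and-swap, so exactly one caller wins. */
static int try_claim(int id)
{
	int expected = 0;
	return atomic_compare_exchange_strong(&owner, &expected, id);
}

static void *claim_worker(void *arg)
{
	if (try_claim((int)(long)arg))
		atomic_fetch_add(&winners, 1);
	return NULL;
}

/* Runs up to 16 threads that all try to claim the slot; returns the
 * number of threads that succeeded. */
int run_claim_demo(int nthreads)
{
	pthread_t t[16];

	if (nthreads > 16)
		nthreads = 16;
	atomic_store(&owner, 0);
	atomic_store(&winners, 0);
	for (int i = 0; i < nthreads; i++)
		pthread_create(&t[i], NULL, claim_worker, (void *)(long)(i + 1));
	for (int i = 0; i < nthreads; i++)
		pthread_join(t[i], NULL);
	return atomic_load(&winners);
}
```

`run_claim_demo(8)` returns 1 however the threads interleave. In taskstats_tgid_alloc() the corresponding "check and act" step is the re-check of sig->stats under siglock.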

Even if there is no race condition here that warrants heavier
synchronization (locking, etc.), we still have a data race, which
needs fixing by using marked atomic accesses (READ_ONCE, WRITE_ONCE,
atomic_t, etc.). We also need to consider memory-ordering requirements:
do we need smp_*mb(), smp_load_acquire()/smp_store_release(), ...?
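The difference between "marked" and "ordered" can be sketched in userspace C11. This assumes that a relaxed atomic access is roughly the counterpart of READ_ONCE()/WRITE_ONCE() and that release/acquire is roughly the counterpart of smp_store_release()/smp_load_acquire(); the mapping is approximate, not exact. `payload`, `ready` and `consume_published()` are hypothetical:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical payload published from one thread to another. */
static int payload;            /* plain data, written before publication */
static atomic_bool ready;      /* marked flag; never accessed plainly */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;
	/* Release orders the payload store before the flag store.
	 * A relaxed store (roughly WRITE_ONCE()) would remove the data race
	 * on `ready`, but would NOT order the earlier payload store. */
	atomic_store_explicit(&ready, true, memory_order_release);
	return NULL;
}

/* Spins until the producer publishes, then returns what it saw. */
int consume_published(void)
{
	pthread_t t;

	payload = 0;
	atomic_store_explicit(&ready, false, memory_order_relaxed);
	pthread_create(&t, NULL, producer, NULL);
	/* Acquire pairs with the release above. Unlike with a plain bool,
	 * the compiler may not assume `ready` is unchanged and hoist the
	 * load out of the loop. */
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;
	int seen = payload;  /* ordered by release/acquire: observes 42 */
	pthread_join(t, NULL);
	return seen;
}
```

With both sides relaxed, the flag would no longer be a data race, but reading `payload` afterwards would be; the ordering question is separate from the marking question.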

In this case, the pattern is double-checked locking: the unlocked check
of sig->stats races with the store done under siglock. Double-checked
locking is incorrect without atomic operations and the correct memory
ordering.
There is a lengthy discussion here:
https://lore.kernel.org/lkml/20191021113327.22365-1-christian.brauner@ubuntu.com/
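A userspace sketch of double-checked locking with the ordering spelled out, modeled loosely on the quoted taskstats_tgid_alloc(). It is not the kernel fix. Assumptions: `struct group`, `group_stats_get()` and `dcl_demo()` are hypothetical; a pthread mutex stands in for siglock; calloc() stands in for kmem_cache_zalloc(); the thread_group_empty() shortcut is omitted. Note that it returns the pointer it loaded instead of re-reading the shared field, as the quoted code does at `ret:`.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for taskstats and signal_struct. */
struct stats { long counters[4]; };

struct group {
	_Atomic(struct stats *) stats;  /* published at most once */
	pthread_mutex_t lock;           /* plays the role of siglock */
};

struct stats *group_stats_get(struct group *g)
{
	/* Fast path: this acquire load pairs with the release store below,
	 * so a non-NULL pointer is only seen together with the zeroed
	 * contents calloc() wrote. */
	struct stats *s = atomic_load_explicit(&g->stats, memory_order_acquire);
	if (s)
		return s;

	/* Allocate outside the lock; failure just yields NULL. */
	struct stats *fresh = calloc(1, sizeof(*fresh));

	pthread_mutex_lock(&g->lock);
	/* Re-check under the lock: another thread may have won. The lock
	 * serializes writers, so a relaxed load is enough here. */
	s = atomic_load_explicit(&g->stats, memory_order_relaxed);
	if (!s && fresh) {
		atomic_store_explicit(&g->stats, fresh, memory_order_release);
		s = fresh;
		fresh = NULL;
	}
	pthread_mutex_unlock(&g->lock);

	free(fresh);  /* lost the race; free(NULL) is a no-op */
	return s;     /* the value we loaded, not a fresh racy re-read */
}

static struct group demo = { .lock = PTHREAD_MUTEX_INITIALIZER };
static struct stats *seen[8];

static void *demo_worker(void *arg)
{
	seen[(long)arg] = group_stats_get(&demo);
	return NULL;
}

/* Returns 1 if all threads observed the same, non-NULL stats block. */
int dcl_demo(void)
{
	pthread_t t[8];

	for (long i = 0; i < 8; i++)
		pthread_create(&t[i], NULL, demo_worker, (void *)i);
	for (int i = 0; i < 8; i++)
		pthread_join(t[i], NULL);
	for (int i = 1; i < 8; i++)
		if (seen[i] != seen[0])
			return 0;
	return seen[0] != NULL;
}
```

In kernel terms, the acquire/release pair would presumably map to smp_load_acquire()/smp_store_release() on sig->stats, with the re-check staying under siglock.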

-- Marco