linux-kernel - Re: Kernel Concurrency Sanitizer (KCSAN)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANpmjNMBehv0UUuEko-F-ygegX+YS+Km3ggFB0tnBoCpRRXhSw@mail.gmail.com>
Date:   Thu, 3 Oct 2019 21:27:08 +0200
From:   Marco Elver <elver@...gle.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Dmitry Vyukov <dvyukov@...gle.com>,
        kasan-dev <kasan-dev@...glegroups.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Andrey Konovalov <andreyknvl@...gle.com>,
        Alexander Potapenko <glider@...gle.com>,
        "Paul E. McKenney" <paulmck@...ux.ibm.com>,
        Paul Turner <pjt@...gle.com>, Daniel Axtens <dja@...ens.net>,
        Anatol Pomazau <anatol@...gle.com>,
        Will Deacon <willdeacon@...gle.com>,
        Andrea Parri <parri.andrea@...il.com>,
        Alan Stern <stern@...land.harvard.edu>,
        LKMM Maintainers -- Akira Yokosawa <akiyks@...il.com>,
        Nicholas Piggin <npiggin@...il.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Daniel Lustig <dlustig@...dia.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Luc Maranget <luc.maranget@...ia.fr>
Subject: Re: Kernel Concurrency Sanitizer (KCSAN)

On Thu, 3 Oct 2019 at 18:12, Mark Rutland <mark.rutland@....com> wrote:
>
> On Fri, Sep 20, 2019 at 07:51:04PM +0200, Marco Elver wrote:
> > On Fri, 20 Sep 2019 at 18:47, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > >
> > > On Fri, Sep 20, 2019 at 6:31 PM Mark Rutland <mark.rutland@....com> wrote:
> > > >
> > > > On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> > > > > We would like to share a new data-race detector for the Linux kernel:
> > > > > Kernel Concurrency Sanitizer (KCSAN) --
> > > > > https://github.com/google/ktsan/wiki/KCSAN  (Details:
> > > > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
> > > >
> > > > Nice!
> > > >
> > > > BTW kcsan_atomic_next() is missing a stub definition in <linux/kcsan.h>
> > > > when !CONFIG_KCSAN:
> > > >
> > > > https://github.com/google/ktsan/commit/a22a093a0f0d0b582c82cdbac4f133a3f61d207c#diff-19d7c475b4b92aab8ba440415ab786ec
> > > >
> > > > ... and I think the kcsan_{begin,end}_atomic() stubs need to be static
> > > > inline too.
> >
> > Thanks for catching, fixed and pushed. Feel free to rebase your arm64 branch.
>
> Great; I've just done so!
>
> What's the plan for posting a PATCH or RFC series?

I'm planning to send some patches, but with the amount of data-races
being found I need to prioritize what we send first. Currently the
plan is to let syzbot find data-races, and we'll start by sending a
few critical reports that syzbot found. Syzbot should be set up fully
and start finding data-races within next few days.

> The rest of this email is rabbit-holing on the issue KCSAN spotted;
> sorry about that!

Thanks for looking into this! I think you're right, and please do feel
free to send a proper patch out.

Thanks,
-- Marco

> [...]
>
> > > > We have some interesting splats at boot time in stop_machine, which
> > > > don't seem to have been hit/fixed on x86 yet in the kcsan-with-fixes
> > > > branch, e.g.
> > > >
> > > > [    0.237939] ==================================================================
> > > > [    0.239431] BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0
> > > > [    0.241189]
> > > > [    0.241606] write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3:
> > > > [    0.243435]  set_state+0x80/0xb0
> > > > [    0.244328]  multi_cpu_stop+0x16c/0x198
> > > > [    0.245406]  cpu_stopper_thread+0x170/0x298
> > > > [    0.246565]  smpboot_thread_fn+0x40c/0x560
> > > > [    0.247696]  kthread+0x1a8/0x1b0
> > > > [    0.248586]  ret_from_fork+0x10/0x18
> > > > [    0.249589]
> > > > [    0.250006] read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1:
> > > > [    0.251804]  multi_cpu_stop+0xa8/0x198
> > > > [    0.252851]  cpu_stopper_thread+0x170/0x298
> > > > [    0.254008]  smpboot_thread_fn+0x40c/0x560
> > > > [    0.255135]  kthread+0x1a8/0x1b0
> > > > [    0.256027]  ret_from_fork+0x10/0x18
> > > > [    0.257036]
> > > > [    0.257449] Reported by Kernel Concurrency Sanitizer on:
> > > > [    0.258918] CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3
> > > > [    0.261241] Hardware name: linux,dummy-virt (DT)
> > > > [    0.262517] ==================================================================>
> >
> > Thanks, the fixes in -with-fixes were ones I only encountered with
> > Syzkaller, where I disable KCSAN during boot. I've just added a fix
> > for this race and pushed to kcsan-with-fixes.
>
> I think that's:
>
>   https://github.com/google/ktsan/commit/c1bc8ab013a66919d8347c2392f320feabb14f92
>
> ... but that doesn't look quite right to me, as it leaves us with the shape:
>
>         do {
>                 if (READ_ONCE(msdata->state) != curstate) {
>                         curstate = msdata->state;
>                         switch (curstate) {
>                                 ...
>                         }
>                         ack_state(msdata);
>                 }
>         } while (curstate != MULTI_STOP_EXIT);
>
> I don't believe that we have a guarantee of read-after-read ordering
> between the READ_ONCE(msdata->state) and the subsequent plain access of
> msdata->state, as we've been caught out on that in the past, e.g.
>
>   https://lore.kernel.org/lkml/1506527369-19535-1-git-send-email-will.deacon@arm.com/
>
> ... which I think means we could switch on a stale value of
> msdata->state. That would mean we might handle the same state twice,
> calling ack_state() more times than expected and corrupting the count.
>
> The compiler could also replace uses of curstate with a reload of
> msdata->state. If it did so for the while condition, we could skip the
> expected ack_state() for MULTI_STOP_EXIT, though it looks like that
> might not matter.
>
> I think we need to make sure that we use a consistent snapshot,
> something like the below. Assuming I'm not barking up the wrong tree, I
> can spin this as a proper patch.
>
> Thanks,
> Mark.
>
> ---->8----
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index b4f83f7bdf86..67a0b454b5b5 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -167,7 +167,7 @@ static void set_state(struct multi_stop_data *msdata,
>         /* Reset ack counter. */
>         atomic_set(&msdata->thread_ack, msdata->num_threads);
>         smp_wmb();
> -       msdata->state = newstate;
> +       WRITE_ONCE(msdata->state, newstate);
>  }
>
>  /* Last one to ack a state moves to the next state. */
> @@ -186,7 +186,7 @@ void __weak stop_machine_yield(const struct cpumask *cpumask)
>  static int multi_cpu_stop(void *data)
>  {
>         struct multi_stop_data *msdata = data;
> -       enum multi_stop_state curstate = MULTI_STOP_NONE;
> +       enum multi_stop_state newstate, curstate = MULTI_STOP_NONE;
>         int cpu = smp_processor_id(), err = 0;
>         const struct cpumask *cpumask;
>         unsigned long flags;
> @@ -210,8 +210,9 @@ static int multi_cpu_stop(void *data)
>         do {
>                 /* Chill out and ensure we re-read multi_stop_state. */
>                 stop_machine_yield(cpumask);
> -               if (msdata->state != curstate) {
> -                       curstate = msdata->state;
> +               newstate = READ_ONCE(msdata->state);
> +               if (newstate != curstate) {
> +                       curstate = newstate;
>                         switch (curstate) {
>                         case MULTI_STOP_DISABLE_IRQ:
>                                 local_irq_disable();
>