linux-kernel - Re: [PATCH] kcsan: Add option to allow watcher interruptions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANpmjNPhuvrhRHAiuv2Zju1VNSe7dO0aaYn+1TB99OF2Hv0S_A@mail.gmail.com>
Date:   Sat, 25 Jul 2020 17:17:43 +0200
From:   Marco Elver <elver@...gle.com>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Andrey Konovalov <andreyknvl@...gle.com>,
        Alexander Potapenko <glider@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        kasan-dev <kasan-dev@...glegroups.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] kcsan: Add option to allow watcher interruptions

[+Peter]

On Sat, 25 Jul 2020 at 16:56, Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Thu, Feb 20, 2020 at 10:33:17PM +0100, Marco Elver wrote:
> > On Thu, 20 Feb 2020, Paul E. McKenney wrote:
>
> I am clearly not keeping up...  :-/

Not to worry, I think the local_t idea was discarded based on Peter's
feedback anyway at one point.

> > > On Thu, Feb 20, 2020 at 03:15:51PM +0100, Marco Elver wrote:
> > > > Add option to allow interrupts while a watchpoint is set up. This can be
> > > > enabled either via CONFIG_KCSAN_INTERRUPT_WATCHER or via the boot
> > > > parameter 'kcsan.interrupt_watcher=1'.
[...]
> > > > As an example, the first data race that this found:
> > > >
> > > > write to 0xffff88806b3324b8 of 4 bytes by interrupt on cpu 0:
> > > >  rcu_preempt_read_enter kernel/rcu/tree_plugin.h:353 [inline]
> > > >  __rcu_read_lock+0x3c/0x50 kernel/rcu/tree_plugin.h:373
[...]
> > > > read to 0xffff88806b3324b8 of 4 bytes by task 6131 on cpu 0:       |
> > > >  rcu_preempt_read_enter kernel/rcu/tree_plugin.h:353 [inline]  ----+
[...]
> > > >
> > > > The writer is doing 'current->rcu_read_lock_nesting++'. The read is as
> > > > vulnerable to compiler optimizations and would therefore conclude this
> > > > is a valid data race.
> > >
> > > Heh!  That one is a fun one!  It is on a very hot fastpath.  READ_ONCE()
> > > and WRITE_ONCE() are likely to be measurable at the system level.
> > >
> > > Thoughts on other options?
> >
> > Would this be a use-case for local_t? Don't think this_cpu ops work
> > here.
> >
> > See below idea. This would avoid the data race (KCSAN stopped
> > complaining) and seems to generate reasonable code.
> >
> > Version before:
> >
> >  <__rcu_read_lock>:
> >      130      mov    %gs:0x0,%rax
> >      137
> >      139      addl   $0x1,0x370(%rax)
> >      140      retq
> >      141      data16 nopw %cs:0x0(%rax,%rax,1)
> >      148
> >      14c      nopl   0x0(%rax)
> >
> > Version after:
> >
> >  <__rcu_read_lock>:
> >      130      mov    %gs:0x0,%rax
> >      137
> >      139      incq   0x370(%rax)
> >      140      retq
> >      141      data16 nopw %cs:0x0(%rax,%rax,1)
> >      148
> >      14c      nopl   0x0(%rax)
> >
> > I haven't checked the other places where it is used, though.
> > (Can send it as a patch if you think this might work.)
> >
> > Thanks,
> > -- Marco
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 2678a37c31696..3d8586ee7ae64 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -50,7 +50,7 @@ void __rcu_read_unlock(void);
> >   * nesting depth, but makes sense only if CONFIG_PREEMPT_RCU -- in other
> >   * types of kernel builds, the rcu_read_lock() nesting depth is unknowable.
> >   */
> > -#define rcu_preempt_depth() (current->rcu_read_lock_nesting)
> > +#define rcu_preempt_depth() local_read(&current->rcu_read_lock_nesting)
> >
> >  #else /* #ifdef CONFIG_PREEMPT_RCU */
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 0918904c939d2..70d7e3257feed 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -10,6 +10,7 @@
> >  #include <uapi/linux/sched.h>
> >
> >  #include <asm/current.h>
> > +#include <asm/local.h>
> >
> >  #include <linux/pid.h>
> >  #include <linux/sem.h>
> > @@ -708,7 +709,7 @@ struct task_struct {
> >       cpumask_t                       cpus_mask;
> >
> >  #ifdef CONFIG_PREEMPT_RCU
> > -     int                             rcu_read_lock_nesting;
> > +     local_t                         rcu_read_lock_nesting;
> >       union rcu_special               rcu_read_unlock_special;
> >       struct list_head                rcu_node_entry;
> >       struct rcu_node                 *rcu_blocked_node;
> > diff --git a/init/init_task.c b/init/init_task.c
> > index 096191d177d5c..941777fce11e5 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -130,7 +130,7 @@ struct task_struct init_task
> >       .perf_event_list = LIST_HEAD_INIT(init_task.perf_event_list),
> >  #endif
> >  #ifdef CONFIG_PREEMPT_RCU
> > -     .rcu_read_lock_nesting = 0,
> > +     .rcu_read_lock_nesting = LOCAL_INIT(0),
> >       .rcu_read_unlock_special.s = 0,
> >       .rcu_node_entry = LIST_HEAD_INIT(init_task.rcu_node_entry),
> >       .rcu_blocked_node = NULL,
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 60a1295f43843..43af326081b06 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1669,7 +1669,7 @@ init_task_pid(struct task_struct *task, enum pid_type type, struct pid *pid)
> >  static inline void rcu_copy_process(struct task_struct *p)
> >  {
> >  #ifdef CONFIG_PREEMPT_RCU
> > -     p->rcu_read_lock_nesting = 0;
> > +     local_set(&p->rcu_read_lock_nesting, 0);
> >       p->rcu_read_unlock_special.s = 0;
> >       p->rcu_blocked_node = NULL;
> >       INIT_LIST_HEAD(&p->rcu_node_entry);
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index c6ea81cd41890..e0595abd50c0f 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -350,17 +350,17 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
> >
> >  static void rcu_preempt_read_enter(void)
> >  {
> > -     current->rcu_read_lock_nesting++;
> > +     local_inc(&current->rcu_read_lock_nesting);
> >  }
> >
> >  static void rcu_preempt_read_exit(void)
> >  {
> > -     current->rcu_read_lock_nesting--;
> > +     local_dec(&current->rcu_read_lock_nesting);
> >  }
> >
> >  static void rcu_preempt_depth_set(int val)
> >  {
> > -     current->rcu_read_lock_nesting = val;
> > +     local_set(&current->rcu_read_lock_nesting, val);

> I agree that this removes the data races, and that the code for x86 is
> quite nice, but aren't rcu_read_lock() and rcu_read_unlock() going to
> have heavyweight atomic operations on many CPUs?
>
> Maybe I am stuck with arch-specific code in rcu_read_lock() and
> rcu_preempt_read_exit().  I suppose worse things could happen.

Peter also mentioned to me that while local_t on x86 generates
reasonable code, on other architectures it's terrible. So I think
something else is needed, and feel free to discard the above idea.
With sufficient enough reasoning, how bad would a 'data_race(..)' be?

Thanks,
-- Marco