linux-kernel - Re: [PATCH -next] lib: disable KCSAN for XArray

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANpmjNOtsdxh3YLcF-pUMua9afWfhg5P_2ziRGSMuT8Gi0c5TA@mail.gmail.com>
Date:   Fri, 6 Mar 2020 14:38:39 +0100
From:   Marco Elver <elver@...gle.com>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Matthew Wilcox <willy@...radead.org>, Qian Cai <cai@....pw>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -next] lib: disable KCSAN for XArray

On Thu, 5 Mar 2020 at 22:39, Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Thu, Mar 05, 2020 at 07:18:31AM -0800, Matthew Wilcox wrote:
> > On Wed, Mar 04, 2020 at 06:10:21AM -0800, Paul E. McKenney wrote:
> > > On Tue, Mar 03, 2020 at 08:33:56PM -0800, Matthew Wilcox wrote:
> > > > On Tue, Mar 03, 2020 at 08:05:15PM -0800, Paul E. McKenney wrote:
> > > > > On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> > > > > > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > > > > > > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > > > > > > could happen concurrently result in data races, but those operate only
> > > > > > > on a single bit that are pretty much harmless. For example,
> > > > > >
> > > > > > Those aren't data races.  The writes are protected by a spinlock and the
> > > > > > reads by the RCU read lock.  If the tool can't handle RCU protection,
> > > > > > it's not going to be much use.
> > > > >
> > > > > Would KCSAN's ASSERT_EXCLUSIVE_BITS() help here?
> > > >
> > > > I'm quite lost in the sea of new macros that have been added to help
> > > > with KCSAN.  It doesn't help that they're in -next somewhere that I
> > > > can't find, and not in mainline yet.  Is there documentation somewhere?
> > >
> > > Yes, there is documentation.  In -next somewhere.  :-/
> >
> > Looking at the documentation link that Marco kindly provided, I don't
> > think ASSERT_EXCLUSIVE_BITS is the correct annotation to make.  The bits
> > in question are being accessed in parallel, it's just that we're willing
> > to accept a certain degree of inaccuracy.
> >
> > > > struct xa_node {
> > > >         unsigned char   shift;          /* Bits remaining in each slot */
> > > >         unsigned char   offset;         /* Slot offset in parent */
> > > >         unsigned char   count;          /* Total entry count */
> > > >         unsigned char   nr_values;      /* Value entry count */
> > > >         struct xa_node __rcu *parent;   /* NULL at top of tree */
> > > >         struct xarray   *array;         /* The array we belong to */
> > > >         union {
> > > >                 struct list_head private_list;  /* For tree user */
> > > >                 struct rcu_head rcu_head;       /* Used when freeing node */
> > > >         };
> > > >         void __rcu      *slots[XA_CHUNK_SIZE];
> > > >         union {
> > > >                 unsigned long   tags[XA_MAX_MARKS][XA_MARK_LONGS];
> > > >                 unsigned long   marks[XA_MAX_MARKS][XA_MARK_LONGS];
> > > >         };
> > > > };
> > > >
> > > > 'slots' may be modified with the xa_lock
> > > > held, and simultaneously read by an RCU reader.  Ditto 'tags'/'marks'.
> > >
> > > KCSAN expects that accesses to the ->parent field should be marked.
> > > But if ->parent is always accessed via things like rcu_dereference()
> > > and rcu_assign_pointer() (guessing based on the __rcu), then KCSAN
> > > won't complain.
> > [...]
> > > The situation with ->slots is the same as that for ->parent.
> > >
> > > KCSAN expects accesses to the ->tags[] and ->marks[] arrays to be marked.
> > > However, the default configuration of KCSAN asks only that the reads
> > > be marked.  (Within RCU, I instead configure KCSAN so that it asks that
> > > both be marked, but it is of course your choice within your code.)
> > >
> > > > The RCU readers are prepared for what they see to be inconsistent --
> > > > a fact of life when dealing with RCU!  So in a sense, yes, there is a
> > > > race there.  But it's a known, accepted race, and that acceptance is
> > > > indicated by the fact that the RCU lock is held.  Does there need to be
> > > > more annotation here?  Or is there an un-noticed bug that the tool is
> > > > legitimately pointing out?
> > >
> > > The answer to both questions is "maybe", depending on the code using
> > > the values read.  Yes, it would be nice if KCSAN could figure out the
> > > difference, but there are limits to what a tool can do.  And things
> > > are sometimes no-obvious, for example:
> >
> > I require the following properties from this array of bits:
> >
> >  - If a bit was set before and after the modification, it must be seen to
> >    be set.
> >  - If a bit was clear before and after the modification, it must be seen to
> >    be clear.
> >  - If a bit is modified, it may be seen as set or clear.
> >
> > I have found three locations where we use the ->marks array:
> >
> > 1.
> >                         unsigned long data = *addr & (~0UL << offset);
> >                         if (data)
> >                                 return __ffs(data);
> >
> > 2.
> >         return find_next_bit(addr, XA_CHUNK_SIZE, offset);
> > 3.
> >         return test_bit(offset, node_marks(node, mark));
> >
> > The modifications -- all done with the spinlock held -- use the non-atomic
> > bitops:
> >         return __test_and_set_bit(offset, node_marks(node, mark));
> >         return __test_and_clear_bit(offset, node_marks(node, mark));
> >         bitmap_fill(node_marks(node, mark), XA_CHUNK_SIZE);
> > (that last one doesn't really count -- it's done prior to placing the node
> > in the tree)
> >
> > The first read seems straightforward; I can place a READ_ONCE around
> > *addr.  The second & third reads are rather less straightforward.
> > find_next_bit() and test_bit() are common code and use plain loads today.
>
> Yes, those last two are a bit annoying, aren't they?  I guess the first
> thing would be placing READ_ONCE() inside them, and if that results in
> regressions, have an alternative API for concurrent access?

FWIW test_bit() is an "atomic" bitop (per atomic_bitops.txt), and
KCSAN treats it as such. On x86 arch_test_bit() is not instrumented,
and then in asm-generic/bitops/instrumented-non-atomic.h test_bit() is
instrumented with instrument_atomic_read(). So on x86, things should
already be fine for test_bit(). Not sure about other architectures.

Thanks,
-- Marco



> > > Your mileage may vary, of course.  For one thing, your actuarial
> > > statistics are quite likley significantly more favorable than are mine.
> > > Not that mine are at all bad, particularly by the standards of a century
> > > or two ago.  ;-)
> >
> > I don't know ... my preferred form of exercise takes me into the
> > harm's way of cars, so while I've significantly lowered my risk of
> > dying of cardiovascular disease, I'm much more likely to be killed by
> > an inattentive human being!  Bring on our robot cars ...
>
> I did my 40 years using that mode of transport.  But my reflexes are
> no longer up to the task.  So I have no advice for you on that one!
> Other than to question your trust in robot cars whose code was produced
> by human developers!  ;-)
>
>                                                         Thanx, Paul