Message-ID: <CANpmjNOsLeiD6hYXeD4g8fA=Ti6EiUsbtiv4VshRGg+oG1ct-g@mail.gmail.com>
Date: Mon, 16 Mar 2020 17:22:34 +0100
From: Marco Elver <elver@...gle.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
kasan-dev <kasan-dev@...glegroups.com>, kernel-team@...com,
Ingo Molnar <mingo@...nel.org>,
Andrey Konovalov <andreyknvl@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>, Qian Cai <cai@....pw>,
Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH kcsan 27/32] kcsan: Add option to allow watcher interruptions
On Mon, 16 Mar 2020 at 16:45, Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Mon, Mar 16, 2020 at 02:56:38PM +0100, Marco Elver wrote:
> > On Fri, 13 Mar 2020 at 16:28, Marco Elver <elver@...gle.com> wrote:
> > >
> > > On Thu, 12 Mar 2020 at 19:04, Paul E. McKenney <paulmck@...nel.org> wrote:
> > > >
> > > > On Thu, Mar 12, 2020 at 11:03:28AM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Mar 09, 2020 at 12:04:15PM -0700, paulmck@...nel.org wrote:
> > > > > > From: Marco Elver <elver@...gle.com>
> > > > > >
> > > > > > Add option to allow interrupts while a watchpoint is set up. This can be
> > > > > > enabled either via CONFIG_KCSAN_INTERRUPT_WATCHER or via the boot
> > > > > > parameter 'kcsan.interrupt_watcher=1'.
> > > > > >
> > > > > > Note that, currently not all safe per-CPU access primitives and patterns
> > > > > > are accounted for, which could result in false positives. For example,
> > > > > > asm-generic/percpu.h uses plain operations, which by default are
> > > > > > instrumented. On interrupts and subsequent accesses to the same
> > > > > > variable, KCSAN would currently report a data race with this option.
> > > > > >
> > > > > > Therefore, this option should currently remain disabled by default, but
> > > > > > may be enabled for specific test scenarios.
> > > > > >
> > > > > > > To avoid new warnings, change all uses of smp_processor_id() to use the
> > > > > > raw version (as already done in kcsan_found_watchpoint()). The exact SMP
> > > > > > processor id is for informational purposes in the report, and
> > > > > > correctness is not affected.
> > > > > >
> > > > > > Signed-off-by: Marco Elver <elver@...gle.com>
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > > > >
> > > > > And I get silent hangs that bisect to this patch when running the
> > > > > following rcutorture command, run in the kernel source tree on a
> > > > > 12-hardware-thread laptop:
> > > > >
> > > > > bash tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --duration 10 --kconfig "CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y" --configs TREE03
> > > > >
> > > > > It works fine on some (but not all) of the other rcutorture test
> > > > > scenarios. It fails on TREE01, TREE02, TREE03, TREE09. The common thread
> > > > > > is that these TREE scenarios are all PREEMPT=y. So are RUDE01,
> > > > > SRCU-P, TASKS01, and TASKS03, but these scenarios are not hammering
> > > > > on Tree RCU, and thus have far less interrupt activity and the like.
> > > > > Given that it is an interrupt-related feature being added by this commit,
> > > > > this seems like expected (mis)behavior.
> > > > >
> > > > > Can you reproduce this? If not, are there any diagnostics I can add to
> > > > > my testing? Or a diagnostic patch I could apply?
> > >
> > > I think I can reproduce it. Let me debug some more, so far I haven't
> > > found anything yet.
> > >
> > > What I do know is that it's related to reporting. Turning kcsan_report
> > > into a noop makes the test run to completion.
> > >
> > > > I should hasten to add that this feature was quite helpful in recent work!
> > >
> > > Good to know. :-) We can probably keep this patch, since the default
> > > config doesn't turn this on. But I will try to see what's up with the
> > > hangs, and hopefully find a fix.
> >
> > So this one turned out to be quite interesting. We can get deadlocks
> > if we can set up multiple watchpoints per task in case it's
> > interrupted and the interrupt sets up another watchpoint, and there
> > are many concurrent races happening; because the other_info struct in
> > report.c may never be released if an interrupt blocks the consumer due
> > to waiting for other_info to become released.
>
> Been there, done that! ;-)
>
> > Give me another day or 2 to come up with a decent fix.
>
> My thought is to send a pull request for the commits up to but not
> including this patch, allowing ample development and testing time for
> the fix. My concern with sending this, even with a fix, is that any
> further bugs might cast a shadow on the whole series, further slowing
> acceptance into mainline.
>
> Fair enough?

That's fine. I think the feature changes can stay on -rcu/kcsan-dev
for now, but the documentation updates don't depend on them.

If it'd be useful, the updated documentation could be moved before
this patch to -rcu/kcsan, so we'd have

    kcsan: Add current->state to implicitly atomic accesses
    kcsan: Add option for verbose reporting
    kcsan: Add option to allow watcher interruptions
    -- cut --
    kcsan: Update API documentation in kcsan-checks.h
    kcsan: Update Documentation/dev-tools/kcsan.rst
    kcsan: Fix a typo in a comment
    .. rest of series ..

Although I'm fine with either.

Thanks,
-- Marco
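
The deadlock described earlier in the thread can be illustrated with a
small user-space sketch. This is not KCSAN code: struct slot,
try_publish(), try_consume() and interrupt_wait() are hypothetical
names, and a single integer stands in for the shared other_info struct
in report.c. The sketch assumes one shared slot that a producer may fill
only while it is free and that only the owner of the matching watchpoint
may release, and it uses a bounded loop in place of the real unbounded
wait.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the single shared report slot. */
struct slot {
	int watchpoint_id;	/* 0 means free; valid ids are nonzero */
};

/* Producer (another CPU): publish info only if the slot is free. */
static bool try_publish(struct slot *s, int id)
{
	if (s->watchpoint_id != 0)
		return false;
	s->watchpoint_id = id;
	return true;
}

/* Consumer (watchpoint owner): take the info only if it is ours. */
static bool try_consume(struct slot *s, int id)
{
	if (s->watchpoint_id != id)
		return false;
	s->watchpoint_id = 0;
	return true;
}

/*
 * Model an interrupt that set up watchpoint irq_wp and now waits for
 * the matching report. Each iteration lets the racing CPU try to
 * publish, then tries to consume. Returns true if the interrupt made
 * progress within max_spins iterations.
 */
static bool interrupt_wait(struct slot *s, int irq_wp, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		try_publish(s, irq_wp);
		if (try_consume(s, irq_wp))
			return true;
	}
	return false;
}
```

If the slot already holds a report for the interrupted task's watchpoint
(say try_publish(&s, 1) succeeded before the interrupt arrived), then
interrupt_wait(&s, 2, n) returns false for any n: the only context that
could release the slot is the one the interrupt preempted. Once the task
consumes its report, the same call succeeds.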