Message-ID: <CANpmjNO=jGNNd4J0hBhz4ORLdw_+EHQDvyoQRikRCOsuMAcXYg@mail.gmail.com>
Date: Mon, 16 Mar 2020 14:56:38 +0100
From: Marco Elver <elver@...gle.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
kasan-dev <kasan-dev@...glegroups.com>, kernel-team@...com,
Ingo Molnar <mingo@...nel.org>,
Andrey Konovalov <andreyknvl@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>, Qian Cai <cai@....pw>,
Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH kcsan 27/32] kcsan: Add option to allow watcher interruptions

On Fri, 13 Mar 2020 at 16:28, Marco Elver <elver@...gle.com> wrote:
>
> On Thu, 12 Mar 2020 at 19:04, Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > On Thu, Mar 12, 2020 at 11:03:28AM -0700, Paul E. McKenney wrote:
> > > On Mon, Mar 09, 2020 at 12:04:15PM -0700, paulmck@...nel.org wrote:
> > > > From: Marco Elver <elver@...gle.com>
> > > >
> > > > Add option to allow interrupts while a watchpoint is set up. This can be
> > > > enabled either via CONFIG_KCSAN_INTERRUPT_WATCHER or via the boot
> > > > parameter 'kcsan.interrupt_watcher=1'.
> > > >
> > > > Note that, currently not all safe per-CPU access primitives and patterns
> > > > are accounted for, which could result in false positives. For example,
> > > > asm-generic/percpu.h uses plain operations, which by default are
> > > > instrumented. On interrupts and subsequent accesses to the same
> > > > variable, KCSAN would currently report a data race with this option.
> > > >
> > > > Therefore, this option should currently remain disabled by default, but
> > > > may be enabled for specific test scenarios.
> > > >
> > > > To avoid new warnings, change all uses of smp_processor_id() to use the
> > > > raw version (as already done in kcsan_found_watchpoint()). The exact SMP
> > > > processor id is for informational purposes in the report, and
> > > > correctness is not affected.
> > > >
> > > > Signed-off-by: Marco Elver <elver@...gle.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > >
> > > And I get silent hangs that bisect to this patch when running the
> > > following rcutorture command, run in the kernel source tree on a
> > > 12-hardware-thread laptop:
> > >
> > > bash tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --duration 10 --kconfig "CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y" --configs TREE03
> > >
> > > It works fine on some (but not all) of the other rcutorture test
> > > scenarios. It fails on TREE01, TREE02, TREE03, TREE09. The common thread
> > > is that these TREE scenarios are all PREEMPT=y. So are RUDE01,
> > > SRCU-P, TASKS01, and TASKS03, but these scenarios are not hammering
> > > on Tree RCU, and thus have far less interrupt activity and the like.
> > > Given that it is an interrupt-related feature being added by this commit,
> > > this seems like expected (mis)behavior.
> > >
> > > Can you reproduce this? If not, are there any diagnostics I can add to
> > > my testing? Or a diagnostic patch I could apply?
>
> I think I can reproduce it. Let me debug some more, so far I haven't
> found anything yet.
>
> What I do know is that it's related to reporting. Turning kcsan_report
> into a noop makes the test run to completion.
>
> > I should hasten to add that this feature was quite helpful in recent work!
>
> Good to know. :-) We can probably keep this patch, since the default
> config doesn't turn this on. But I will try to see what's up with the
> hangs, and hopefully find a fix.

So this one turned out to be quite interesting. With this option, a task
can end up with multiple watchpoints: it sets one up, gets interrupted,
and the interrupt sets up another. If many races are happening
concurrently, this can deadlock: the other_info struct in report.c may
never be released, because an interrupt can block the consumer while
itself waiting for other_info to be released.
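
To make the scenario concrete, here is a minimal userspace sketch of the
pattern. It is not the report.c code: the slot struct, the function names,
and the bounded spin (SPIN_LIMIT) are all made up for illustration, and the
"interrupt" is modelled as a nested call on the same CPU. The real wait is
unbounded, which is where the silent hang would come from.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the single shared other_info slot. */
struct other_info_slot {
	bool in_use;	/* set by a producer, cleared by the consumer */
};

static struct other_info_slot other_info;

/* Bound the wait so the sketch terminates; the real wait has no bound. */
enum { SPIN_LIMIT = 1000 };

/*
 * Producer: wait for the slot to be released, then claim it.
 * Returns false if the bounded wait gave up, i.e. the case that
 * would spin forever without the bound.
 */
static bool producer_claim(void)
{
	for (int i = 0; i < SPIN_LIMIT; i++) {
		if (!other_info.in_use) {
			other_info.in_use = true;
			return true;
		}
	}
	return false;
}

/*
 * Consumer: release the slot. If 'interrupted' is set, an interrupt
 * arrives first and acts as a producer on the same CPU. The consumer
 * cannot resume until the interrupt returns, and the interrupt cannot
 * return until the consumer releases the slot.
 */
static bool consumer_release(bool interrupted)
{
	if (interrupted && !producer_claim())
		return false;	/* interrupt never returns: deadlock */
	other_info.in_use = false;
	return true;
}

/* Returns 0 on normal completion, 1 if the deadlock pattern was hit. */
int simulate(bool interrupted)
{
	other_info.in_use = false;
	if (!producer_claim())
		return -1;	/* not reachable: the slot was just reset */
	return consumer_release(interrupted) ? 0 : 1;
}
```

Calling simulate(false) runs the uninterrupted path and completes, while
simulate(true) hits the case where the nested producer waits on a slot that
only the interrupted consumer could release.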

Give me another day or two to come up with a decent fix.

Thanks,
-- Marco