linux-kernel - Re: [PATCH v2 2/4] kcov: add interrupt handling self test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240619112625.GM31592@noisy.programming.kicks-ass.net>
Date: Wed, 19 Jun 2024 13:26:25 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
	dave.hansen@...ux.intel.com, x86@...nel.org,
	linux-kernel@...r.kernel.org, syzkaller@...glegroups.com,
	elver@...gle.com, glider@...gle.com, nogikh@...gle.com,
	tarasmadan@...gle.com
Subject: Re: [PATCH v2 2/4] kcov: add interrupt handling self test

On Wed, Jun 19, 2024 at 01:18:52PM +0200, Dmitry Vyukov wrote:
> On Wed, 19 Jun 2024 at 13:13, Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Tue, Jun 11, 2024 at 09:50:31AM +0200, Dmitry Vyukov wrote:
> > > Add a boot self test that can catch sprious coverage from interrupts.
> > > The coverage callback filters out interrupt code, but only after the
> > > handler updates preempt count. Some code periodically leaks out
> > > of that section and leads to spurious coverage.
> > > Add a best-effort (but simple) test that is likely to catch such bugs.
> > > If the test is enabled on CI systems that use KCOV, they should catch
> > > any issues fast.
> > >
> > > Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
> > > Reviewed-by: Alexander Potapenko <glider@...gle.com>
> > > Cc: x86@...nel.org
> > > Cc: linux-kernel@...r.kernel.org
> > > Cc: syzkaller@...glegroups.com
> > >
> > > ---
> > >
> > > Changed since v1:
> > >  - renamed KCOV_TEST to KCOV_SELFTEST
> > >  - improved the config description
> > >  - loop for exactly 300ms in the test
> > >
> > > In my local testing w/o the previous fix,
> > > it immidiatly produced the following splat:
> > >
> > > kcov: running selftest
> > > BUG: TASK stack guard page was hit at ffffc90000147ff8
> > > Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
> > > ...
> > >  kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
> > >  sysvec_call_function+0x15/0xb0
> > >  asm_sysvec_call_function+0x1a/0x20
> > >  kcov_init+0xe4/0x130
> > >  do_one_initcall+0xbc/0x470
> > >  kernel_init_freeable+0x4fc/0x930
> > >  kernel_init+0x1c/0x2b0
> >
> > So I'm not entirely sure how the above BUG comes about, nor how this
> > selftest tickles it. Could you elaborate?
> >
> > I've found check_kcov_mode() which has this !in_task() clause, but I'm
> > not entirely sure how failing that leads to the above mentioned failure.
> 
> I've tried to explain it in the test comment, maybe I need to improve it:
> 
> +        * We set kcov_mode to enable tracing, but don't setup the area,
> +        * so any attempt to trace will crash. Note: we must not call any
> +        * potentially traced functions in this region.

Ah, I'm just slow today.. did not connect the dots. No this is fine.

> Basically, we setup current task kcov in a way that any attempt to
> trace in __sanitizer_cov_trace_pc() will crash, and then just loop
> waiting for interrupts.
> 
> A more legit way to achieve the same would be to properly setup kcov
> for tracing from within the kernel, then call outermost interrupt
> entry function, then check we traced nothing. But that would require a
> non-trivial amount of new complex code, and e.g. the top-most
> interrupt entry function is not exported and is arch-specific.

Yeah, polling jiffies should be fine I suppose.