linux-kernel - Re: [PATCH 2/4] kcov: add interrupt handling self test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACT4Y+bT+D41NopXN4rFbxqgC--mqYL=xCQOje0jDssjncKY+A@mail.gmail.com>
Date: Wed, 5 Jun 2024 11:18:21 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Marco Elver <elver@...gle.com>
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, 
	dave.hansen@...ux.intel.com, x86@...nel.org, linux-kernel@...r.kernel.org, 
	syzkaller@...glegroups.com, glider@...gle.com, nogikh@...gle.com, 
	tarasmadan@...gle.com
Subject: Re: [PATCH 2/4] kcov: add interrupt handling self test

On Wed, 5 Jun 2024 at 11:10, Marco Elver <elver@...gle.com> wrote:
>
> > Add a boot self test that can catch sprious coverage from interrupts.
> > The coverage callback filters out interrupt code, but only after the
> > handler updates preempt count. Some code periodically leaks out
> > of that section and leads to spurious coverage.
> > Add a best-effort (but simple) test that is likely to catch such bugs.
> > If the test is enabled on CI systems that use KCOV, they should catch
> > any issues fast.
> >
> > Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
> > Cc: x86@...nel.org
> > Cc: linux-kernel@...r.kernel.org
> > Cc: syzkaller@...glegroups.com
> >
> > ---
> >
> > In my local testing w/o the previous fix,
> > it immidiatly produced the following splat:
> >
> > kcov: running selftest
> > BUG: TASK stack guard page was hit at ffffc90000147ff8
> > Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
> > ...
> >  kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
> >  sysvec_call_function+0x15/0xb0
> >  asm_sysvec_call_function+0x1a/0x20
> >  kcov_init+0xe4/0x130
> >  do_one_initcall+0xbc/0x470
> >  kernel_init_freeable+0x4fc/0x930
> >  kernel_init+0x1c/0x2b0
> > ---
> >  kernel/kcov.c     | 28 ++++++++++++++++++++++++++++
> >  lib/Kconfig.debug |  9 +++++++++
> >  2 files changed, 37 insertions(+)
> >
> > diff --git a/kernel/kcov.c b/kernel/kcov.c
> > index c3124f6d5536..04136f80042f 100644
> > --- a/kernel/kcov.c
> > +++ b/kernel/kcov.c
> > @@ -1057,6 +1057,30 @@ u64 kcov_common_handle(void)
> >  }
> >  EXPORT_SYMBOL(kcov_common_handle);
> >
> > +#ifdef CONFIG_KCOV_TEST
> > +static void __init selftest(void)
> > +{
> > +       volatile int i;
> > +
> > +       pr_err("running self test\n");
> > +       /*
> > +        * Test that interrupts don't produce spurious coverage.
> > +        * The coverage callback filters out interrupt code, but only
> > +        * after the handler updates preempt count. Some code periodically
> > +        * leaks out of that section and leads to spurious coverage.
> > +        * It's hard to call the actual interrupt handler directly,
> > +        * so we just loop here for ~400 ms waiting for a timer interrupt.
>
> Where do the 400 ms come from? I only see that it loops a long time,
> but that the timing is entirely dependent on how fast the CPU executes
> the loop.
>
> > +        * We set kcov_mode to enable tracing, but don't setup the area,
> > +        * so any attempt to trace will crash.
> > +        */
> > +       current->kcov_mode = KCOV_MODE_TRACE_PC;
> > +       for (i = 0; i < (1 << 28); i++)
> > +               ;
>
> Can't you check jiffies, and e.g. check that actual ~100-500ms have elapsed?
>
> timeout = jiffies + msecs_to_jiffies(300);
> while (!time_after(jiffies, timeout)) {
>   cpu_relax();
> }

We can't call any functions. If anything is instrumented, the kernel crashes.

But just reading jiffies should be fine, so we can do:

unsigned long start = jiffies;
while ((jiffies - start) * MSEC_PER_SEC / HZ < 500)
;

> > +       current->kcov_mode = 0;
> > +       pr_err("done running self test\n");
> > +}
> > +#endif
> > +
> >  static int __init kcov_init(void)
> >  {
> >         int cpu;
> > @@ -1076,6 +1100,10 @@ static int __init kcov_init(void)
> >          */
> >         debugfs_create_file_unsafe("kcov", 0600, NULL, NULL, &kcov_fops);
> >
> > +#ifdef CONFIG_KCOV_TEST
> > +       selftest();
> > +#endif
> > +
> >         return 0;
> >  }
> >
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 59b6765d86b8..79836a15b6cb 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -2171,6 +2171,15 @@ config KCOV_IRQ_AREA_SIZE
> >           soft interrupts. This specifies the size of those areas in the
> >           number of unsigned long words.
> >
> > +config KCOV_TEST
>
> s/TEST/SELFTEST/
>
> It may be confused with a longer standalone test.
>
> > +       bool "Test CONFIG_KCOV feature"
>
> Maybe "Perform short selftests on boot" (similar to CONFIG_KCSAN_SELFTEST).
>
> > +       depends on KCOV
> > +       help
> > +         Sanity check for KCOV coverage collection.
> > +         Runs built-in self test on boot to detect some common issues.
> > +
> > +         If unsure, say N.
> > +
> >  menuconfig RUNTIME_TESTING_MENU
> >         bool "Runtime Testing"
> >         default y
> > --
> > 2.45.1.467.gbab1589fc0-goog
> >