[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+b06RBSgcxooStVLoUVZRR=_L3Pxo6Ozp45s8brw1Ybfg@mail.gmail.com>
Date: Wed, 16 Apr 2025 12:52:12 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Alexander Potapenko <glider@...gle.com>
Cc: quic_jiangenj@...cinc.com, linux-kernel@...r.kernel.org,
kasan-dev@...glegroups.com, Aleksandr Nogikh <nogikh@...gle.com>,
Andrey Konovalov <andreyknvl@...il.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>,
Josh Poimboeuf <jpoimboe@...nel.org>, Marco Elver <elver@...gle.com>,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 0/7] RFC: coverage deduplication for KCOV
On Wed, 16 Apr 2025 at 10:54, Alexander Potapenko <glider@...gle.com> wrote:
>
> As mentioned by Joey Jiao in [1], the current kcov implementation may
> suffer from certain syscalls overflowing the userspace coverage buffer.
>
> According to our measurements, among 24 syzkaller instances running
> upstream Linux, 5 had a coverage overflow in at least 50% of executed
> programs. The median percentage of programs with overflows across those 24
> instances was 8.8%.
>
> One way to mitigate this problem is to increase the size of the kcov buffer
> in the userspace application using kcov. But right now syzkaller already
> uses 4Mb per each of up to 32 threads to store the coverage, and increasing
> it further would result in reduction in the number of executors on a single
> machine. Replaying the same program with an increased buffer size in the
> case of overflow would also lead to fewer executions being possible.
>
> When executing a single system call, excessive coverage usually stems from
> loops, which write the same PCs into the output buffer repeatedly. Although
> collecting precise traces may give us some insights into e.g. the number of
> loop iterations and the branches being taken, the fuzzing engine does not
> take advantage of these signals, and recording only unique PCs should be
> just as practical.
>
> In [1] Joey Jiao suggested using a hash table to deduplicate the coverage
> signal on the kernel side. While being universally applicable to all types
> of data collected by kcov, this approach adds another layer of complexity,
> requiring dynamically growing the map. Another problem is potential hash
> collisions, which can as well lead to lost coverage. Hash maps are also
> unavoidably sparse, which potentially requires more memory.
The hashmap probably can compare values for equality to avoid losing
coverage, but the real problem is that it allocates and can't work in
interrupts, etc.
> The approach proposed in this patch series is to assign a unique (and
> almost) sequential ID to each of the coverage callbacks in the kernel. Then
> we carve out a fixed-sized bitmap from the userspace trace buffer, and on
> every callback invocation we:
>
> - obtain the callback_ID;
> - if bitmap[callback_ID] is set, append the PC to the trace buffer;
> - set bitmap[callback_ID] to true.
>
> LLVM's -fsanitize-coverage=trace-pc-guard replaces every coverage callback
> in the kernel with a call to
> __sanitizer_cov_trace_pc_guard(&guard_variable) , where guard_variable is a
> 4-byte global that is unique for the callsite.
>
> This allows us to lazily allocate sequential numbers just for the callbacks
> that have actually been executed, using a lock-free algorithm.
>
> This patch series implements a new config, CONFIG_KCOV_ENABLE_GUARDS, which
> utilizes the mentioned LLVM flag for coverage instrumentation. In addition
> to the existing coverage collection modes, it introduces
> ioctl(KCOV_UNIQUE_ENABLE), which splits the existing kcov buffer into the
> bitmap and the trace part for a particular fuzzing session, and collects
> only unique coverage in the trace buffer.
>
> To reset the coverage between runs, it is now necessary to set trace[0] to
> 0 AND clear the entire bitmap. This is still considered feasible, based on
> the experimental results below.
>
> The current design does not address the deduplication of KCOV_TRACE_CMP
> comparisons; however, the number of kcov overflows during the hints
> collection process is insignificant compared to the overflows of
> KCOV_TRACE_PC.
>
> In addition to the mentioned changes, this patch adds support for
> R_X86_64_REX_GOTPCRELX to objtool and arch/x86/kernel/module.c. It turned
> out that Clang leaves such relocations in the linked modules for the
> __start___sancov_guards and __stop___sancov_guards symbols. Because
> resolving them does not require a .got section, it can be done at module
> load time.
>
> Experimental results.
>
> We've conducted an experiment running syz-testbed [3] on 10 syzkaller
> instances for 24 hours. Out of those 10 instances, 5 were enabling the
> kcov_deduplicate flag from [4], which makes use of the KCOV_UNIQUE_ENABLE
> ioctl, reserving 4096 words (262144 bits) for the bitmap and leaving 520192
> words for the trace collection.
>
> Below are the average stats from the runs.
>
> kcov_deduplicate=false:
> corpus: 52176
> coverage: 302658
> cover overflows: 225288
> comps overflows: 491
> exec total: 1417829
> max signal: 318894
>
> kcov_deduplicate=true:
> corpus: 52581
> coverage: 304344
> cover overflows: 986
> comps overflows: 626
> exec total: 1484841
> max signal: 322455
>
> [1] https://lore.kernel.org/linux-arm-kernel/20250114-kcov-v1-5-004294b931a2@quicinc.com/T/
> [2] https://clang.llvm.org/docs/SanitizerCoverage.html
> [3] https://github.com/google/syzkaller/tree/master/tools/syz-testbed
> [4] https://github.com/ramosian-glider/linux/pull/7
>
>
> Alexander Potapenko (7):
> kcov: apply clang-format to kcov code
> kcov: factor out struct kcov_state
> kcov: x86: introduce CONFIG_KCOV_ENABLE_GUARDS
> kcov: add `trace` and `trace_size` to `struct kcov_state`
> kcov: add ioctl(KCOV_UNIQUE_ENABLE)
> x86: objtool: add support for R_X86_64_REX_GOTPCRELX
> mm/kasan: define __asan_before_dynamic_init, __asan_after_dynamic_init
>
> Documentation/dev-tools/kcov.rst | 43 +++
> MAINTAINERS | 1 +
> arch/x86/include/asm/elf.h | 1 +
> arch/x86/kernel/module.c | 8 +
> arch/x86/kernel/vmlinux.lds.S | 1 +
> arch/x86/um/asm/elf.h | 1 +
> include/asm-generic/vmlinux.lds.h | 14 +-
> include/linux/kcov-state.h | 46 +++
> include/linux/kcov.h | 60 ++--
> include/linux/sched.h | 16 +-
> include/uapi/linux/kcov.h | 1 +
> kernel/kcov.c | 453 +++++++++++++++++++-----------
> lib/Kconfig.debug | 16 ++
> mm/kasan/generic.c | 18 ++
> mm/kasan/kasan.h | 2 +
> scripts/Makefile.kcov | 4 +
> scripts/module.lds.S | 23 ++
> tools/objtool/arch/x86/decode.c | 1 +
> tools/objtool/check.c | 1 +
> 19 files changed, 508 insertions(+), 202 deletions(-)
> create mode 100644 include/linux/kcov-state.h
>
> --
> 2.49.0.604.gff1f9ca942-goog
>
Powered by blists - more mailing lists