[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACT4Y+Zm5Vz1LL7m_BubwV=bMPgVjOVNpp12nDZRi5oesH47WA@mail.gmail.com>
Date: Fri, 10 Jan 2025 13:16:27 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Marco Elver <elver@...gle.com>
Cc: Joey Jiao <quic_jiangenj@...cinc.com>, andreyknvl@...il.com, corbet@....net,
akpm@...ux-foundation.org, gregkh@...uxfoundation.org, nogikh@...gle.com,
pierre.gondois@....com, cmllamas@...gle.com, quic_zijuhu@...cinc.com,
richard.weiyang@...il.com, tglx@...utronix.de, arnd@...db.de,
catalin.marinas@....com, will@...nel.org, dennis@...nel.org, tj@...nel.org,
cl@...ux.com, ruanjinjie@...wei.com, colyli@...e.de,
andriy.shevchenko@...ux.intel.com, kernel@...cinc.com,
quic_likaid@...cinc.com, kasan-dev@...glegroups.com,
workflows@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-mm@...ck.org
Subject: Re: [PATCH] kcov: add unique cover, edge, and cmp modes
On Fri, 10 Jan 2025 at 10:23, Marco Elver <elver@...gle.com> wrote:
> > From: "Jiao, Joey" <quic_jiangenj@...cinc.com>
> >
> > The current design of KCOV risks frequent buffer overflows. To mitigate
> > this, new modes are introduced: KCOV_TRACE_UNIQ_PC, KCOV_TRACE_UNIQ_EDGE,
> > and KCOV_TRACE_UNIQ_CMP. These modes allow for the recording of unique
> > PCs, edges, and comparison operands (CMP).
>
> There ought to be a cover letter explaining the motivation for this,
> and explaining why the new modes would help. Ultimately, what are you
> using KCOV for where you encountered this problem?
>
> > Key changes include:
> > - KCOV_TRACE_UNIQ_[PC|EDGE] can be used together to replace KCOV_TRACE_PC.
> > - KCOV_TRACE_UNIQ_CMP can be used to replace KCOV_TRACE_CMP mode.
> > - Introduction of hashmaps to store unique coverage data.
> > - Pre-allocated entries in kcov_map_init during KCOV_INIT_TRACE to avoid
> > performance issues with kmalloc.
> > - New structs and functions for managing memory and unique coverage data.
> > - Example program demonstrating the usage of the new modes.
>
> This should be a patch series, carefully splitting each change into a
> separate patch.
> https://docs.kernel.org/process/submitting-patches.html#split-changes
>
> > With the new hashmap and pre-alloced memory pool added, cover size can't
> > be set to higher value like 1MB in KCOV_TRACE_PC or KCOV_TRACE_CMP modes
> > in 2GB device with 8 procs, otherwise it causes frequent oom.
> >
> > For KCOV_TRACE_UNIQ_[PC|EDGE|CMP] modes, smaller cover size like 8KB can
> > be used.
> >
> > Signed-off-by: Jiao, Joey <quic_jiangenj@...cinc.com>
>
> As-is it's hard to review, and the motivation is unclear. A lot of
> code was moved and changed, and reviewers need to understand why that
> was done besides your brief explanation above.
>
> Generally, KCOV has very tricky constraints, due to being callable
> from any context, including NMI. This means adding new dependencies
> need to be carefully reviewed. For one, we can see this in genalloc's
> header:
>
> > * The lockless operation only works if there is enough memory
> > * available. If new memory is added to the pool a lock has to be
> > * still taken. So any user relying on locklessness has to ensure
> > * that sufficient memory is preallocated.
> > *
> > * The basic atomic operation of this allocator is cmpxchg on long.
> > * On architectures that don't have NMI-safe cmpxchg implementation,
> > * the allocator can NOT be used in NMI handler. So code uses the
> > * allocator in NMI handler should depend on
> > * CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
>
> And you are calling gen_pool_alloc() from __sanitizer_cov_trace_pc.
> Which means this implementation is likely broken on
> !CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG architectures (do we have
> architectures like that, that support KCOV?).
>
> There are probably other sharp corners due to the contexts KCOV can
> run in, but would simply ask you to carefully reason about why each
> new dependency is safe.
I am also concerned about the performance effect. Does it add a stack
frame to __sanitizer_cov_trace_pc()? Please show disassm of the
function before/after.
Also, I have concerns about interrupts and reentrancy. We are still
getting some reentrant calls from interrupts (not all of them are
filtered by in_task() check). I am afraid these complex hashmaps will
corrupt.
Powered by blists - more mailing lists