linux-kernel - Functional Coverage via RV? (was: "Learning-based Controlled Concurrency Testing")

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANpmjNMPvAucMQoZeLQAP_WiwiLT6XBoss=EZ4xAbrHnMwdt5g@mail.gmail.com>
Date:   Thu, 17 Jun 2021 13:20:04 +0200
From:   Marco Elver <elver@...gle.com>
To:     "Paul E. McKenney" <paulmck@...nel.org>, bristot@...hat.com
Cc:     Dmitry Vyukov <dvyukov@...gle.com>,
        syzkaller <syzkaller@...glegroups.com>,
        kasan-dev <kasan-dev@...glegroups.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Functional Coverage via RV? (was: "Learning-based Controlled
 Concurrency Testing")

[+Daniel, just FYI. We had a discussion about "functional coverage"
and fuzzing, and I've just seen your wonderful work on RV. If you have
thought about fuzzing with RV and how coverage of the model impacts
test generation, I'd be curious to hear.]

Looks like there is ongoing work on specifying models and running them
along with the kernel: https://lwn.net/Articles/857862/

Those models that are run alongside the kernel would have their own
coverage, and since there's a mapping between real code and model, a
fuzzer trying to reach new code in one or the other will ultimately
improve coverage for both.

Just wanted to document this here, because it seems quite relevant.
I'm guessing that "functional coverage" would indeed be a side-effect
of a good RV model?

Previous discussion below.

Thanks,
-- Marco

On Wed, 19 May 2021 at 22:24, Marco Elver <elver@...gle.com> wrote:
> On Wed, 19 May 2021 at 20:53, Paul E. McKenney <paulmck@...nel.org> wrote:
> > On Wed, May 19, 2021 at 11:02:43AM +0200, Marco Elver wrote:
> > > On Tue, 18 May 2021 at 22:42, Paul E. McKenney <paulmck@...nel.org> wrote:
> > > [...]
> > > > > All the above sound like "functional coverage" to me, and could be
> > > > > implemented on top of a well-thought-out functional coverage API.
> > > > > Functional coverage is common in the hardware verification space to
> > > > > drive simulation and model checking; for example, functional coverage
> > > > > could be "buffer is full" vs just structural (code) coverage which
> > > > > cannot capture complex state properties like that easily.
> > > > >
> > > > > Similarly, you could then say things like "number of held locks" or
> > > > > even alluding to your example (5) above, "observed race on address
> > > > > range". In the end, with decent functional coverage abstractions,
> > > > > anything should hopefully be possible.
> > > >
> > > > Those were in fact the lines along which I was thinking.
> > > >
> > > > > I've been wondering if this could be something useful for the Linux
> > > > > kernel, but my guess has always been that it'd not be too-well
> > > > > received because people don't like to see strange annotations in their
> > > > > code. But maybe I'm wrong.
> > > >
> > > > I agree that it is much easier to get people to use a tool that does not
> > > > require annotations.  In fact, it is best if it requires nothing at all
> > > > from them...
> > >
> > > While I'd like to see something like that, because it'd be beneficial
> > > to see properties of the code written down to document its behaviour
> > > better and at the same time machine checkable, like you say, if it
> > > requires additional effort, it's a difficult sell. (Although the same
> > > is true for all other efforts to improve reliability that require a
> > > departure from the "way it used to be done", be it data_race(), or
> > > even efforts introducing whole new programming languages to the
> > > kernel.)
> >
> > Fair point!  But what exactly did you have in mind?
>
> Good question, I'll try to be more concrete -- most of it are
> half-baked ideas and questions ;-), but if any of it makes sense, I
> should maybe write a doc to summarize.
>
> What I had in mind is a system to write properties for both functional
> coverage, but also checking more general properties of the kernel. The
> latter I'm not sure about how useful. But all this isn't really used
> for anything other than in debug builds.
>
> Assume we start with macros such as "ASSERT_COVER(...)" (for
> functional coverage) and "ASSERT(...)" (just plain-old assertions).
> The former is a way to document potentially interesting states (useful
> for fuzzers to reach them), and the latter just a way to just specify
> properties of the system (useful for finding the actual bugs).
> Implementation-wise the latter is trivial, the former requires some
> thought on how to expose that information to fuzzers and how to use
> (as Dmitry suggested it's not trivial). I'd also imagine we can have
> module-level variants ("GLOBAL_ASSERT*(...)") that monitor some global
> state, and also add support for some subset of temporal properties
> like "GLOBAL_ASSERT_EVENTUALLY(precond, eventually_holds)" as
> suggested below.
>
> I guess maybe I'd have to take a step back and just ask why we have no
> way to write plain and simple assertions that are removed in non-debug
> builds? Some subsystems seem to roll their own, which a 'git grep
> "#define ASSERT"' tells me.
>
> Is there a fundamental reason why we shouldn't have them, perhaps
> there was some past discussion? Today we have things like
> lockdep_assert_held(), but nothing to even write a simple assert
> otherwise. If I had to guess why something like ASSERT is bad, it is
> because it gives people a way to check for unexpected conditions, but
> if those checks disappear in non-debug builds, the kernel might be
> unstable. Therefore every possible state must be handled and we must
> always be able to recover. The argument in favor is, if the ASSERT()s
> are proven invariants or conditions where we'd recover either way, and
> are only there to catch accidental regressions during testing; and in
> non-debug builds we don't suffer the performance overheads.
..
> > > > > My ideal abstractions I've been thinking of isn't just for coverage,
> > > > > but to also capture temporal properties (which should be inspired by
> > > > > something like LTL or such), on top of which you can also build
> > > > > coverage. Then we can specify things like "if I observe some state X,
> > > > > then eventually we observe state Y", and such logic can also just be
> > > > > used to define functional coverage of interest (again all this
> > > > > inspired by what's already done in hardware verification).
> > > >
> > > > Promela/spin provides an LTL interface, but of course cannot handle
> > > > much of RCU, let alone of the entire kernel.  And LTL can be quite
> > > > useful.  But in a runtime system, how do you decide when "eventually"
> > > > has arrived?  The lockdep system does so by tracking entry to idle
> > > > and to userspace execution, along with exit from interrupt handlers.
> > > > Or did you have something else in mind?
> > >
> > > For coverage, one could simply await the transition to the "eventually
> > > state" indefinitely; once reached we have coverage.
> > >
> > > But for verification, because unlike explicit state model checkers
> > > like Spin, we don't have the complete state and can't build an
> > > exhaustive state-graph, we'd have to approximate. And without knowing
> > > exactly what it is we're waiting for, the simplest option would be to
> > > just rely on a timeout, either part of the property or implicit. What
> > > the units of that timeout are I'm not sure, because a system might
> > > e.g. be put to sleep.