Date:   Mon, 15 Jun 2020 17:34:37 +0000
From:   "Bird, Tim" <Tim.Bird@...y.com>
To:     David Gow <davidgow@...gle.com>
CC:     "shuah@...nel.org" <shuah@...nel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
        Brendan Higgins <brendanhiggins@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>
Subject: RE: RFC - kernel selftest result documentation (KTAP)

> -----Original Message-----
> From: David Gow <davidgow@...gle.com>
> 
> On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim <Tim.Bird@...y.com> wrote:
> >
> > Some months ago I started work on a document to formalize how
> > kselftest implements the TAP specification.  However, I didn't finish
> > that work.  Maybe it's time to do so now.
> >
> > kselftest has developed a few differences from the original
> > TAP specification, and  some extensions that I believe are worth
> > documenting.
> >
> > Essentially, we have created our own KTAP (kernel TAP)
> > format.  I think it is worth documenting our conventions, in order to
> > keep everyone on the same page.
> >
> > Below is a partially completed document on my understanding
> > of KTAP, based on examination of some of the kselftest test
> > output.  I have not reconciled this with the kunit output format,
> > which I believe has some differences (which maybe we should
> > resolve before we get too far into this).
> 
> Thanks for doing this! This is something we've wanted to have for a while!
> 
> On the KUnit side of things, we've not (intentionally) deviated too
> much from TAP/kselftest.
> It's certainly our intention to hew as close as possible to what
> kselftest is doing: I don't think there are any real conflicts
> conceptually (at least at the moment), but we're almost certainly
> handling a few details differently.
> 
> One other thing worth noting is that KUnit has a parser for our TAP
> results: './tools/testing/kunit/kunit.py parse' will do some basic
> parsing and print out results, a summary, etc.
> 
> A few other comments below:
> 
> > I submit the document now, before it is finished, because a patch
> > was recently introduced to alter one of the result conventions
> > (from SKIP='not ok' to SKIP='ok').
> >
> > See the document include inline below
> >
> > ====== start of ktap-doc-rfc.txt ======
> > Selftest preferred output format
> > --------------------------------
> >
> > The linux kernel selftest system uses TAP (Test Anything Protocol)
> > output to make testing results consumable by automated systems.  A
> > number of Continuous Integration (CI) systems test the kernel every
> > day.  It is useful for these systems that output from selftest
> > programs be consistent and machine-parsable.
> >
> > At the same time, it is useful for test results to be human-readable
> > as well.
> >
> > The kernel test result format is based on a variation of TAP.
> > TAP is a simple text-based format that is
> > documented on the TAP home page (http://testanything.org/).  There
> > is an official TAP13 specification here:
> > http://testanything.org/tap-version-13-specification.html
> >
> > The kernel test result format consists of 5 major elements,
> > 4 of which are line-based:
> >  * the output version line
> >  * the plan line
> >  * one or more test result lines
> >  * a possible "Bail out!" line
> >
> > and one optional element:
> >  * diagnostic data
> >
> > The 5th element, diagnostic data, is used to describe items running
> > in the test, and possibly to explain test results.
> >
> > Some other lines are emitted by the test harness, and not by
> > individual test programs:
> >  * one or more test identification lines
> >  * a possible results summary line
> >
> > Here is an example:
> >
> >         TAP version 13
> >         1..1
> >         # selftests: cpufreq: main.sh
> >         # pid 8101's current affinity mask: fff
> >         # pid 8101's new affinity mask: 1
> >         ok 1 selftests: cpufreq: main.sh
> >
> > The output version line is: "TAP version 13"
> >
> > The test plan is "1..1".
> >
> > Element details
> > ===============
> >
> > Output version line
> > -------------------
> > The output version line is always "TAP version 13".
> >
> > Although the kernel test result format has some additions
> > to the TAP13 format, the version line reported by kselftest tests
> > is (currently) always the exact string "TAP version 13".
> >
> > This is always the first line of test output.
> 
> KUnit is currently outputting "TAP version 14", as we were hoping some
> of our changes would get into the TAP14 spec. (Any comments, Brendan?)
> Maybe this should end up saying "KTAP version 1" or something?

I don't know if this will break any existing results parsers or not.
I hesitate to use "TAP version 14", as TAP appears to be a dormant
initiative at the moment, and there's no guarantee that the kernel's
changes will get adopted into an official spec.

If we are a strict super-set of TAP, then I suppose we could just
start using TAP version 14, and unilaterally declare that our changes
make a new spec.  But since we don't control the web site this feels
like a hostile takeover.

I'm most comfortable with calling our thing KTAP, and just
referencing TAP as inspiration.  I don't have a strong opinion on
KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the
same version line (if we can get them to use the same conventions).
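
Whatever string we settle on, parsers can stay tolerant of all the
candidates.  A rough sketch (illustrative only; the "KTAP version 1"
string is hypothetical, and nothing emits it today):

```python
import re

# Accept the version lines discussed in this thread: kselftest's
# "TAP version 13", KUnit's "TAP version 14", and a hypothetical
# "KTAP version 1".
VERSION_RE = re.compile(r'^(K?TAP) version (\d+)$')

def parse_version_line(line):
    """Return (flavor, version) for a version line, else None."""
    m = VERSION_RE.match(line.strip())
    if m is None:
        return None
    return (m.group(1), int(m.group(2)))
```

That would keep a single parser working across kselftest and KUnit
output until the version-string question is settled.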

> 
> > Test plan line
> > --------------
> > The test plan indicates the number of individual test cases intended to
> > be executed by the test. It always starts with "1.." and is followed
> > by the number of test cases.  In the example above, "1..1" indicates
> > that this test reports only 1 test case.
> >
> > The test plan line can be placed in two locations:
> >  * the second line of test output, when the number of test cases is known
> >    in advance
> >  * as the last line of test output, when the number of test cases is not
> >    known in advance.
> >
> > Most often, the number of test cases is known in advance, and the test plan
> > line appears as the second line of test output, immediately following
> > the output version line.  The number of test cases might not be known
> > in advance if the number of tests is calculated from runtime data.
> > In this case, the test plan line is emitted as the last line of test
> > output.
> 
> KUnit is currently including the test plan line only for subtests, as
> the current version doesn't actually know how many test suites will
> run in advance.
> This is something there's work underway to fix, though.
Sounds good.  You can just put the line at the bottom if it's
obnoxious to calculate ahead of time.

Does this mean that KUnit treats each sub-test as an individual test case
of the "super-test"?

In results summaries for a super-test, are all sub-test cases counted,
or just the list of sub-tests?
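
Incidentally, since the plan line looks the same in either position,
recognizing it is cheap.  A minimal sketch (mine, not from any
existing parser):

```python
import re

# A plan line is "1.." followed by the declared number of test cases.
PLAN_RE = re.compile(r'^1\.\.(\d+)$')

def parse_plan_line(line):
    """Return the declared test-case count, or None for non-plan lines."""
    m = PLAN_RE.match(line.strip())
    return int(m.group(1)) if m else None
```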


> 
> > Test result lines
> > -----------------
> > The test output consists of one or more test result lines that indicate
> > the actual results for the test.  These have the format:
> >
> >   <result> <number> <description> [<directive>] [<diagnostic data>]
> >
> > The ''result'' must appear at the start of a line (except when a
> > test is nested, see below), and must consist of one of the following
> > two phrases:
> >   * ok
> >   * not ok
> >
> > If the test passed, then the result is reported as "ok".  If the test
> > failed, then the result is reported as "not ok".  These must be in
> > lower case, exactly as shown.
> >
> > The ''number'' in the test result line represents the number of the
> > test case being performed by the test program.  This is often used by
> > test harnesses as a unique identifier for each test case.  The test
> > number is a base-10 number, starting with 1.  It should increase by
> > one for each new test result line emitted.  If possible the number
> > for a test case should be kept the same over the lifetime of the test.
> >
> > The ''description'' is a short description of the test case.
> > This can be any string of words, but should avoid using colons (':')
> > except as part of a fully qualified test case name (see below).
> >
> > Finally, it is possible to use a test directive to indicate another
> > possible outcome for a test: that it was skipped.  To report that
> > a test case was skipped, the result line should start with the
> > result "not ok", and the directive "# SKIP" should be placed after
> > the test description. (Note that this deviates from the TAP13
> > specification).
> >
> > A test may be skipped for a variety of reasons, ranging from
> > insufficient privileges to missing features or resources required
> > to execute that test case.
> >
> > It is usually helpful if a diagnostic message is emitted to explain
> > the reasons for the skip.  If the message is a single line and is
> > short, the diagnostic message may be placed after the '# SKIP'
> > directive on the same line as the test result.  Otherwise, it
> > should precede the test result line (see diagnostic data, next).
> 
> We're in the process of supporting test skipping in KUnit at the
> moment[1], and haven't totally formalised what the syntax here should
> be. The only output issues thus far have been on the "ok"/"not ok"
> point (my in-progress patch is using 'ok', the previous RFC could
> output either).

I'll comment on this in my reply to Kees' email.

> At the moment, the reason a test is skipped has to be
> on the same line as the result for the tools to pick it up (and the
> KUnit API always requests such a 'status comment', even if it ends up
> as the empty string).

OK - I think this is a good convention.
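
To make the convention concrete, here's a rough sketch of how a result
line with an optional same-line '# SKIP <reason>' could be picked apart
(my illustration, not kselftest's or KUnit's actual parser; KUnit's
"- " description prefix and '#' inside descriptions are not handled):

```python
import re

# <result> <number> <description> [# SKIP [reason]]
RESULT_RE = re.compile(r'^(ok|not ok) (\d+) ([^#]*?)(?:\s*# (SKIP)\s*(.*))?$')

def parse_result_line(line):
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    result, num, desc, directive, reason = m.groups()
    return {
        'result': result,            # "ok" or "not ok"
        'number': int(num),
        'description': desc.strip(),
        'directive': directive,      # "SKIP" or None
        'reason': (reason or '').strip() or None,
    }
```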

> 
> We'll probably follow whatever kselftest does here, though, but will
> be able to do more with skip reasons on the result line.
> 
> > Diagnostic data
> > ---------------
> > Diagnostic data is text that reports on test conditions or test
> > operations, or that explains test results.  In the kernel test
> > result format, diagnostic data is placed on lines that start with a
> > hash sign, followed by a space ('# ').
> >
> > One special format of diagnostic data is a test identification line,
> > that has the fully qualified name of a test case.  Such a test
> > identification line marks the start of test output for a test case.
> >
> > In the example above, there are three lines that start with '#'
> > which precede the test result line:
> >         # selftests: cpufreq: main.sh
> >         # pid 8101's current affinity mask: fff
> >         # pid 8101's new affinity mask: 1
> > These are used to indicate diagnostic data for the test case
> > 'selftests: cpufreq: main.sh'
> >
> > Material in comments between the identification line and the test
> > result line is diagnostic data that can help to interpret the
> > results of the test.
> >
> > The TAP specification indicates that automated test harnesses may
> > ignore any line that is not one of the mandatory prescribed lines
> > (that is, the output format version line, the plan line, a test
> > result line, or a "Bail out!" line.)
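
That ignore-unknown-lines rule makes a tolerant line classifier easy.
A throwaway sketch of the categories described so far (the "KTAP
version" case is speculative, per the version-line discussion above):

```python
def classify_line(line):
    """Roughly classify one line of output; per the TAP spec,
    anything unrecognized may be ignored by automated harnesses."""
    s = line.strip()
    if s.startswith('# '):
        return 'diagnostic'
    if s.startswith('ok ') or s.startswith('not ok '):
        return 'result'
    if s.startswith('1..'):
        return 'plan'
    if s.startswith('TAP version ') or s.startswith('KTAP version '):
        return 'version'
    if s.startswith('Bail out!'):
        return 'bail'
    return 'other'
```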
> >
> > Bail out!
> > ---------
> > If a line in the test output starts with 'Bail out!', it indicates
> > that the test was aborted for some reason.  It indicates that
> > the test is unable to proceed, and no additional tests will be
> > performed.
> >
> > This can be used at the very beginning of a test, or anywhere in the
> > middle of the test, to indicate that the test can not continue.
> >
> > --- from here on is not-yet-organized material
> >
> > Tip:
> >  - don't change the test plan based on skipped tests.
> >    - it is better to report that a test case was skipped, than to
> >      not report it
> >    - that is, don't adjust the number of test cases based on skipped
> >      tests
> >
> > Other things to mention:
> > TAP13 elements not used:
> >  - yaml for diagnostic messages
> >    - reason: try to keep things line-based, since output from other
> >      things may be interspersed with messages from the test itself
> We're not using this in KUnit, either.
> >  - TODO directive
> Ditto: the upcoming SKIP support leaves room for this to easily be
> added, though.
> >
> > KTAP Extensions beyond TAP13:
> >  - nesting
> >    - via indentation
> >      - indentation makes it easier for humans to read
> We're using this a lot in KUnit, as all tests are split into suites.
> The syntax is basically a full nested TAP document, indented with four
> spaces. (There are a couple of tests which output some non-indented
> lines to our log, though.)
> 
> I've included some example output at the end of this email of what
> we're doing currently.
> 
> >  - test identifier
> >     - multiple parts, separated by ':'
> 
> >  - summary lines
> >    - can be skipped by CI systems that do their own calculations
> 
> We're not outputting any summary lines for the tests as a whole, but
> the success of a test suite is determined from the success of nested
> tests.
> 
> > Other notes:
> >  - automatic assignment of result status based on exit code
> >
> > Tips:
> >  - do NOT describe the result in the test line
> >    - the test case description should be the same whether the test
> >      succeeds or fails
> >    - use diagnostic lines to describe or explain results, if this is
> >      desirable
> >  - test numbers are considered harmful
> >    - test harnesses should use the test description as the identifier
> >    - test numbers change when testcases are added or removed
> >      - which means that results can't be compared between different
> >        versions of the test
> >  - recommendations for diagnostic messages:
> >    - reason for failure
> >    - reason for skip
> >    - diagnostic data should always precede the result line
> >      - problem: harness may emit result before test can do assessment
> >        to determine reason for result
> >      - this is what the kernel uses
> >
> > Differences between kernel test result format and TAP13:
> >  - in KTAP the "# SKIP" directive is placed after the description on
> >    the test result line
> 
> That's what we're planning to do with KUnit as well: clearly I didn't
> read the TAP13 spec as thoroughly as I'd intended, as I'd naively
> assumed that this was TAP13 spec compliant. Oops.
> I'm very much in favour of this change.

OK - thanks for the feedback.   It's my preference also.

> 
> >
> > ====== end of ktap-doc-rfc.txt ======
> > OK - that's the end of the RFC doc.
> >
> > Here are a few questions:
> >  - is this document desired or not?
> 
> This is definitely a good thing for us: thanks a lot!
> 
> >     - is it too long or too short?
> >  - if the document is desired, where should it be placed?
> >    I assume somewhere under Documentation, and put into
> >    .rst format. Suggestions for a name and location are welcome.
> >  - is this document accurate?
> >    I think KUNIT does a few things differently than this description.
> >    - is the intent to have kunit and kselftest have the same output format?
> >       if so, then these should be rationalized.
> 
> As above, we'd love to at least try to have kunit and kselftest using
> the same format.
> 
> 
> > Finally,
> >   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> > See https://testanything.org/tap-version-13-specification.html
> 
> I have a very mild preference for 'ok': but it doesn't really matter
> much one way or the other. Our tooling throws the result away if it
> sees a SKIP.

Ok - thanks for the feedback.
 -- Tim

> 
> > Regards,
> >  -- Tim
> >
> >
> 
> Example KUnit output (including the in-progress "skip test" support):
> TAP version 14
>    # Subtest: kunit-try-catch-test
>    1..2
>    ok 1 - kunit_test_try_catch_successful_try_no_catch
>    ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
> ok 1 - kunit-try-catch-test
>    # Subtest: example
>    1..2
>    # example_simple_test: initializing
>    ok 1 - example_simple_test
>    # example_skip_test: initializing
>    ok 2 - example_skip_test # SKIP this test should be skipped
> ok 2 - example
> 
> 
> [1]: https://lore.kernel.org/linux-kselftest/20200513042956.109987-1-davidgow@google.com/T/#u

It's nice to have this example.  Thanks.
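
For anyone consuming nested output like the above, one nesting level
can be peeled off by stripping the four-space indent David describes.
A quick sketch only; the real tool is './tools/testing/kunit/kunit.py
parse':

```python
def extract_subtest_lines(lines):
    """Collect the lines indented by exactly one nesting level
    (four spaces), stripping the indent so the result is itself
    a parseable (K)TAP document."""
    return [line[4:] for line in lines if line.startswith('    ')]
```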
 -- Tim
