Date:   Tue, 16 Jun 2020 21:16:01 +0000
From:   "Bird, Tim" <Tim.Bird@...y.com>
To:     Brendan Higgins <brendanhiggins@...gle.com>
CC:     "shuah@...nel.org" <shuah@...nel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        David Gow <davidgow@...gle.com>
Subject: RE: RFC - kernel selftest result documentation (KTAP)



> -----Original Message-----
> From: Brendan Higgins
> 
> On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
> > Some months ago I started work on a document to formalize how
> > kselftest implements the TAP specification.  However, I didn't finish
> > that work.  Maybe it's time to do so now.
> >
> > kselftest has developed a few differences from the original
> > TAP specification, and some extensions that I believe are worth
> > documenting.
> >
> > Essentially, we have created our own KTAP (kernel TAP)
> > format.  I think it is worth documenting our conventions, in order to
> > keep everyone on the same page.
> >
> > Below is a partially completed document on my understanding
> > of KTAP, based on examination of some of the kselftest test
> > output.  I have not reconciled this with the kunit output format,
> > which I believe has some differences (which maybe we should
> > resolve before we get too far into this).
> >
> > I submit the document now, before it is finished, because a patch
> > was recently introduced to alter one of the result conventions
> > (from SKIP='not ok' to SKIP='ok').
> >
> > See the document included inline below.
> >
> > ====== start of ktap-doc-rfc.txt ======
> 
> [...]
> 
> > --- from here on is not-yet-organized material
> >
> > Tip:
> >  - don't change the test plan based on skipped tests.
> >    - it is better to report that a test case was skipped, than to
> >      not report it
> >    - that is, don't adjust the number of test cases based on skipped
> >      tests
> >
> > Other things to mention:
> > TAP13 elements not used:
> >  - yaml for diagnostic messages
> 
> We talked about this before, but I would like some way to get failed
> expectation/assertion information from the test in a consistent,
> machine-parsable way. Currently we do the following:
> 
>     # Subtest: example
>     1..1
>     # example_simple_test: initializing
>     # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
>     Expected 1 + 1 == 3, but
>         1 + 1 == 2
>         3 == 3
>     not ok 1 - example_simple_test
> not ok 5 - example
> 
> Technically not TAP compliant, but no one seems to mind. I am okay with
> keeping it the way it is, but if we don't want it in the KTAP spec, we
> will need some kind of recourse.

So far, most CI systems don't parse the diagnostic data, so it doesn't
really matter what the format is.  If it's useful for humans, it's valuable as is.
However, it would be nice if that could change.  Without some formalization
of the format of the diagnostic data, parsing it is an intractable problem
for CI systems, so it's really a chicken-and-egg problem.  To solve it, we would
have to determine exactly what needs to be provided on a consistent basis for
diagnostic data across many tests.  I think that's too big a problem to handle
right now.  I'm not opposed to migrating to some structure with yaml in the
future, but free-form text output seems OK for now.
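
For what it's worth, TAP13's yaml diagnostics attach an indented block,
delimited by '---' and '...', after the result line.  The KUnit failure
above might look something like this in that form (the field names are
illustrative; TAP13 doesn't mandate a schema):

    not ok 1 - example_simple_test
      ---
      message: "EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29"
      got: "1 + 1 == 2"
      expected: "3 == 3"
      ...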

> 
> >    - reason: try to keep things line-based, since output from other things
> >    may be interspersed with messages from the test itself
> >  - TODO directive
> 
Is this stating a fact or a desire? We don't use TODO either, but
it looks like it could be useful.
Just stating a fact.  I didn't find TODO in either KUnit or selftest in 
November when I initially wrote this up.  If TODO serves as a kind
of XFAIL, it could be useful.  I have nothing against it.
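
For reference, a TAP '# TODO' directive on a 'not ok' line marks the
failure as expected, which is roughly what XFAIL means elsewhere.  An
illustrative line (the test name is invented):

    not ok 4 - check_hugepage_migration # TODO not implemented yet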

> 
> > KTAP Extensions beyond TAP13:
> >  - nesting
> >    - via indentation
> >      - indentation makes it easier for humans to read
> >  - test identifier
> >     - multiple parts, separated by ':'
> 
> Can you elaborate on this more? I am not sure what you mean.
An individual test case can have a name that is scoped by a containing
test or test suite.  For example: selftests: cpufreq: main.sh
This test identifier consists of the test system (selftests), the test
area (cpufreq), and the test case name (main.sh).  This one's a bit
weird because the test case name is just the name of the program
in that test area.  The program itself doesn't output data in TAP format,
and the harness uses its exit code to detect PASS/FAIL.  If main.sh had
multiple test cases, it might produce test identifiers like this:
selftests: cpufreq: main: check_change_affinity_mask
selftests: cpufreq: main: check_permissions_for_mask_operation
(Or it might just produce the last part of these strings, the
testcase names, and the testcase id might be something generated
by the harness or CI system.)

The value of having a single string that identifies the testcase (like a
uniform resource locator) is that it's easier to use the string to
correlate results produced by different CI systems that are executing
the same test.
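
To make that concrete, here is a hypothetical sketch, borrowing the
indented "# Subtest:" style from the KUnit example above, of nested
output that a harness could flatten into those identifiers:

      # Subtest: main
      1..2
      ok 1 - check_change_affinity_mask
      ok 2 - check_permissions_for_mask_operation
  ok 1 - main

A CI system running this under selftests/cpufreq could then join the
enclosing names to form
selftests: cpufreq: main: check_change_affinity_mask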

> 
> >  - summary lines
> >    - can be skipped by CI systems that do their own calculations
> >
> > Other notes:
> >  - automatic assignment of result status based on exit code
> >
> > Tips:
> >  - do NOT describe the result in the test line
> >    - the test case description should be the same whether the test
> >      succeeds or fails
> >    - use diagnostic lines to describe or explain results, if this is
> >      desirable
> >  - test numbers are considered harmful
> >    - test harnesses should use the test description as the identifier
> >    - test numbers change when testcases are added or removed
> >      - which means that results can't be compared between different
> >        versions of the test
> >  - recommendations for diagnostic messages:
> >    - reason for failure
> >    - reason for skip
> >    - diagnostic data should always precede the result line
> >      - problem: harness may emit result before test can do assessment
> >        to determine reason for result
> >      - this is what the kernel uses
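
To illustrate the tips above: keep the description stable and put the
reason in a diagnostic line that precedes the result, e.g. (the test
name and reason are invented):

    # check_affinity_mask: sched_setaffinity failed: EPERM
    not ok 1 - check_affinity_mask

rather than folding "failed with EPERM" into the description, which
would make the test line differ between passing and failing runs.
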
> >
> > Differences between kernel test result format and TAP13:
> >  - in KTAP the "# SKIP" directive is placed after the description on
> >    the test result line
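
To illustrate the difference (the TAP13 line is adapted from the skip
example in the TAP specification; the KTAP test name is invented):

    TAP13:  ok 23 # skip Insufficient flogiston pressure.
    KTAP:   ok 23 - check_cpu_online # SKIP CONFIG_HOTPLUG_CPU not set
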
> >
> > ====== end of ktap-doc-rfc.txt ======
> > OK - that's the end of the RFC doc.
> >
> > Here are a few questions:
> >  - is this document desired or not?
> >     - is it too long or too short?
> >  - if the document is desired, where should it be placed?
> 
> I like it. I don't think we can rely on the TAP people updating their
> stuff based on my interactions with them. So having a spec which is
> actually maintained would be nice.
> 
> Maybe in Documentation/dev-tools/ ?
I'm leaning towards Documentation/dev-tools/test-results_format.rst

> 
> >    I assume somewhere under Documentation, and put into
> >    .rst format. Suggestions for a name and location are welcome.
> >  - is this document accurate?
> >    I think KUnit does a few things differently than this description.
> >    - is the intent to have kunit and kselftest have the same output format?
> >       if so, then these should be rationalized.
> 
> Yeah, I think it would be nice if all test frameworks/libraries for the
> kernel output test results in the same language.
Agreed.

 -- Tim
