Message-ID: <mnjcsiqjqdnvbbkaaz5r4n42e56qsax667r7radzyagnmmfkip@dfi64z5deqzj>
Date:   Mon, 11 Sep 2023 11:34:13 +0200
From:   Maxime Ripard <mripard@...nel.org>
To:     Daniel Stone <daniels@...labora.com>
Cc:     Daniel Vetter <daniel@...ll.ch>,
        Helen Koike <helen.koike@...labora.com>, emma@...olt.net,
        linux-doc@...r.kernel.org, vignesh.raman@...labora.com,
        dri-devel@...ts.freedesktop.org, alyssa@...enzweig.io,
        jbrunet@...libre.com, robdclark@...gle.com, corbet@....net,
        khilman@...libre.com, sergi.blanch.torne@...labora.com,
        david.heidelberg@...labora.com, linux-rockchip@...ts.infradead.org,
        martin.blumenstingl@...glemail.com, robclark@...edesktop.org,
        anholt@...gle.com, linux-mediatek@...ts.infradead.org,
        matthias.bgg@...il.com, linux-amlogic@...ts.infradead.org,
        gustavo.padovan@...labora.com,
        linux-arm-kernel@...ts.infradead.org,
        angelogioacchino.delregno@...labora.com, neil.armstrong@...aro.org,
        guilherme.gallo@...labora.com, linux-kernel@...r.kernel.org,
        tzimmermann@...e.de
Subject: Re: [PATCH v11] drm: Add initial ci/ subdirectory

Hi

(Removing most of the context that got scrambled)

On Thu, Sep 07, 2023 at 01:40:02PM +0200, Daniel Stone wrote:
> Yeah, this is what our experience with Mesa (in particular) has taught us.
> 
> Having 100% of the tests pass 100% of the time on 100% of the platforms is a
> great goal that everyone should aim for. But it will also never happen.
> 
> Firstly, we're just not there yet today. Every single GPU-side DRM driver
> has userspace-triggerable faults which cause occasional errors in GL/Vulkan
> tests. Every single one. We deal with these in Mesa by retrying; if we
> didn't retry, across the breadth of hardware we test, I'd expect 99% of
> should-succeed merges to fail because of these intermittent bugs in the DRM
> drivers.

So the plan is to only ever test rendering devices? That should have
been made clearer, then.

> We don't have the same figure for KMS - because we don't test it - but
> I'd be willing to bet no driver is 100% if you run tests often enough.

And I would still consider that a bug we ought to fix, certainly not
something we should sweep under the rug. If half the tests don't run on
a given driver, then fine, they don't. I'm not really against having
failing tests; I'm against not flagging tests that are unreliable on a
given piece of hardware as failing tests.

> Secondly, we will never be there. If we could pause for five years and sit
> down making all the current usecases for all the current hardware on the
> current kernel run perfectly, we'd probably get there. But we can't: there's
> new hardware, new userspace, and hundreds of new kernel trees.

Not with that attitude :)

I'm not sure that's actually an argument, really. Ten years ago, we
would never have believed we could get to "every GPU on the market has
an open-source driver". Five years ago, we would never have believed we
could get to this very series. That didn't stop anyone from making
progress, everyone involved in this thread included.

> Even without the first two, what happens when the Arm SMMU maintainers
> (choosing a random target to pick on, sorry Robin) introduce subtle
> breakage which makes a lot of tests fail some of the time? Do we
> refuse to backmerge Linus into DRM until it's fixed, or do we disable
> all testing on Arm until it's fixed? When we've done that, what
> happens when we re-enable testing, and discover that a bunch of tests
> get broken because we haven't been testing?

I guess that's another thing that needs to be made clearer then. Do you
want to test Mesa, or the kernel?

For testing Mesa, I'd very much expect to rely on a stable kernel; and
for testing the kernel, to rely on a stable Mesa.

And if we're testing the kernel, then let's turn it the other way
around. How are we even supposed to detect those failures in the first
place if tests are flagged as unreliable?

No matter what we do here, what you describe will always happen. Like,
if we do flag those tests as unreliable, what exactly prevents another
issue from piling on top undetected, and what will happen when we
re-enable testing?
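
If tests flagged as unreliable were at least still re-run out-of-band
and their pass rate tracked, a fix or a new hard failure would still
surface. A rough sketch of what I mean; run_test() is a stand-in for
actually executing an IGT test on a device, not any real CI API:

import random

# Stand-in for running one test on the device under test; the pass
# rate here is made up for the example.
def run_test(name: str) -> bool:
    return random.random() > 0.3

def reassess_flakes(flaky_tests, runs=50):
    for name in flaky_tests:
        passes = sum(run_test(name) for _ in range(runs))
        rate = passes / runs
        if rate == 1.0:
            print(f"{name}: passed {runs}/{runs}, maybe fixed")
        elif rate == 0.0:
            print(f"{name}: failed every run, now a solid failure")
        else:
            print(f"{name}: {rate:.0%} pass rate, still flaky")

reassess_flakes(["igt@kms_flip@basic"])  # illustrative test name

As far as I can tell, nothing in the series does anything like that
today.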

On top of that, and you kind of hinted at this yourself: which set of
tests passes is a property tied to a single commit. Having that list
within the kernel already alters it: you'll need to merge a new branch,
add a bunch of fixes, and then update the test list state. You won't
end up with the same tree you originally tested (and defined the test
state list for).

It may or may not be an issue for Linus' releases, but I can already
see the trouble for stable releases, where fixes will be backported but
the test state list certainly won't be updated.

> Thirdly, hardware is capricious. 'This board doesn't make it to u-boot' is a
> clear infrastructure error, but if you test at sufficient scale, cold solder
> or failing caps surface way more often than you might think. And you can't
> really pick those out by any other means than running at scale, dealing with
> non-binary results, and looking at the trends over time. (Again this is
> something we do in Mesa - we graph test failures per DUT, look for outliers,
> and pull DUTs out of the rotation when they're clearly defective. But that
> only works if you actually run enough tests on them in the first place to
> discover trends - if you stop at the first failed test, it's impossible to
> tell the difference between 'infuriatingly infrequent kernel/test bug?' and
> 'cracked main board maybe?'.)
> 
> What we do know is that we _can_ classify tests four ways in expectations.
> Always-passing tests should always pass. Always-failing tests should always
> fail (and update the expectations if you make them pass). Flaking tests work
> often enough that they'll always pass if you run them a couple/few times,
> but fail often enough that you can't rely on them. Then you just skip tests
> which exhibit catastrophic failure i.e. local DoS which affects the whole
> test suite.
> 
> By keeping those sets of expectations, we've been able to keep Mesa pretty
> clear of regressions, whilst having a very clear set of things that should
> be fixed to point to. It would be great if those sets of things were zero,
> but it just isn't. Having that is far better than the two alternatives:
> either not testing at all (obviously bad), or having the test always be red
> so it's always ignored (might as well just not test).

Isn't that what happens with flaky tests anyway? Even more so since we
have zero context when updating that list.
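
To be clear, I understand the mechanics; a minimal sketch of the
four-way split you describe could be as simple as this, assuming plain
text lists of test names (the helper and verdict names are mine, not
the actual format the series uses):

def load_list(path):
    # One test name per line; '#' starts a comment.
    with open(path) as f:
        return {line.strip() for line in f
                if line.strip() and not line.startswith("#")}

def verdict(test, passed, expected_fails, flakes, skips):
    if test in skips:
        return "SKIP"      # catastrophic failure, never even run
    if test in flakes:
        return "FLAKE"     # retried, never gates a merge
    if test in expected_fails:
        return "FIXED" if passed else "EXPECTED_FAIL"
    return "PASS" if passed else "REGRESSION"

My issue isn't with that mechanism; it's with the context (or lack
thereof) behind the flakes list.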

I've asked a couple of times already, so I'll ask again. In that other
series, on the MT8173, kms_hdmi_inject@...ect-4k is marked as flaky
(which is a KMS test, by the way).

I'm a maintainer for that part of the kernel and I'd like to look into
it, because it's something that should never fail, ever: the hardware
isn't even involved.

How can I figure out now (or worse, let's say in a year) how to
reproduce it? What kernel version was affected? On what board? After
how many occurrences?

Basically, how can I check that the bug is indeed still there (or got
fixed since), and how can I start fixing it?

And then repeat for any other test listed in there.
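
To make any of those questions answerable, each entry would need to
carry its context with it. Purely a sketch (none of these fields exist
in the series today, and the values below are made up):

from dataclasses import dataclass

@dataclass
class FlakeEntry:
    test: str        # full IGT test name
    board: str       # DUT where the flake was observed
    kernel: str      # kernel version or commit that was tested
    first_seen: str  # when the first failure was recorded
    runs: int        # total runs observed so far
    failures: int    # failures among those runs
    report: str      # URL of the bug report / CI job logs

entry = FlakeEntry(test="igt@kms_flip@basic",
                   board="mt8173-elm-hana",
                   kernel="v6.6-rc1",
                   first_seen="2023-09-01",
                   runs=200, failures=3,
                   report="https://example.org/issues/1234")

With something like that, both a human and the CI itself could check
whether an entry is still warranted.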

I got no reply to those questions before, because I know the answer
very well: nobody knows. And that's a serious issue to me, because it
effectively means the flaky test list will only ever grow (since we
can't even check that a bug got fixed, and the CI infrastructure won't
check it either), and we'll never be able to address any of the bugs
listed there.

Maxime
