Message-ID: <279c6f69-9af2-4607-b5d1-1acd21f9da8b@sirena.org.uk>
Date: Wed, 17 Jan 2024 17:33:23 +0000
From: Mark Brown <broonie@...nel.org>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: Neal Gompa <neal@...pa.dev>, Kees Cook <keescook@...omium.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-bcachefs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org,
Nikolai Kondrashov <spbnick@...il.com>,
Philip Li <philip.li@...el.com>,
Luis Chamberlain <mcgrof@...nel.org>
Subject: Re: [GIT PULL] bcachefs updates for 6.8
On Mon, Jan 15, 2024 at 01:42:53PM -0500, Kent Overstreet wrote:
> On Fri, Jan 12, 2024 at 06:22:55PM +0000, Mark Brown wrote:
> > This depends a lot on the area of the kernel you're looking at - some
> > things are very amenable to testing in a VM but there's plenty of code
> > where you really do want to ensure that at some point you're running
> > with some actual hardware, ideally as wide a range of it with diverse
> > implementation decisions as you can manage. OTOH some things can only
> > be tested virtually because the hardware doesn't exist yet!
> Surface wise, there are a lot of drivers that need real hardware; but if
> you look at where the complexity is, the hard complex algorithmic stuff
> that really needs to be tested thoroughly - that's all essentially
> library code that doesn't need specific drivers to test.
...
> And if we were better at that, it would be a good nudge towards driver
> developers to make their stuff easier to test, perhaps by getting a
> virtualized implementation into qemu, or to make the individual drivers
> thinner and move heavy logic into easier to test library code.
As Greg indicated with the testing I doubt everyone has infinite budget
for developing emulation, and I will note that model accuracy and
performance tend to be competing goals. When it comes to factoring
things out into library code that can be a double edged sword - changes
in the shared code can affect rather more systems than a single driver
change so really ought to be tested on a wide range of systems. The
level of risk from changes does vary widely of course, and you can try to
have pure software tests for the things you know are relied upon, but
it can be surprising.
> > Yeah, similar with a lot of the more hardware focused or embedded stuff
> > - running something on the machine that's in front of you is seldom the
> > bit that causes substantial issues. Most of the exceptions I've
> > personally dealt with involved testing hardware (from simple stuff like
> > wiring the audio inputs and outputs together to verify that they're
> > working to attaching fancy test equipment to simulate things or validate
> > that desired physical parameters are being achieved).
> Is that sort of thing a frequent source of regressions?
> That sounds like the sort of thing that should be a simple table, and
> not something I would expect to need heavy regression testing - but, my
> experience with driver development was nearly 15 years ago; not a lot of
> day to day. How badly are typical kernel refactorings needing regression
> testing in individual drivers?
General refactorings tend not to be that risky, but once you start doing
active work on the shared code dealing with the specific thing the risk
starts to go up and some changes are more risky than others.
> Filesystem development, OTOH, needs _heavy_ regression testing for
> everything we do. Similarly with mm, scheduler; many subtle interactions
> going on.
Right, and a lot of factored out code ends up in the same boat - that's
kind of the issue.
> > > > It's a basic lack of leadership. Yes, the younger engineers are always
> > > > going to be doing the new and shiny, and always going to want to build
> > > > something new instead of finishing off the tests or integrating with
> > > > something existing. Which is why we're supposed to have managers saying
> > > > "ok, what do I need to prioritize for my team to be able to develop
> > > > effectively".
> > That sounds more like a "(reproducible) tests don't exist" complaint
> > which is a different thing again to people going off and NIHing fancy
> > frameworks.
> No, it's a leadership/mentorship thing.
> And this is something that's always been lacking in kernel culture.
> Witness the kind of general grousing that goes on at maintainer summits;
> maintainers complain about being overworked and people not stepping up
> to help with the grungy responsibilities, while simultaneously we still
> very much have a "fuck off if you haven't proven yourself" attitude
> towards newcomers. Understandable given the historical realities (this
> shit is hard and the penalties of fucking up are high, so there does
> need to be a barrier to entry), but it's left us with some real gaps.
> We don't have enough people in the senior engineer role who lay out
> designs and organise people to take on projects that are bigger than one
> single person can do, or that are necessary but not "fun".
> Tests and test infrastructure fall into the necessary but not fun
> category, so they languish.
Like Greg said I don't think that's a realistic view of how we can get
things done here - often the thing with stop energy is that it just
makes people stop. In a lot of areas everyone is just really busy and
struggling to keep up; we make progress on the generic stuff in part by
accepting that people have limited time and will do what they can with
everyone building on top of everyone's work.
> > > > Just requisition the damn machines.
> > There's some assumptions there which are true for a lot of people
> > working on the kernel but not all of them...
> $500 a month for my setup (and this is coming out of my patreon funding
> right now!). It's a matter of priorities, and being willing to present
> this as _necessary_ to the people who control the purse strings.
One of the assumptions there is that everyone is doing this in a well
funded corporate environment focused on upstream. Even ignoring
hobbyists and students, for example in the embedded world it's fairly
common to have stuff being upstreamed since people did the work anyway
for a customer project or internal product but where the customer
doesn't actually care either way if the code lands anywhere other than
their product (we might suggest that they should care but that doesn't
mean that they actually do care).
I'll also note that there's people like me who do things with areas of
the kernel not urgently related to their current employer's business and
hence very difficult to justify as a work expense. With my lab some
companies have been generous enough to send me test hardware (which I'm
very grateful for, that's most of the irreplaceable stuff I have) but
the infrastructure around them and the day to day operating costs are
all being paid for by me personally.
> > > > I'd also really like to get automated performance testing going too,
> > > > which would have similar requirements in that jobs would need to be
> > > > scheduled on specific dedicated machines. I think what you're doing
> > > > could still build off of some common infrastructure.
> > It does actually - like quite a few test labs mine is based around LAVA,
> > labgrid is the other popular option (people were actually thinking about
> > integrating the two recently since labgrid is a bit lower level than
...
> > want to run and what results I expect. What I've got is *much* more
> > limited than I'd like, and frankly if I wasn't able to pick up huge
> > amounts of preexisting work most of this stuff would not be happening.
> That's interesting. Do you have or would you be willing to write an
> overview of what you've got? The way you describe it I wonder if we've
> got some commonality.
I was actually thinking about putting together a talk about it, though
realistically the majority of it is just a very standard LAVA lab which
is something there's a bunch of presentations/documentation about
already.
> The short overview of my system: tests are programs that expose
> subcommands for listing dependencies (i.e. virtual machine options, kernel
> config options) and for listing and running subtests. Tests themselves
> are shell scripts, with various library code for e.g. standard
> kernel/vm config options, hooking up tracing, core dump catching, etc.
> The idea is for tests to be entirely self contained and need no outside
> configuration.
The "tests themselves" bit sounds like what everyone else is doing - it
all comes down to running some shell commands in a target environment
somewhere. kselftest provides information on which config options it
needs which would be nice to integrate too.
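
A minimal sketch of the kind of test interface described in the quoted
text above: one program that can list its own dependencies, list its
subtests, and run one subtest. This is not the actual implementation
(the real tests are described as shell scripts); the subcommand names,
the VM option strings and the subtest names are all hypothetical, and
the subtests are placeholders that do no real work.

```python
import argparse

# Hypothetical requirements declared by this test. CONFIG_BLK_DEV_LOOP is a
# real kernel option, but its use here and the VM option strings are
# illustrative assumptions only.
DEPS = {
    "kernel_config": ["CONFIG_BLK_DEV_LOOP"],
    "vm": ["mem=2G", "scratch_dev=4G"],
}


def test_mount():
    """Placeholder subtest; a real one would run commands in the target."""
    return True


def test_fsck():
    """Placeholder subtest."""
    return True


SUBTESTS = {"mount": test_mount, "fsck": test_fsck}


def main(argv):
    """Dispatch one subcommand and return its output as a list of lines."""
    parser = argparse.ArgumentParser(prog="example-test")
    sub = parser.add_subparsers(dest="cmd", required=True)
    sub.add_parser("deps")        # list what the test needs
    sub.add_parser("list-tests")  # list available subtests
    run = sub.add_parser("run-test")
    run.add_argument("name", choices=sorted(SUBTESTS))
    args = parser.parse_args(argv)

    if args.cmd == "deps":
        return [f"{kind}: {item}" for kind, items in DEPS.items() for item in items]
    if args.cmd == "list-tests":
        return sorted(SUBTESTS)
    ok = SUBTESTS[args.name]()
    return [f"{args.name}: {'PASS' if ok else 'FAIL'}"]
```

A runner could call main(["deps"]) to decide how to build the kernel and
configure the VM, then main(["list-tests"]) and main(["run-test", name])
for each subtest, without needing any configuration outside the test.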
> and the CI, on top of all that, watches various git repositories and -
> as you saw - tests every commit, newest to oldest, and provides the
> results in a git log format.
> The last one, "results in git log format", is _huge_. I don't know why I
> haven't seen anyone else do that - it was a must-have feature for any
> system over 10 years ago, and it never appeared so I finally built it
> myself.
A lot of the automated testing that gets done is too expensive to be
done per commit, though some does. I do actually do it myself, but even
there it's mainly just some very quick smoke tests that get run per
commit with more tests done on the branch as a whole (with a bit more
where I can parallelise things well). My stuff is more organised for
scripting so expected passes are all just elided, I just use LAVA's UI
if I want to pull the actual jobs for some reason. I've also seen aiaiai
used for this, though I think the model there was similarly to only get
told about problems.
> We (inherently!) have lots of issues with tests that only sometimes fail
> making it hard to know when a regression was introduced, but running all
> the tests on every commit with a good way to see the results makes this
> nearly a non issue - that is, with a weak and noisy signal (tests
> results) we just have to gather enough data and present the results
> properly to make the signal stand out (which commit(s) were buggy).
Yeah, running for longer and/or more often helps find the hard to
reproduce things. There's a bunch of strategies for picking exactly
what to do there, per commit is certainly a valid one.
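
As a rough illustration of why per commit results help with
intermittent failures, the sketch below aggregates repeated runs of one
test per commit and reports the oldest commit whose failure rate rises
above the background flakiness. The commit ids, the run data and the
20% threshold are all made up for the example; it is not how any of the
systems discussed here actually work.

```python
def failure_rate(results):
    """Fraction of failed runs; results is a list of booleans (True = pass)."""
    return sum(1 for ok in results if not ok) / len(results)


def first_suspect(history, threshold=0.2):
    """Return the oldest commit whose failure rate exceeds threshold.

    history is a list of (commit_id, [run results]) ordered oldest to
    newest. Commits with no runs are skipped. Returns None if no commit
    crosses the threshold.
    """
    for commit, results in history:
        if results and failure_rate(results) > threshold:
            return commit
    return None


# Hypothetical data: ten runs of the same flaky test on each commit.
history = [
    ("aaa111", [True] * 10),
    ("bbb222", [True] * 9 + [False]),      # 10% failures: background noise
    ("ccc333", [True] * 6 + [False] * 4),  # 40% failures: the signal appears
    ("ddd444", [True] * 5 + [False] * 5),
]

print(first_suspect(history))  # prints "ccc333"
```

A single run per commit would often pass on ccc333 and hide the
regression; it is the repeated runs across adjacent commits that make
the change in failure rate visible.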