linux-kernel - A quick "Regression tracking: state of the union early 2024" from my side

Message-ID: <7613e402-894a-4d38-8cef-7263630c1c57@leemhuis.info>
Date: Wed, 10 Jan 2024 08:22:39 +0100
From: Thorsten Leemhuis <linux@...mhuis.info>
To: Linux kernel regressions list <regressions@...ts.linux.dev>
Cc: Greg KH <gregkh@...uxfoundation.org>,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 LKML <linux-kernel@...r.kernel.org>
Subject: A quick "Regression tracking: state of the union early 2024" from my
 side

[I initially published the text below on
https://linux-regtracking.leemhuis.info/post/status-jan2024/ ; reposting
it here for completeness and in case anyone wants to comment on it]

## The long story short

I'm not really happy with my performance wrt my regression tracking
efforts during the last year. To counter that, I already shifted my
focus somewhat around October. With the new year I will shift it some more.
Top-priority will be "make regzbot more useful for kernel subsystem
maintainers" from now on. My tracking efforts of course will continue,
but everything except regressions in the current and the previous
mainline cycle might not see much attention from my side. This
refocusing also means that I won't work much on resolving some
ambiguities around "how regressions are supposed to be handled" which
led to tension quite a few times. But all that should be for the best
in the long term.

## The details

### Looking back at 2023

My regression tracking efforts with regzbot still have a circus factor
of "one": if I'd run off and join a circus tomorrow, it's likely that
nobody would continue my work. That needs to change to make regression
tracking successful in the long term. I'm very well aware of that;
nevertheless, when I look back at last year I think some of my efforts on
regression tracking worked against my goal to establish regression
tracking properly within the Linux kernel development process.

That's why I'm not really happy with my performance last year. That does
not mean that I'm totally unhappy with it, as my work made a difference.
But in the end I might have set the wrong priorities sometimes. Most
importantly in these cases:

* I got into too many debates with developers when I thought a
particular regression was not handled appropriately. Sometimes I was
right, occasionally I was wrong (or even stupid, as I'm just human :-/ )
– and most of the time it was in the big gray area in between, where
your point of view and your understanding of how Linus wants regressions
to be handled decides which of the two it is.

* I should have intervened way earlier in public when volunteers tried
to help with regression tracking, but did so in ways that annoyed
developers (which was totally understandable).

* I should have spent more time improving regzbot. But I did not due to
lack of time and some tasks that are energy drainers. Some of them:

 - Skimming lists and bugzilla for regression reports to track takes a
whole lot of time.

 - I spent a lot of time following up on tracked regression reports
because, from my understanding of what Linus wants, they were not
handled well.

 - I had to spend a lot of time following up on regression reports I or
somebody else added to the tracking, as developers often forget Link: or
Closes: tags pointing to the report.

* When I found time to work on regzbot I'm not sure if I worked on the
right features. That's because I spent most of that time on code to
support tracking regressions submitted on gitlab instances or github, as
without it regzbot is unable to track regressions reported for the DRM
and SOF subsystems or things ClangBuiltLinux finds. This work will also
improve the rough bugzilla support, which is crucial if bugzilla gets
used in the way Konstantin envisions it. These changes furthermore
renovate a few really ugly parts in the regzbot code (written in its
early days when I was getting into programming), which is wise to do
before implementing some other important features.

  All in all it was a lot of work, especially dealing with the APIs for
the three bug trackers. Maybe 90 percent of that work is done now; it's
committed, but not used in production, as it still needs a lot more
testing and finetuning.

There are a few other things I'm unhappy with, but those were the major
ones.

### Plans for 2024

In general: regzbot becomes the priority; I'll try to stay on top of
tracked regressions and look out for reports that need to be tracked,
but for some time will work less strictly to reduce the time I spend on
this.

These are the regzbot features I plan to work on:

* Finish the current work (gitlab/github support with related core
improvements; see above), which will take *at least* the rest of January
I fear.

* Afterwards focus for a while on making regzbot a more useful and
easier to use tool for kernel developers and subsystem maintainers. This
partly relies on some of the internal renovations already in the works
(see above) and will consist of many small changes in various areas.
Some of them:

 - Make it dead easy to add regressions to the tracking reported only
indirectly by way of a patch submission that fixes the problem.

 - Implement a dedicated "#regzbot forwarding" command, as people often
fail to use the current syntax correctly (they forget the caret in
"#regzbot ^introduced", put it in the wrong place, or do not reply to
what's considered the report).

 - Related to the "do not reply to what's considered the report" in the
previous point: implement a command like "#regzbot adjustreport
https://example.com/foo" to adjust the location of the report.

 - Support bulk adding reports and updating the status of tracked
regressions out-of-thread. This will reduce the amount of mail
developers receive and make updating tracking bits easier for me as well.

 - Make it easier to handle duplicates.

 - Webpages and reports UI: create pages where subsystem maintainers can
see unresolved regressions in their area.

* Ideally find a subsystem where the maintainers want to use regzbot and
work closely with me to make regzbot more useful for them.

* Allow tagging, for example to tag regression reports coming from a
certain CI, so that the CI projects can rely on regzbot's magic to keep
an eye on regressions they reported.

* Handle fixes not yet mainlined better in the webui and the reports;
e.g. separate "Fix incoming" into something like "fix up for review",
"fix pending (this cycle)", "fix pending (next cycle)".

* There are a few other things planned for later, but I might work on
them earlier if it turns out they make subsystem maintainers happier:

 - Separate actionable vs non-actionable reports in the UI (actionable:
a sane report with a bisection result).

 - Mark some regressions as "priority".

 - Export data in a simple format to enable developers to script
things like "is anything in here known to cause a regression not yet
fixed".

 - Make regzbot send mails or add comments. But only when regzbot works
well; and ensure those mails won't bother people.
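
To illustrate the "export data in a simple format" item above, here is a
minimal sketch of the kind of script such an export could enable. It is
purely hypothetical: regzbot does not offer this export today, and the
JSON layout, the field names ("culprit", "status", "title"), the commit
ids and the function name are all assumptions made up for this example.

```python
import json

# Hypothetical export data: the layout and field names are assumptions
# for illustration only, not a format regzbot actually produces.
EXAMPLE_EXPORT = """
[
  {"culprit": "1111111111aa", "status": "unresolved", "title": "example: boot hang"},
  {"culprit": "2222222222bb", "status": "fixed", "title": "example: suspend failure"},
  {"culprit": "3333333333cc", "status": "unresolved", "title": "example: wifi drops"}
]
"""


def unresolved_culprits(export_text, commits):
    """Return tracked regressions that are still unresolved and whose
    culprit matches one of the given commit ids.

    Ids are compared by prefix in either direction, so abbreviated and
    full hashes can be mixed. Empty ids are ignored.
    """
    wanted = [c for c in commits if c]
    hits = []
    for entry in json.loads(export_text):
        if entry["status"] != "unresolved":
            continue
        culprit = entry["culprit"]
        if any(c.startswith(culprit) or culprit.startswith(c) for c in wanted):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Commits someone is about to pick up, e.g. for a backport.
    picked = ["1111111111aa", "2222222222bb"]
    for hit in unresolved_culprits(EXAMPLE_EXPORT, picked):
        print(f'{hit["culprit"]}: {hit["title"]}')
```

The point of the sketch is only the question being asked ("is anything in
here known to cause a regression not yet fixed"); whatever format the
real export ends up using, the check would boil down to a similar filter.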

Regression tracking:

* Spend less time looking out for regression reports and following up
on the regressions that regzbot tracks.

  To do so, I plan to focus on regressions introduced during the current
or the previous mainline release. I'll try to keep an eye on regressions
in mainline releases from the past 12 months as well as those in
stable/longterm trees, but will try to not spend too much time on that.
I'll ignore everything older and regressions not bisected, unless it's
one where I get "ohh, this is not good at all" vibes; in such cases I
likely will continue to help reporters improve their report, but in
other cases I won't do that anymore.

Side projects:

* Submit a text on bisecting a Linux kernel regression for inclusion
into the kernel's documentation. I started writing that text on
Christmas Eve while having a slight headache; got into a flow afterwards
and finished the bulk of it early January. Just needs more polishing, so
it would be a shame to let it linger on my hard disk.

* Try to add some text to the kernel's documentation endorsed by Linus
and briefly describing how he wants regressions to be handled. Basically
a shorter version of the "Expectations and best practices for fixing
regressions" section already in
https://docs.kernel.org/process/handling-regressions.html. What I've
written there is based on actions and past e-mails from Linus combined
with putting things in context (in general and with stable in mind). But
people don't take it entirely seriously, as it was only ACKed by Greg,
not by Linus – which leads to discussions that are annoying for everyone
involved (and has created a lot of tension between developers and myself).

* Prepare discussions about handling and tracking regressions for both
the kernel summit and the maintainers summit this fall.

### Closing words

There are a ton of other things I could and maybe should write here,
which is why I suspect I've forgotten an important thing or two. If that
turns out to be true I might update this post within the first days of
its publication.