linux-kernel - Re: stable? quality assurance?

Message-Id: <201007121744.05844.Martin@lichtvoll.de>
Date:	Mon, 12 Jul 2010 17:43:56 +0200
From:	Martin Steigerwald <Martin@...htvoll.de>
To:	linux-kernel@...r.kernel.org
Cc:	Willy Tarreau <w@....eu>
Subject: Re: stable? quality assurance?

On Sunday 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy,
 
> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the current replies I perceive a lack of that ability.
> 
> well, I'll try to do then :-)
> 
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
> 
> Among the things he explained, I remember that one of primary concern
> was the inability to slow down development. I mean, if he waits 2 more
> weeks for things to stabilize, then there will be two more weeks of
> crap^H^H^H^Hdevelopment merged in next merge window, so in fact this
> will just shift dates and not quality.

Would it make that much of a difference? Linus could still say no to 
obvious crap, couldn't he?

> There are also some regressions that get merged with every pre-release.
> Thus, assuming he would wait for one more pre-release to merge the
> fixes you spotted, 2 or 3 more would appear, so there's a point where
> it must be decided when to release.

Some way of classifying bugs could help here, I think: something that
helps Linus decide whether another release candidate round is worth
doing or not.

Actually I think the "USB soundcard not working after resume" bug I
mentioned (bug #15788) wouldn't warrant a new release candidate round,
especially as it didn't have a patch yet and will likely affect only a
minority of users. Still, it would be fine if it was fixed in time. I do
think that the "Radeon KMS does not work after resume" bug (#15969) does
qualify, since it causes loss of data handled by the current X session(s) -
sure, I normally save my stuff before hibernating, but... And it actually
had a patch that has been tested! The desktop freeze bug I mentioned would
slip, because I didn't report it and, apart from a Debian bug report I
found, it wasn't confirmed at all. A reported and confirmed desktop freeze
would qualify IMHO.

I have read postings from Linus saying that he actually reads the
regression list kindly provided by Rafael. 15788 was in there, but IMHO
wouldn't qualify (see posting "2.6.34-rc5: Reported regressions from
2.6.33"). But 15969 was not - well, it was reported for rc7, so too late
for the manual report by Rafael. So yes, I see how it can have slipped.

Maybe an approach would be to dynamically generate the list from all bug
reports marked for 2.6.34 versions and have it posted to the kernel mailing
list after every rc. This way bug #15969 would at least have been in the
list of known regressions.

Bugzilla severity and priority fields or something similar could be used to
set the importance of a bug report, and the regression list could be sorted
by importance. Another important criterion would be whether someone else
could confirm and reproduce it. Even if I had reported those desktop
freezes, unless someone confirmed them they might happen only for me. A
"confirm" or vote button might be good, so that the number of confirmations
could be counted.
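
A rough sketch of how such a generated list could work, just to make the
idea concrete. Everything in it is an assumption for illustration: the
BugReport fields, the severity scale and the sample reports are made up and
do not correspond to the real Bugzilla schema or to real bug numbers.

```python
from dataclasses import dataclass

# Hypothetical severity scale; a real bug tracker defines its own values.
SEVERITY_RANK = {"blocker": 0, "high": 1, "normal": 2, "low": 3}


@dataclass
class BugReport:
    bug_id: int
    summary: str
    found_in: str       # version the report is marked for, e.g. "2.6.34-rc7"
    severity: str       # one of the keys of SEVERITY_RANK
    confirmations: int  # how many other users confirmed the report
    is_regression: bool


def regression_list(reports, release):
    """Regressions marked for `release` or one of its -rc versions,
    sorted by severity first and number of confirmations second."""
    selected = [
        r for r in reports
        if r.is_regression
        and (r.found_in == release or r.found_in.startswith(release + "-"))
    ]
    return sorted(
        selected,
        key=lambda r: (SEVERITY_RANK[r.severity], -r.confirmations, r.bug_id),
    )


def format_list(reports, release):
    """Plain-text list that could be posted after every rc."""
    lines = [f"Reported regressions in {release}:"]
    for r in regression_list(reports, release):
        lines.append(
            f"  #{r.bug_id} [{r.severity}, confirmed by {r.confirmations}] "
            f"{r.summary}"
        )
    return "\n".join(lines)


# Made-up sample reports, not real bug numbers.
sample = [
    BugReport(1001, "sound device silent after resume", "2.6.34-rc5",
              "normal", 1, True),
    BugReport(1002, "display broken after resume", "2.6.34-rc7",
              "high", 2, True),
    BugReport(1003, "older regression", "2.6.33", "high", 5, True),
]
print(format_list(sample, "2.6.34"))
```

With a list generated like this, a report filed as late as rc7 would still
show up, and the sorting gives a first hint about which entries might
justify another release candidate round.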

It would need some triaging and classifying and I am willing to help with 
that.

> Right now it's released when he feels it "good enough". This can be
> very subjective, but I'd think that "good enough" basically means
> that the kernel will be able to live in its stable branch without
> major changes and without reverting features.

Okay, then that's two different definitions of stable. I mean stable enough
for (adventurous) end users. And here it's more of a development point of
view.
 
> Also, you have to consider that there are several types of users.
> Some of them are developers who will run a latest -git kernel at
> some point. Some of them will be enthusiasts waiting for a feature,
> and who will run every -rc kernel once the feature is merged, to
> ensure it does not break before the release. There are also janitors
> and the curious ones who'll basically run a few of the last -rc as
> time permits to see if they can spot a few last-minute issues before
> the release. There are the brave ones who systematically download
> the dot-0 release once Linus announces it and will proudly run it
> to show their friends how it's better than the last one. There are
> those who need a bit of stability (eg: professional laptop or home
> server) and will prefer to wait for a few stable releases to ensure
> they won't waste their time on a big stupid issue that all other ones
> above will have immediately spotted for them. And there are the ones
> who run production servers who will either use distro kernels or
> long term stable kernels, with a more or less long qualification
> process between upgrades.

Yes, stable enough for whom? I see.

> It's just an ecosystem where you have to find your place. From your
> description, I think you're before the last ones above, you need
> something which works, even though it's not critical, so you could
> very well wait for 2-3 stable updates before upgrading (that does
> not prevent you from testing earlier on other systems if you want
> to test performance, new features, regressions, etc...).

ACK.

> It's not really advisable to call dot-0 releases "unstable" because
> it will only result in shifting the adoption point between the user
> classes above. We need to have enthusiasts who proudly say "hey
> look, dot-0 and it's already rock solid". We've all seen some of them
> and they're the ones who help reporting issues that get fixed in the
> next stable release.

I do think the claim should be honest. "stable" IMHO is not, at least from
a user's point of view. "unstable" isn't either, because a dot-0 kernel is
not guaranteed to be unstable ;). So I agree with the major release kernel
approach from Rafael.

> I think that the most reasonable thing to do is to assume your need
> for stability and always refrain from running on the latest release.
> 
> Speaking for myself, I tend to run rock solid kernels for my data (my
[...]
> You see, there's a kernel for everyone, and for every usage. You just
> have to make your choice. And when you don't know or don't want to
> guess, stick to the distro's kernel.

Yes. As I said already, I will reconsider my decision on which kernel to
use. And I now understand some of the problems better. Thanks.

But beyond that, I do think it's worth thinking about ways to improve the
process of ensuring as much stability as sensibly possible. A dot-0 kernel
won't be error-free - but I find just declaring the current process "the
best we can have" not really satisfying. And I do think it can be
improved upon. I do not do kernel development, but I am willing to help
with collecting information about the current state of the kernel, and to
help with bug triaging as well as I can and as time permits. I do have some
experience with quality management, as I coordinated the beta test of some
AmigaOS versions, but that was in a closed group. Here it's a
different scale, and I believe it needs somewhat different approaches.

I will reply to other posts in this thread over the next few days.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
