linux-kernel - Re: Reporting bugs and bisection

Message-ID: <alpine.DEB.1.10.0804131546370.9318@asgard>
Date:	Sun, 13 Apr 2008 16:51:34 -0700 (PDT)
From:	david@...g.hm
To:	Stephen Clark <sclark46@...thlink.net>
cc:	Evgeniy Polyakov <johnpol@....mipt.ru>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Willy Tarreau <w@....eu>, Tilman Schmidt <tilman@...p.cc>,
	Valdis.Kletnieks@...edu, Mark Lord <lkml@....ca>,
	David Miller <davem@...emloft.net>, jesper.juhl@...il.com,
	yoshfuji@...ux-ipv6.org, jeff@...zik.org,
	linux-kernel <linux-kernel@...r.kernel.org>, git@...r.kernel.org,
	netdev@...r.kernel.org
Subject: Re: Reporting bugs and bisection

cross-posted to git for the suggestion at the bottom

On Sun, 13 Apr 2008, Stephen Clark wrote:

> Evgeniy Polyakov wrote:
>> On Sun, Apr 13, 2008 at 10:33:49PM +0200, Rafael J. Wysocki (rjw@...k.pl) wrote:
>>> Things like this are very disappointing and have a very negative impact on bug
>>> reporters.  We should do our best to avoid them.
>> 
>> Shit happens. This is a matter of either bug report or those who were in
>> the copy list. There are different people and different situations, in
>> which they do not reply.
>> 
> Well less shit would happen if developers would take the time to at least 
> test their patches before they were submitted. It like we will just have the 
> poor user do our testing for us. What kind of testing do developers do. I 
> been a linux user and have followed the LKML for a number of years and have 
> yet to see any test plans for any submitted patches.

I've been reading LKML for 11 years now, I've tested kernels and reported 
a few bugs along the way.

the expectation is that the submitter should have tested the patches 
before submitting them (where hardware allows). but that "where hardware 
allows" is a big problem. so many issues are dependent on hardware that 
it's not possible to test everything.

there are people who download, compile and test the tree nightly (with 
farms of machines to test different configs), but they can't catch 
everything.

expecting the patches to be tested to the point where there are no bugs is 
unreasonable.

bisecting is a very powerful tool, but I do think that sometimes 
developers lean on it a bit much. taking the attitude (as some have) that 
'if the reporter can't be bothered to do a bisection I can't be bothered 
to deal with the bug' is going way too far.

if a bug can be reproduced reliably on a test system then bisecting it may 
reveal the patch that introduced or unmasked the bug (assuming that there 
aren't other problems along the way), but if the bug takes a long time to 
show up after a boot, or only happens under production loads, bisecting it 
may not be possible. that doesn't mean that the bug isn't real, it just 
means that the user is going to have to stick with an old version until 
there is a solution or work-around.

even in the hard-to-test situations, the reporter is usually able to test 
a few fixes, but there's a big difference between going to management and 
saying "the kernel gurus think that this will help, can we test it this 
weekend" 2-3 times and doing a bisection that will take 10-15 cycles to 
find the problem.
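
To put rough numbers on the "10-15 cycles" above: a bisection halves the 
set of suspect commits on every test, so the number of test cycles grows 
with the base-2 logarithm of the number of commits between the last known 
good and the first known bad kernel. The sketch below is only an 
illustration under that assumption (a simple linear history); the function 
name and the commit counts are made up for the example, and git's own 
estimate can differ somewhat on a history with merges.

```python
def bisect_steps(n_commits: int) -> int:
    """Worst-case number of test cycles needed to isolate one bad commit
    among n_commits candidates by repeated halving (ceil(log2(n)))."""
    if n_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1
    return (n_commits - 1).bit_length()


if __name__ == "__main__":
    # Illustrative (assumed) sizes for the range between good and bad kernels
    for n in (1_000, 30_000):
        print(f"{n} commits -> up to {bisect_steps(n)} test cycles")
```

With these assumed sizes, a range of about 1,000 commits needs up to 10 
test cycles and a range of about 30,000 commits needs up to 15, and each 
cycle is one more reboot of a production box.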

it's very reasonable to ask the reporter if they can bisect the problem, 
but if they say that they can't, declaring that they are out of luck is 
not reasonable, it just means that it's going to take more thinking to 
find the problem instead of being able to let the mechanical bisect 
process narrow things down for you. it may mean that the developer will 
need to write a patch that instruments an old (working) kernel with 
minimal impact on that kernel, so that the reporter can run it to gather 
information about the load and the developer can then try to 
simulate that load on a new (non-working) kernel.

in theory everyone has a test environment that lets them simulate 
everything in their production environment. in practice this is only true 
at the very low end (where it's easy to do) and the very high end (where 
it's so critical that it's done no matter how much it costs). Everyone 
else has a test environment that can test most things, but not everything. 
As such when they run into a problem they may not be able to do lots of 
essentially random testing.

elsewhere in this thread someone said that the pre-git way was to do a 
manual bisect where the developer would send patches backing out specific 
changes to find the problem. one big difference between that and bisecting 
the problem is that the manual process was focused on the changes in the 
area that is suspected of causing the problem, while the git bisect 
process goes after all changes. this makes it much more likely that the 
tester will run into unrelated problems along the way.

I wonder if it would be possible to make a variation of git bisect that 
only looked at a subset of the tree when picking bisect points (if you are 
looking for an e1000 bug, testing bisect points that haven't changed that 
driver won't help you for example). If this can be done it would speed up 
the reporter's efforts, but will require more assistance from the 
developers (who would need to tell the reporters what subtrees to test) so 
it's a tradeoff of efficiency vs simplicity.
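
As a rough illustration of the idea, here is a minimal sketch of a 
bisection that only picks test points among commits touching a given 
subtree. It is a toy model, not how git implements anything: the Commit 
type, the helper names, the linear history and the e1000 path are all 
assumptions made for the example, and it only gives the right answer if 
the bug really was introduced by a commit under that subtree. (For what 
it's worth, "git bisect start" does accept path arguments after "--" to 
narrow the candidates in a similar way.)

```python
from typing import Callable, List, NamedTuple, Set, Tuple


class Commit(NamedTuple):
    sha: str          # hypothetical identifier for the example
    paths: Set[str]   # files changed by this commit


def touches(commit: Commit, prefix: str) -> bool:
    """True if the commit changes at least one file under prefix."""
    return any(p.startswith(prefix) for p in commit.paths)


def bisect_subset(commits: List[Commit], prefix: str,
                  is_bad: Callable[[Commit], bool]) -> Tuple[Commit, int]:
    """Bisect oldest-to-newest commits, testing only those under prefix.

    Assumes the kernel before commits[0] is good, the newest candidate is
    bad, and the culprit touches prefix. Returns (first bad candidate,
    number of kernels the reporter had to test)."""
    candidates = [c for c in commits if touches(c, prefix)]
    if not candidates:
        raise ValueError("no commit in the range touches " + prefix)
    lo, hi, tests = 0, len(candidates) - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(candidates[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return candidates[lo], tests


if __name__ == "__main__":
    # Made-up history: 1000 commits, every 10th one touches the driver.
    history = [
        Commit(f"c{i:04d}",
               {"drivers/net/e1000/e1000_main.c"} if i % 10 == 0
               else {"fs/ext3/inode.c"})
        for i in range(1000)
    ]
    culprit, tests = bisect_subset(
        history, "drivers/net/e1000/",
        is_bad=lambda c: int(c.sha[1:]) >= 730)  # pretend c0730 broke it
    print(culprit.sha, "found after", tests, "tests")
```

In this made-up history only 100 of the 1000 commits are candidates, so 
the reporter tests at most 7 kernels instead of up to 10, and never lands 
on a bisect point whose only changes are in unrelated code.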

David Lang