Message-ID: <alpine.LFD.2.00.0912171256560.15740@localhost.localdomain>
Date: Thu, 17 Dec 2009 13:14:43 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Alain Knaff <alain@...ff.lu>
cc: markh@...pro.net, fdutils@...tils.linux.lu,
linux-kernel@...r.kernel.org
Subject: Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils]
Cannot format floppies under kernel 2.6.*?)
On Thu, 17 Dec 2009, Alain Knaff wrote:
>
> For the moment, I have a very small sample of hardware:
> 1. One machine which works (my own): Athlon XP 1800+ processor
> 2. One which doesn't work (Mark's)
Ok. I don't think I even have any machines with floppy drives any more
(one external USB drive somewhere gathering dust just in case I ever
encounter a floppy again).
> I might get access to a wider sample of boxen in a week or so, in order
> to do some stats.
Ok, I was more thinking "we have a bugzilla with ten different people
reporting this". If it's just a single machine, that's not going to be
relevant.
> What's the easiest way to find out the chipset?
>
> Here's already the output of lspci from my machine (works):
>
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
> 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
Yeah, lspci. Generally only the northbridge and southbridge matter. The
"ISA bridge" might technically be relevant, but since it's universally
on the same die as the southbridge, I left it in there just for kicks.
> (It happens during formatting the floppy drive: here the first byte
> happens to be the trackid of the first physical sector of the track, and
> it always ends up being the track of the *previously* formatted track).
I guess it could simply be a floppy controller bug too, triggered by some
random timing difference or innocuous-looking change.
> > But I think we'd like to see a list of hardware where this can be
> > triggered,
>
> We'll get a list of 2 machines relatively quickly (unless other people
> would like to chime in: the test is easy, just fdformat a floppy disk),
> and more in a week or so.
Only the "it doesn't work on xyz" is likely interesting. The machines it
works on are probably uninteresting statistically.
> > and quite frankly, a 'git bisect' would be absolutely wonderful
>
> How exactly would I use this (command line sample)?
You'd need a git tree that contains both the working and non-working
versions, and then literally just do

    git bisect start
    git bisect good <known good version number here>
    git bisect bad <known bad version here>

and it will give you a commit to try. Compile, test, see if it's good or
bad, and do

    git bisect [good|bad]

depending on the result. Rinse and repeat (depending on how tight the
initial good/bad commits were, it will need 10-15 kernel tests).
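
The 10-15 estimate follows from bisection roughly halving the range of candidate commits at each step, so the number of kernels to test grows with the base-2 logarithm of the number of commits between the good and bad points. Here is a minimal sketch of that arithmetic in plain shell. `bisect_steps` is a hypothetical helper, not a git command, and git's own step count can differ by one or so on a non-linear history:

```shell
#!/bin/sh
# Hypothetical helper: estimate how many test rounds a bisection needs
# for a given number of candidate commits, by halving until one is left.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # each round discards about half the range
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 1024     # prints 10
bisect_steps 32768    # prints 15
```

Inside the cloned tree, the input could come from something like `git rev-list --count v2.6.27.41..v2.6.28`.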
So in this case, since apparently 2.6.27.41 is good, and 2.6.28 is not, it
would be something like this:

    # clone hpa's tree that has all the stable releases in one place
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-allstable.git
    cd linux-2.6-allstable
    git bisect start
    git bisect bad v2.6.28
    git bisect good v2.6.27.41

and off you go.
NOTE! Bisection depends very much on the bug being 100% reproducible. If
you ever mark a good kernel bad (because you messed up) or a bad kernel
good (because the bug wasn't 100% reproducible, so you _thought_ it was
good even though the bug was present and just happened to hide), the end
result of the bisect will be totally unreliable and seriously screwed up.
So after a successful bisect, it is usually a good idea to try to go back
to the original known-bad kernel, and then revert the commit that was
indicated as the bad one (assuming the revert works - it could be that the
bad one ends up being fundamental to other commits after it), and test
that yes, that really fixes the bug.
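
The bisect-then-revert check described above can be tried out on a throwaway repository before spending hours on kernel builds. The following is only a sketch under stated assumptions: the repository, the tag names `known-good` and `known-bad`, and the `bug.txt` marker file are all made up for illustration, and `git bisect run` (not mentioned in the mail) stands in for the manual compile-and-test loop:

```shell
#!/bin/sh
set -e
# Toy repository: 16 commits, with commit 11 introducing the "bug"
# (represented by a marker file, bug.txt). All names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

i=1
while [ "$i" -le 16 ]; do
    echo "$i" > "file$i.txt"
    if [ "$i" -eq 11 ]; then echo broken > bug.txt; fi
    git add -A
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then git tag known-good; fi
    i=$((i + 1))
done
git tag known-bad

# Bisect automatically: the test command exits 0 for good, non-zero for bad.
git bisect start known-bad known-good >/dev/null
git bisect run sh -c '[ ! -e bug.txt ]' >/dev/null
culprit=$(git rev-parse refs/bisect/bad)   # the first bad commit
git bisect reset >/dev/null 2>&1
echo "first bad commit: $(git log -1 --format=%s "$culprit")"

# Sanity check: go back to the known-bad version, revert the culprit,
# and confirm that the bug really is gone.
git checkout -q known-bad
git revert --no-edit "$culprit" >/dev/null
if [ ! -e bug.txt ]; then echo "revert fixes the bug"; fi
```

On a real kernel tree the "test" is a build, boot and fdformat cycle rather than a file check, and the revert may not apply cleanly if later commits depend on the culprit.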
It gets more complicated if the bisect hits kernels that you can't test
because they have _unrelated_ issues on that machine (compile failures or
just other bugs that hide the actual floppy behavior), but generally
bisection is pretty simple. "man git-bisect" does have some extra
pointers.
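
One of those pointers is `git bisect skip`, which tells bisect that the current commit cannot be tested so that it picks a nearby one instead. With `git bisect run`, the test command signals the same thing by exiting with status 125. Below is a sketch on a throwaway repository. The tag names and the `nobuild.txt` and `bug.txt` marker files are invented for illustration, and the untestable commits are deliberately placed near the middle of the range so that bisect is likely to land on them:

```shell
#!/bin/sh
set -e
# Toy repository: commits 8 and 9 "do not build" (nobuild.txt present),
# and commit 13 introduces the bug being hunted (bug.txt).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

i=1
while [ "$i" -le 16 ]; do
    echo "$i" > "file$i.txt"
    case "$i" in
        8)  echo x > nobuild.txt ;;   # unrelated breakage appears
        10) rm nobuild.txt ;;         # unrelated breakage is fixed
        13) echo x > bug.txt ;;       # the actual bug
    esac
    git add -A
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then git tag known-good; fi
    i=$((i + 1))
done
git tag known-bad

git bisect start known-bad known-good >/dev/null
# Exit 125 means "cannot test this commit, skip it";
# otherwise exit 0 for good and non-zero for bad.
git bisect run sh -c '[ -e nobuild.txt ] && exit 125; [ ! -e bug.txt ]' >/dev/null
culprit=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $(git log -1 --format=%s "$culprit")"
```

If the untestable commits sit right next to the real culprit, bisect can only narrow the result down to a small set of candidates rather than a single commit.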
So git bisect may be somewhat time-consuming and mindless, but for
reliably reproducible bugs where nobody really knows what the cause is, it
is a _really_ convenient thing to do. The only thing you need is a
reliably triggering test-case, and some time.
Linus