Message-ID: <21d7e9970809240159u6db747eex51892061846b2251@mail.gmail.com>
Date: Wed, 24 Sep 2008 18:59:34 +1000
From: "Dave Airlie" <airlied@...il.com>
To: "David Miller" <davem@...emloft.net>
Cc: jkosina@...e.cz, jeffrey.t.kirsher@...el.com, david.vrabel@....com,
rjw@...k.pl, linux-kernel@...r.kernel.org,
kernel-testers@...r.kernel.org, chrisl@...are.com
Subject: Re: [Bug #11382] e1000e: 2.6.27-rc1 corrupts EEPROM/NVM

On Wed, Sep 24, 2008 at 5:36 PM, David Miller <davem@...emloft.net> wrote:
> From: "Dave Airlie" <airlied@...il.com>
> Date: Wed, 24 Sep 2008 15:45:46 +1000
>
>> I'm still dubious about this, wouldn't we see other weird-ass side
>> effects if X was trashing the BARs on other devices?
>
> Sure. My theory is that it's a recent xorg change causing this,
> so I've been going through GIT history for xserver, libpciaccess,
> and the intel driver for the past year looking for clues.
>
> If there is usually a gap after the video device, there would just
> be no response from the PCI bus, and the way that's handled is
> chipset specific. At least a while back, most x86 systems would
> silently ignore writes and return all 1's in such a case, but
> they may be generating bus error events these days. I simply don't
> know.
The only thing I can think of then is either the pciaccess conversion
of the intel Xorg driver, or maybe something going wrong since PAT
support was added.
>
>> I think tglx is on the right path, same problem as e1000, code is
>> stupid, it can reenter the nvram read/write code from irq
>> context, and pwn itself.
>
> The e1000e side here is reproducible way too easily for it to be the
> same case, as far as I see it.
>
> The e1000 driver has probably had this problem for years and we've
> only recently had some concrete cases of it triggering.
>
> Also, what utility are you running on your system that is even
> accessing the NVRAM on the e1000e card? Knowing that might help
> us understand why this problem has appeared now. Maybe there is
> some diagnostic or monitoring tool that is now becoming prevalent
> in these distributions where it triggers.
The driver seems quite happy to access the NVRAM on its own; I think
Thomas has some backtraces that show it clearly doing silly reentrant
things...
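
To make the reentrancy hazard concrete, here is a minimal sketch in plain
userspace C. It is a hypothetical model, not e1000e code: every name in it
(nvm_write_word, nvm_acquire, fake_irq_handler, the four-word nvm_words
array) is made up for illustration, and the "interrupt" is just a callback
invoked between the two steps of a write. A single busy flag is enough to
detect the nested access in this single-threaded model; a real driver
would need a proper locking primitive that is safe against interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a multi-step NVM access; not actual driver code. */
#define NVM_WORDS 4

static bool nvm_busy;                 /* set while an access is in flight */
static int reentry_count;             /* nested attempts that were refused */
static uint16_t nvm_words[NVM_WORDS]; /* stand-in for the NVM contents */

/* Take ownership of the NVM; fail if an access is already in progress. */
static bool nvm_acquire(void)
{
    if (nvm_busy) {
        reentry_count++;
        return false;
    }
    nvm_busy = true;
    return true;
}

static void nvm_release(void)
{
    assert(nvm_busy); /* releasing without owning would be a bug */
    nvm_busy = false;
}

/* Callback standing in for an interrupt that fires mid-access. */
typedef void (*irq_hook_t)(void);

/* Returns 0 on success, -1 for a bad offset, -2 if re-entered. */
static int nvm_write_word(unsigned int offset, uint16_t value, irq_hook_t irq)
{
    if (offset >= NVM_WORDS)
        return -1;
    if (!nvm_acquire())
        return -2;

    /* Step 1 would program the address; the "interrupt" lands here. */
    if (irq)
        irq();

    /* Step 2 commits the data. */
    nvm_words[offset] = value;
    nvm_release();
    return 0;
}

static int irq_result;

/* Simulated handler that tries to touch the NVM from "irq context". */
static void fake_irq_handler(void)
{
    irq_result = nvm_write_word(1, 0xDEAD, NULL);
}
```

Calling nvm_write_word(0, 0x1234, fake_irq_handler) lets the outer write
finish while the nested write from the handler is refused with -2. Without
the guard, the nested access would run in the middle of the outer one,
which is the sort of interleaving that could plausibly scramble a
multi-step NVM sequence.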
>
> This problem started happening seemingly "all of a sudden", even to
> people who have been keeping sort-of recent with their kernels, such
> as yourself.
>
> Yet we can't get any sense yet what range of kernel versions are in
> use when the problem triggers.
I've seen it reported at least at 2.6.27-rc1 and maybe even one of
Fedora's -rc0 kernels.
Dave.
>
> I'm about to leave for a week or so in Paris for the netfilter
> workshop, so I hope that someone other than myself will do some data
> mining like I have instead of (merely) tossing theories around and
> finger pointing.
>