Message-Id: <200901161347.14212.rjw@sisk.pl>
Date: Fri, 16 Jan 2009 13:47:13 +0100
From: "Rafael J. Wysocki" <rjw@...k.pl>
To: Yinghai Lu <yinghai@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "Barnes, Jesse" <jesse.barnes@...el.com>,
Ingo Molnar <mingo@...e.hu>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: kexec fail with e1000 and e1000e

On Friday 16 January 2009, Rafael J. Wysocki wrote:
> On Friday 16 January 2009, Yinghai Lu wrote:
> > On Thu, Jan 15, 2009 at 8:31 PM, Yinghai Lu <yinghai@...nel.org> wrote:
> > > [ 8.652468] Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
> > > [ 8.658896] Copyright (c) 1999-2006 Intel Corporation.
> > > [ 8.680008] e1000 0000:05:01.0: enabling device (0000 -> 0003)
> > > [ 8.685831] vendor=1022 device=7458
> > > [ 8.689314] e1000 0000:05:01.0: PCI INT A -> GSI 48 (level, low) -> IRQ 48
> > > [ 8.696184] e1000 0000:05:01.0: enabling bus mastering
> > > [ 8.953642] e1000: 0000:05:01.0: e1000_probe: The EEPROM Checksum
> > > Is Not Valid
> > > [ 9.863340] /*********************/
> > > [ 9.866829] Current EEPROM Checksum : 0xffff
> > > [ 9.871092] Calculated : 0xbaf9
> > > [ 9.875354] Offset Values
> > > [ 9.878230] ======== ======
> > > [ 9.881107] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > [ 9.887622] 00000010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > [ 9.894137] 00000020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > [ 9.900653] 00000030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > [ 9.907168] 00000040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > [ 9.913684] 00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > [ 9.920200] 00000060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > [ 9.926715] 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > [ 9.933231] Include this output when contacting your support provider.
> > > [ 9.939746] This is not a software error! Something bad happened to
> > > your hardware or
> > > [ 9.947475] EEPROM image. Ignoring this problem could result in
> > > further problems,
> > > [ 9.954945] possibly loss of data, corruption or system hangs!
> > > [ 9.960767] The MAC Address will be reset to 00:00:00:00:00:00,
> > > which is invalid
> > > [ 9.968149] and requires you to set the proper MAC address manually
> > > before continuing
> > > [ 9.975964] to enable this network device.
> > > [ 9.980053] Please inspect the EEPROM dump and report the issue to
> > > your hardware vendor
> > > [ 9.988041] or Intel Customer Support.
> > > [ 9.991770] /*********************/
> > >
> > > git bisect said following patch is first bad one.
> > >
> > > commit 98e6e286d7b01deb7453b717aa38ebb69d6cefc0
> > > Author: Rafael J. Wysocki <rjw@...k.pl>
> > > Date: Wed Jan 7 13:10:35 2009 +0100
> > >
> > > PCI PM: Register power state of devices during initialization
> > >
> > > Use the observation that the power state of a PCI device can be
> > > loaded into its pci_dev structure as soon as pci_pm_init() is run for
> > > it and make that happen.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rjw@...k.pl>
> > > Acked-by: Pavel Machek <pavel@...e.cz>
> > > Signed-off-by: Jesse Barnes <jbarnes@...tuousgeek.org>
> >
> > reverting that patch will make kexec work with e1000 again.
>
> Well, although I think that reverting the patch won't do much harm, I also
> don't quite understand why this should matter for kexec at all.
>
> Can you please check if the problem also happens with the appended debug patch?

OK, scratch that, I think I know what happens.

shutdown() puts the device into D3 and then kexec relies on the device's
current_state being PCI_UNKNOWN so that pci_restore_bars() is called when
pci_set_power_state() runs for the first time. This looks fragile, but I don't
know how to fix it cleanly at the moment and I can see this can also happen
in some other cases. My bad.
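
To illustrate the interaction described above, here is a toy model, not the
actual kernel code. Every name in it (toy_dev, toy_set_power_d0, toy_boot and
the TOY_* states) is made up for the illustration, and the real logic in
pci_set_power_state() is more involved. The only assumption it encodes is the
one stated above: the BARs get restored on the first power-up only if the
cached state is still unknown while the hardware is in D3.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's power states (illustration only). */
enum toy_power_state { TOY_D0, TOY_D3HOT, TOY_UNKNOWN };

struct toy_dev {
	enum toy_power_state hw_state;      /* state the hardware is really in */
	enum toy_power_state current_state; /* state cached by the PCI core */
	bool bars_valid;                    /* true once the BARs are restored */
};

/*
 * Models the first transition to D0. Assumption: the restore is triggered
 * only when the cached state is unknown and the hardware is found in D3.
 */
static void toy_set_power_d0(struct toy_dev *dev)
{
	bool need_restore = dev->current_state == TOY_UNKNOWN &&
			    dev->hw_state == TOY_D3HOT;

	dev->hw_state = TOY_D0;
	dev->current_state = TOY_D0;
	if (need_restore)
		dev->bars_valid = true; /* stands in for pci_restore_bars() */
}

/*
 * Models the new kernel coming up after shutdown() left the device in D3.
 * register_state_at_init mimics the effect of the commit in question:
 * the cached state is loaded from the hardware during initialization.
 * Returns true if the BARs end up restored.
 */
static bool toy_boot(bool register_state_at_init)
{
	struct toy_dev dev = {
		.hw_state = TOY_D3HOT,
		.current_state = TOY_UNKNOWN,
		.bars_valid = false,
	};

	if (register_state_at_init)
		dev.current_state = dev.hw_state; /* no longer unknown */

	toy_set_power_d0(&dev);
	return dev.bars_valid;
}
```

In this model toy_boot(false) returns true (the BARs are restored), while
toy_boot(true) returns false, which corresponds to the failure reported for
e1000 after kexec.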

Linus, please revert commit 98e6e286d7b01deb7453b717aa38ebb69d6cefc0
(PCI PM: Register power state of devices during initialization), because it
breaks the mechanism by which pci_restore_bars() is called during
initialization if the device is initially in D3.

Thanks,
Rafael