Message-ID: <20071127131355.GA14887@hmsendeavour.rdu.redhat.com>
Date:	Tue, 27 Nov 2007 08:13:55 -0500
From:	Neil Horman <nhorman@...hat.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Neil Horman <nhorman@...driver.com>, hbabu@...ibm.com,
	vgoyal@...ibm.com, kexec@...ts.infradead.org, ak@...e.de,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

On Mon, Nov 26, 2007 at 09:12:25PM -0700, Eric W. Biederman wrote:
> Neil Horman <nhorman@...driver.com> writes:
> 
> > Hey all-
> > 	I've been working on an issue lately involving multi-socket x86_64
> > systems connected via hypertransport bridges.  It appears that some systems
> > disable the hypertransport connections during a kdump operation when all but the
> > crashing processor get halted in machine_crash_shutdown.  This becomes a
> > problem when the ioapic attempts to route interrupts to the only remaining
> > processor.  Even though the active processor is targeted for interrupt
> > reception, the fact that the hypertransport connections are inactive results in
> > interrupts not getting delivered.  The net result is that timer interrupts
> > are not delivered to the running cpu, and the system hangs on reboot into the
> > kdump kernel during calibrate_delay.  I've been able to avoid
> > this hang by forcing a transition to the BIOS-defined boot cpu during the
> > crashing kernel shutdown.  This patch accomplishes that.  Tested by myself and
> > the original reporter with successful results.
> 
> If you can get to calibrate_delay hypertransport is still routing traffic.
> Your diagnosis of the problem is wrong.  Most likely it is just an ioapic
> programming error in restoring the system to PIC mode.
> 
What makes you say this?  I don't see any need for interrupts prior to
calibrate_delay().

> I agree that there is a problem.
> 
> The reliable fix is to totally skip the PIC interrupt mode and go directly
> to apic mode.
> 
> To make the kexec-on-panic code path reliable we need to remove code,
> not add it.
> 
> Frankly I think switching cpus is one of the least reliable things that
> we can do in general.
> 
I understand the sentiment here, but it's not like we're adding additional
functionality with this patch.  We're already sending an IPI to all the
processors to halt them; we're just adding logic so that we can detect the
boot cpu and use it to jump to the kexec image instead of halting.  I don't
think this is any less reliable than what we have currently.

Regards
Neil

> Eric

-- 
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@...hat.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/