linux-kernel - Re: [PATCH 2.6.21 review I] [11/25] x86: default to physical mode on hotplug CPU kernels

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <m1k5ynug3v.fsf@ebiederm.dsl.xmission.com>
Date:	Mon, 12 Feb 2007 16:10:44 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	"Li, Shaohua" <shaohua.li@...el.com>, patches@...-64.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2.6.21 review I] [11/25] x86: default to physical mode on hotplug CPU kernels

Andi Kleen <andi@...stfloor.org> writes:

> What experimental evidence did you have? 
>
> But I'm tempted to drop this unless the hotplug mystery can be cleared
> up. There was past information that logical is unsafe for hotplug.

Basically as I commented in genapic_flat, that at least on hyperthreading
cpus the destination mask is not always honored, and so if you only
allow one hyperthread I have seen the irq show up on the other hyperthread.

Now if the cpu is actually disabled I don't think that is a problem, but
I know early versions of hotplug did actually disable the cpu.

I think the renaming in this patch makes things clearer in a useful way.

The more I look at the hotplug cpu code the more I think it should be
filed under EXPERIMENTAL (as in the code is buggy and not ready for
production use yet).

Trying to see if I can improve the irq migration mess I stumbled upon
the following.  Currently set_affinity needs to be called with
irq_desc[irq].lock held.  It needs to be called from interrupt context.
And 1 millisecond delay appears utterly bogus, although the enable
the irqs do something disable the irqs likely flushes pending irqs.
A problem that cannot occurs differently if the irqs are migrated
from interrupt context.

Looking further this buggy set_affinity usage also appears in
setup_ioapic_dest.  Although it is much less dangerous there,
as the code is largely a noop.

Is it just me or are we crazy for support software controlled irq
migration?

void fixup_irqs(cpumask_t map)
{
	unsigned int irq;
	static int warned;

	for (irq = 0; irq < NR_IRQS; irq++) {
		cpumask_t mask;
		if (irq == 2)
			continue;

		cpus_and(mask, irq_desc[irq].affinity, map);
		if (any_online_cpu(mask) == NR_CPUS) {
			printk("Breaking affinity for irq %i\n", irq);
			mask = map;
		}
		if (irq_desc[irq].chip->set_affinity)
			irq_desc[irq].chip->set_affinity(irq, mask);
		else if (irq_desc[irq].action && !(warned++))
			printk("Cannot set affinity for irq %i\n", irq);
	}

	/* That doesn't seem sufficient.  Give it 1ms. */
	local_irq_enable();
	mdelay(1);
	local_irq_disable();
}

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/