Date:	Tue, 24 Nov 2009 09:41:18 -0800
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Arjan van de Ven <arjan@...radead.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Dimitri Sivanich <sivanich@....com>,
	Ingo Molnar <mingo@...e.hu>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Yinghai Lu <yinghai@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	David Miller <davem@...emloft.net>,
	Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@...el.com>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v6] x86/apic: limit irq affinity

Arjan van de Ven <arjan@...radead.org> writes:

> On Tue, 24 Nov 2009 14:55:15 +0100 (CET)
>> > Furthermore, the /sysfs topology information should include IRQ
>> > routing data in this case.
>> 
>> Hmm, not sure about that. You'd need to scan through all the nodes to
>> find the set of CPUs where an irq can be routed to. I prefer to have
>> the information exposed by the irq enumeration (which is currently in
>> /proc/irq though).
>
> yes please.
>
> one device can have multiple irqs
> one irq can be servicing multiple devices
>
> expressing that in sysfs is a nightmare, while
> sticking it in /proc/irq *where the rest of the info is* is
> much nicer for apps like irqbalance

Oii.

I don't think it is bad to export information to applications like irqbalance.

I think it is pretty horrible that one of the standard ways I have heard
to improve performance on 10G NICs is to kill irqbalance.

Guys.  Migrating an irq from one cpu to another while the device is
running without dropping interrupts is hard.

At the point where we start talking about limiting what a process with
CAP_SYS_ADMIN can do because it makes bad decisions, I think something
is really broken.

Currently the irq code treats /proc/irq/N/smp_affinity as a strong hint
about where we would like interrupts to be delivered, and we don't have
good feedback from there to the architecture-specific code that knows
what we really can do.  It is going to take some real work to make that
happen.
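
To make the hint concrete, this is roughly how user space expresses it
today; a minimal sketch only, and the irq number (24) and mask (0x4) are
made-up values:

/* Minimal sketch: a tool with CAP_SYS_ADMIN writes a hex CPU bitmask
 * to the irq's smp_affinity file.  Irq 24 and mask 0x4 (CPU 2) are
 * made-up values for illustration. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/irq/24/smp_affinity", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	fprintf(f, "%x\n", 0x4);	/* ask for delivery on CPU 2 only */
	fclose(f);
	return 0;
}

The kernel takes that mask as a request, but the architecture code may
end up delivering somewhere else, and nothing tells the writer so; that
is exactly the missing feedback.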

I think the irq scheduler is the only scheduler (except for batch jobs) that we
don't put in the kernel.  It seems to me that if we are going to go to all of the
trouble to rewrite the generic code to better support irqbalance because we
are having serious irqbalance problems, it will be less effort to suck irqbalance
into the kernel along with everything else.

I really think irq balancing belongs in the kernel.  It is hard to
export all of the information we need to user space, and the
information that we need to export keeps changing.  Until we master
this new trend of exponentially increasing core counts, that
information is going to keep changing.  Today we barely know how to
balance flows across cpus.  So given the huge communication problem,
and the fact that there appears to be no benefit in keeping irqbalance
in user space (there is no config file), if we are going to rework all
of the interfaces, let's pull irqbalance into the kernel.


As for the UV code, what we are looking at is a fundamental irq
routing property.  Those irqs cannot be routed to some cpus.  That is
something the code that sets up the routes needs to be aware of.
Dimitri, could you put the extra code in assign_irq_vector instead
of in the callers of assign_irq_vector?  Since the problem is not
likely to stay unique to UV, we probably want to put the information you
base things on in struct irq_desc, but the logic seems to live best
in assign_irq_vector.
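
Something along these lines is what I have in mind.  This is only a
sketch of the shape, not real code; the uv_allowed_mask field in
irq_desc is invented for illustration:

/*
 * Rough sketch only: keep the routing restriction as data in
 * struct irq_desc and apply it inside assign_irq_vector(), rather
 * than teaching every caller about it.  uv_allowed_mask is a
 * made-up field name.
 */
static int assign_irq_vector(int irq, struct irq_desc *desc,
			     const struct cpumask *mask)
{
	cpumask_var_t tmp;

	if (!alloc_cpumask_var(&tmp, GFP_ATOMIC))
		return -ENOMEM;

	/* Drop cpus this irq can never be routed to. */
	cpumask_and(tmp, mask, desc->uv_allowed_mask);
	if (cpumask_empty(tmp)) {
		free_cpumask_var(tmp);
		return -EINVAL;
	}

	/* ... the existing per-cpu vector allocation runs against tmp ... */

	free_cpumask_var(tmp);
	return 0;
}

That way the restriction is just data hanging off irq_desc, and the
callers of assign_irq_vector do not need to know about it at all.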

Eric
