Message-ID: <m1eina9vw1.fsf@fess.ebiederm.org>
Date: Fri, 04 Dec 2009 15:12:14 -0800
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@...el.com>
Cc: Dimitri Sivanich <sivanich@....com>,
Arjan van de Ven <arjan@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...e.hu>,
"Siddha\, Suresh B" <suresh.b.siddha@...el.com>,
Yinghai Lu <yinghai@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Jesse Barnes <jbarnes@...tuousgeek.org>,
David Miller <davem@...emloft.net>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v6] x86/apic: limit irq affinity
Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@...el.com> writes:
>
>> >
>> > >
>> > > Also, can we add a restricted mask as I mention above into this scheme? If we can't send an IRQ to some node, we don't want to bother attempting to change affinity to cpus on that node (hopefully code in the kernel will eventually restrict this).
>> > >
>> >
>> > The interface allows you to put in any CPU mask. The way it's written
>> > now, whatever mask you put in, irqbalance *only* balances within that
>> > mask. It won't ever try and go outside that mask.
>>
>> OK. Given that, it might be nice to combine the restricted cpus that I'm describing with your node_affinity mask, but we could expose them as separate masks (node_affinity and restricted_affinity, as I describe above).
>>
>
> I think this might be getting too complicated. The only thing
> irqbalance is lacking today, in my mind, is the feedback mechanism,
> telling it what subset of CPU masks to balance within.
You mean besides knowing that devices can have more than one irq?
You mean besides making good on its promise not to move networking
irqs? A policy of BALANCE_CORE sure doesn't look like a policy of
don't touch.
You mean besides realizing that irqs can only be directed at one cpu on
x86? At least when you have more than 8 logical cores in the system,
which covers the cases that matter.
> There is an
> allowed_mask, but that is used for a different purpose, which is why I
> added another. But I think your needs can be met 100% with what I have
> already, and we can come up with a different name that's more generic.
> The flows would be something like this:
Two masks? You are asking the kernel to move irqs for you then?
> Driver:
> - Driver comes online, allocates memory in a sensible NUMA fashion
> - Driver requests kernel for interrupts, ties them into handlers
> - Driver now sets a NUMA-friendly affinity for each interrupt, to match
> with its initial memory allocation
> - irqbalance balances interrupts within their new "hinted" affinities.
>
> Other:
> - System comes online
> - In your case, interrupts must be kept away from certain CPUs.
> - Some mechanism in your architecture init can set the "hinted" affinity
> mask for each interrupt.
> - irqbalance will not move interrupts to the CPUs you left out of the
> "hinted" affinity.
>
> Does this make more sense?
>> > > As a matter of fact, driver's allocating rings, buffers, queues on other nodes should optimally be made aware of the restriction.
>> >
>> > The idea is that the driver will do its memory allocations for everything
>> > across nodes. When it does that, it will use the kernel interface
>> > (function call) to set the corresponding mask it wants for those queue
>> > resources. That is my end-goal for this code.
>> >
>>
>> OK, but we will eventually have to reject any irqbalance attempts to send irqs to restricted nodes.
>
> See above.
Either I am parsing this conversation wrong or there is a strong
reality distortion field in place. It appears you are asking that we
depend on a user space application to not attempt the physically
impossible, when we could just as easily ignore the request or report
-EINVAL.
We really have two separate problems here.
- How to avoid the impossible.
- How to deal with NUMA affinity.
Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/