Date:	Tue, 24 Nov 2009 14:55:15 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Peter Zijlstra <peterz@...radead.org>
cc:	Dimitri Sivanich <sivanich@....com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Ingo Molnar <mingo@...e.hu>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Yinghai Lu <yinghai@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Arjan van de Ven <arjan@...radead.org>,
	David Miller <davem@...emloft.net>,
	Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@...el.com>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v6] x86/apic: limit irq affinity

On Tue, 24 Nov 2009, Peter Zijlstra wrote:
> On Tue, 2009-11-24 at 14:20 +0100, Thomas Gleixner wrote:
> > On Sat, 21 Nov 2009, Dimitri Sivanich wrote:
> > 
> > > On Sat, Nov 21, 2009 at 10:49:50AM -0800, Eric W. Biederman wrote:
> > > > Dimitri Sivanich <sivanich@....com> writes:
> > > > 
> > > > > This patch allows for hard numa restrictions to irq affinity on x86 systems.
> > > > >
> > > > > Affinity is masked to allow only those cpus which the subarchitecture
> > > > > deems accessible by the given irq.
> > > > >
> > > > > On some UV systems, this domain will be limited to the nodes accessible
> > > > > to the irq's node.  Initially other X86 systems will not mask off any cpus
> > > > > so non-UV systems will remain unaffected.
> > > > 
> > > > Is this a hardware restriction you are trying to model?
> > > > If not this seems wrong.
> > > 
> > > Yes, it's a hardware restriction.
> > 
> > Nevertheless I think that this is the wrong approach.
> > 
> > What we really want is a notion in the irq descriptor which tells us:
> > this interrupt is restricted to numa node N.
> > 
> > The solution in this patch is just restricted to x86 and hides that
> > information deep in the arch code. 
> > 
> > Further the patch adds code which should be in the generic interrupt
> > management code as it is useful for other purposes as well:
> > 
> > Driver folks are looking for a way to restrict irq balancing to a
> > given numa node when they have all the driver data allocated on that
> > node. That's not a hardware restriction as in the UV case but requires
> > a similar infrastructure.
> > 
> > One possible solution would be to have a new flag:
> >  IRQF_NODE_BOUND    - irq is bound to desc->node
> > 
> > When an interrupt is set up we would query with a new irq_chip
> > function chip->get_node_affinity(irq) which would default to an empty
> > implementation returning -1. The arch code can provide its own
> > function to return the numa affinity which would express the hardware
> > restriction.
> > 
> > The core code would restrict affinity settings to the cpumask of that
> > node without any need for the arch code to check it further.
> > 
> > That same infrastructure could be used for the software restriction of
> > interrupts to a node on which the device is bound.
> > 
> > Having it in the core code also allows us to expose this information
> > to user space so that the irq balancer knows about it and does not try
> > to randomly move the affinity to cpus which are not in the allowed set
> > of the node.
> 
> I think we should not combine these two cases.
> 
> Node-bound devices simply prefer the IRQ to be routed to a cpu 'near'
> that node, hard-limiting them to that node is policy and is not
> something we should do.
> 
> Defaulting to the node-mask is debatable, but is, I think, something we
> could do. But I think we should allow user-space to write any mask as
> long as the hardware can indeed route the IRQ that way, even when
> clearly stupid.

Fair enough, but I can imagine that we want a tunable knob which
prevents that. I'm not against giving sysadmins enough rope to hang
themselves, but at least we want to give them a helping hand to fight
off crappy user space applications which do not care about stupidity
at all.
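
Even a global switch would do for a start. A rough and completely
untested sketch; "irq_restrict_node" is a made up name, and how the
knob gets exposed (sysctl, /proc, whatever) is a separate question:

#include <linux/irq.h>
#include <linux/topology.h>	/* cpumask_of_node() */

/* 0 = advisory (default), 1 = strict */
static int irq_restrict_node __read_mostly;

static bool irq_affinity_allowed(struct irq_desc *desc,
				 const struct cpumask *mask)
{
	/* Advisory mode: user space may override the node preference */
	if (!irq_restrict_node)
		return true;

	/* Strict mode: reject masks outside the irq's home node */
	return cpumask_intersects(mask, cpumask_of_node(desc->node));
}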

> Which is where the UV case comes in, they cannot route IRQs to every
> CPU, so it makes sense to limit the possible masks being written. I do
> however fully agree that that should be done in generic code, as I can
> quite imagine more hardware than UV having limitations in this regard.

That's why I want to see it in the generic code.
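
To make that a bit more concrete, roughly something like the below.
Completely untested sketch; IRQF_NODE_BOUND and
chip->get_node_affinity() are just the names proposed above, none of
this exists yet, and where the flag actually lives is a detail to
sort out:

#include <linux/irq.h>
#include <linux/topology.h>

/* irq is bound to desc->node; value and home of the flag TBD */
#define IRQF_NODE_BOUND	0x01000000

/* At setup time: let the arch/chip report a hard node binding */
static void irq_init_node_bound(struct irq_desc *desc, unsigned int irq)
{
	int node = -1;

	/* get_node_affinity() is the proposed new irq_chip callback */
	if (desc->chip->get_node_affinity)
		node = desc->chip->get_node_affinity(irq);

	if (node >= 0) {
		desc->node = node;
		desc->status |= IRQF_NODE_BOUND;
	}
}

/* In the generic affinity setter: restrict settings to the node's cpus */
static int irq_check_node_bound(struct irq_desc *desc,
				const struct cpumask *mask)
{
	if (!(desc->status & IRQF_NODE_BOUND))
		return 0;

	if (!cpumask_intersects(mask, cpumask_of_node(desc->node)))
		return -EINVAL;

	return 0;
}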
 
> Furthermore, the /sysfs topology information should include IRQ routing
> data in this case.

Hmm, not sure about that. You'd need to scan through all the nodes to
find the set of CPUs to which an irq can be routed. I prefer to have
the information exposed by the irq enumeration (which currently lives
in /proc/irq, though).
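
For the user space side (irqbalance and friends) the consumer could
then be as simple as the below. The per-irq "node" file under
/proc/irq/<N>/ is made up for illustration, nothing like it is
exported today:

#include <stdio.h>

/* Returns the irq's home node, or -1 if no restriction is known */
static int irq_home_node(unsigned int irq)
{
	char path[64];
	FILE *f;
	int node = -1;

	snprintf(path, sizeof(path), "/proc/irq/%u/node", irq);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &node) != 1)
		node = -1;
	fclose(f);
	return node;
}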

Thanks,

	tglx