Message-ID: <1303065539.5282.938.camel@localhost>
Date: Sun, 17 Apr 2011 19:38:59 +0100
From: Ben Hutchings <bhutchings@...arflare.com>
To: Neil Horman <nhorman@...driver.com>
Cc: Stephen Hemminger <shemminger@...tta.com>, netdev@...r.kernel.org,
davem@...emloft.net, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: net: Automatic IRQ siloing for network devices
On Sun, 2011-04-17 at 13:20 -0400, Neil Horman wrote:
> On Sat, Apr 16, 2011 at 09:17:04AM -0700, Stephen Hemminger wrote:
[...]
> > My gut feeling is that:
> > * kernel should default to a simple static sane irq policy without user
> > space. This is especially true for multi-queue devices where the default
> > puts all IRQs on one CPU.
> >
> That's not how it currently works, AFAICS. The default kernel policy is
> currently that CPU affinity for any newly requested IRQ is all CPUs; any
> restriction beyond that is the purview of userspace (irqbalance or
> manual affinity setting).
Right. Though it may be reasonable for the kernel to use the affinity
hint as the initial affinity for a newly allocated IRQ (I'm not quite
sure how we would determine that).
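
For what it's worth, the driver side of the hint already exists; roughly
like this (a sketch - the structure and names are invented, only
irq_set_affinity_hint() is the real interface):

/* Driver-side sketch: export a per-queue affinity hint.  The structure
 * and names are invented; irq_set_affinity_hint() is the real interface.
 */
#include <linux/cpumask.h>
#include <linux/interrupt.h>

struct example_channel {
	unsigned int irq;	/* MSI-X vector for this RX queue */
};

static void example_set_irq_hints(struct example_channel *ch, int n_channels)
{
	int i;
	unsigned int cpu = cpumask_first(cpu_online_mask);

	for (i = 0; i < n_channels; i++) {
		/* Spread queues across online CPUs, one hint per vector */
		irq_set_affinity_hint(ch[i].irq, cpumask_of(cpu));
		cpu = cpumask_next(cpu, cpu_online_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(cpu_online_mask);
	}
}

irqbalance (or anything else) can read the result back from
/proc/irq/<n>/affinity_hint; the open question is whether the kernel
should also apply it when the IRQ is first set up.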
[...]
> > * irqbalance should not do the hacks it does to try and guess at network traffic.
> >
> Well, I can certainly agree with that, but I'm not sure what that looks like.
>
> I could envision something like:
>
> 1) Use irqbalance to do a one-time placement of interrupts, keeping a simple
> (possibly sub-optimal) policy, perhaps something like: new IRQs get assigned
> to the least-loaded CPU within the NUMA node of the device the IRQ originates
> from.
>
> 2) Add a udev event on the addition of new interrupts, to rerun irqbalance
Yes, making irqbalance more (or entirely) event-driven seems like a good
thing.
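
As a rough userspace illustration of the placement in (1), leaving out
the "least loaded" part: pin the IRQ to the CPUs of the device's NUMA
node. The interface name and IRQ number below are placeholders; the
sysfs and procfs paths are the standard ones.

#include <stdio.h>
#include <stdlib.h>

/* Read the first line of a sysfs/procfs file into buf. */
static int read_line(const char *path, char *buf, int len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}

int main(void)
{
	char buf[256], path[256];
	FILE *f;
	int node;

	/* NUMA node of the device behind eth0 (-1 means none/unknown) */
	if (read_line("/sys/class/net/eth0/device/numa_node", buf, sizeof(buf)))
		return 1;
	node = atoi(buf);
	if (node < 0)
		return 0;

	/* CPU mask of that node, already in the hex format smp_affinity takes */
	snprintf(path, sizeof(path), "/sys/devices/system/node/node%d/cpumap",
		 node);
	if (read_line(path, buf, sizeof(buf)))
		return 1;

	/* Apply it as the IRQ's affinity; IRQ 42 is a placeholder */
	f = fopen("/proc/irq/42/smp_affinity", "w");
	if (!f)
		return 1;
	fputs(buf, f);
	fclose(f);
	return 0;
}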
> 3) Add some exported information to identify processes that are heavy users
> of network traffic, and correlate that usage with the rxq/IRQ that produced
> it (possibly some per-task proc file)
>
> 4) Create/expand an additional user-space daemon to monitor the heaviest
> users of network traffic on various rxq/IRQs (as identified in (3)) and
> restrict those processes' execution to the CPUs that share an L2 cache with
> the IRQ itself. The cpuset cgroup could perhaps be useful for this.
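Mechanically, the cpuset side of (4) is simple enough; a sketch,
assuming the cpuset controller is mounted at /sys/fs/cgroup/cpuset,
with the CPU list and PID as placeholders:

#include <stdio.h>
#include <sys/stat.h>

/* Write a short string to a cgroup control file. */
static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	/* Mount point is an assumption; "2-3" and 1234 are placeholders for
	 * the CPUs sharing a cache with the IRQ and the busy task's PID. */
	mkdir("/sys/fs/cgroup/cpuset/net_silo", 0755);
	write_str("/sys/fs/cgroup/cpuset/net_silo/cpuset.cpus", "2-3");
	write_str("/sys/fs/cgroup/cpuset/net_silo/cpuset.mems", "0");
	write_str("/sys/fs/cgroup/cpuset/net_silo/tasks", "1234");
	return 0;
}
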
I just don't see that you're going to get processes associated with
specific RX queues unless you make use of flow steering.
The 128-entry flow hash indirection table is part of Microsoft's
requirements for RSS, so most multiqueue hardware will let you do
limited flow steering that way.
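
Conceptually that table is just a hash-to-queue lookup, so "limited
flow steering" means rewriting entries in it. Something like this,
with the 128-entry size from the RSS requirement and everything else
illustrative:

#include <stdint.h>
#include <stdio.h>

#define INDIR_SIZE 128	/* table size required for RSS */

static uint8_t indir_table[INDIR_SIZE];

/* Default spread: table entries cycle over the available RX queues */
static void indir_init(unsigned int nqueues)
{
	for (unsigned int i = 0; i < INDIR_SIZE; i++)
		indir_table[i] = i % nqueues;
}

/* What the NIC effectively does per packet: hash the flow key, then
 * use the low bits of the hash to pick an RX queue from the table. */
static unsigned int rx_queue_for(uint32_t flow_hash)
{
	return indir_table[flow_hash % INDIR_SIZE];
}

int main(void)
{
	indir_init(4);
	/* "Limited flow steering": point one hash bucket at a chosen queue */
	indir_table[0x1234 % INDIR_SIZE] = 3;
	printf("flow 0x1234 -> rxq %u\n", rx_queue_for(0x1234));
	return 0;
}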
> Actually, as I read that back to myself, it sounds kind of good to me. It
> keeps all the policy for this in user space, and minimizes what we have to
> add to the kernel to make it happen (some process information in /proc and
> another udev event). I'd like to get some feedback before I start
> implementing this, but I think this could be done. What do you think?
I don't think it's a good idea to override the scheduler dynamically
like this.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.