Message-ID: <1303065539.5282.938.camel@localhost>
Date: Sun, 17 Apr 2011 19:38:59 +0100
From: Ben Hutchings <bhutchings@...arflare.com>
To: Neil Horman <nhorman@...driver.com>
Cc: Stephen Hemminger <shemminger@...tta.com>, netdev@...r.kernel.org,
davem@...emloft.net, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: net: Automatic IRQ siloing for network devices
On Sun, 2011-04-17 at 13:20 -0400, Neil Horman wrote:
> On Sat, Apr 16, 2011 at 09:17:04AM -0700, Stephen Hemminger wrote:
[...]
> > My gut feeling is that:
> > * kernel should default to a simple static sane irq policy without user
> > space. This is especially true for multi-queue devices where the default
> > puts all IRQs on one CPU.
> >
> That's not how it currently works, AFAICS. The default kernel policy is
> currently that CPU affinity for any newly requested IRQ is all CPUs; any
> restriction beyond that is the purview of userspace (irqbalance or
> manual affinity setting).
Right. Though it may be reasonable for the kernel to use the affinity
hint as the initial affinity for a newly allocated IRQ (I'm not quite
sure how we would determine that).
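
For what it's worth, the driver side of the hint already exists; roughly
like this (a sketch - the structure and names are invented, only
irq_set_affinity_hint() is the real interface):

/* Driver-side sketch: export a per-queue affinity hint.  The structure
 * and names are invented; irq_set_affinity_hint() is the real interface.
 */
#include <linux/cpumask.h>
#include <linux/interrupt.h>

struct example_channel {
	unsigned int irq;	/* MSI-X vector for this RX queue */
};

static void example_set_irq_hints(struct example_channel *ch, int n_channels)
{
	int i;
	unsigned int cpu = cpumask_first(cpu_online_mask);

	for (i = 0; i < n_channels; i++) {
		/* Spread queues across online CPUs, one hint per vector */
		irq_set_affinity_hint(ch[i].irq, cpumask_of(cpu));
		cpu = cpumask_next(cpu, cpu_online_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(cpu_online_mask);
	}
}

irqbalance (or anything else) can read the result back from
/proc/irq/<n>/affinity_hint; the open question is whether the kernel
should also apply it when the IRQ is first set up.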
[...]
> > * irqbalance should not do the hacks it does to try and guess at network traffic.
> >
> Well, I can certainly agree with that, but I'm not sure what that looks like.
>
> I could envision something like:
>
> 1) Use irqbalance to do a one-time placement of interrupts, keeping a simple
> (possibly sub-optimal) policy, perhaps something like: new IRQs get assigned
> to the least-loaded CPU within the NUMA node of the device the IRQ originates
> from.
>
> 2) Add a udev event on the addition of new interrupts, to rerun irqbalance
Yes, making irqbalance more (or entirely) event-driven seems like a good
thing.
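
As a rough userspace illustration of the placement in (1), leaving out
the "least loaded" part: pin the IRQ to the CPUs of the device's NUMA
node. The interface name and IRQ number below are placeholders; the
sysfs and procfs paths are the standard ones.

#include <stdio.h>
#include <stdlib.h>

/* Read the first line of a sysfs/procfs file into buf. */
static int read_line(const char *path, char *buf, int len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}

int main(void)
{
	char buf[256], path[256];
	FILE *f;
	int node;

	/* NUMA node of the device behind eth0 (-1 means none/unknown) */
	if (read_line("/sys/class/net/eth0/device/numa_node", buf, sizeof(buf)))
		return 1;
	node = atoi(buf);
	if (node < 0)
		return 0;

	/* CPU mask of that node, already in the hex format smp_affinity takes */
	snprintf(path, sizeof(path), "/sys/devices/system/node/node%d/cpumap",
		 node);
	if (read_line(path, buf, sizeof(buf)))
		return 1;

	/* Apply it as the IRQ's affinity; IRQ 42 is a placeholder */
	f = fopen("/proc/irq/42/smp_affinity", "w");
	if (!f)
		return 1;
	fputs(buf, f);
	fclose(f);
	return 0;
}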
> 3) Add some exported information to identify processes that are heavy users
> of network traffic, and correlate that usage with the rxq/IRQ that produced
> it (possibly some per-task proc file)
>
> 4) Create/expand an additional user-space daemon to monitor the heaviest
> users of network traffic on various rxq/IRQs (as identified in (3)) and
> restrict those processes' execution to the CPUs that share an L2 cache with
> the IRQ itself. The cpuset cgroup could perhaps be useful for this.
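Mechanically, the cpuset side of (4) is simple enough; a sketch,
assuming the cpuset controller is mounted at /sys/fs/cgroup/cpuset,
with the CPU list and PID as placeholders:

#include <stdio.h>
#include <sys/stat.h>

/* Write a short string to a cgroup control file. */
static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	/* Mount point is an assumption; "2-3" and 1234 are placeholders for
	 * the CPUs sharing a cache with the IRQ and the busy task's PID. */
	mkdir("/sys/fs/cgroup/cpuset/net_silo", 0755);
	write_str("/sys/fs/cgroup/cpuset/net_silo/cpuset.cpus", "2-3");
	write_str("/sys/fs/cgroup/cpuset/net_silo/cpuset.mems", "0");
	write_str("/sys/fs/cgroup/cpuset/net_silo/tasks", "1234");
	return 0;
}
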
I just don't see that you're going to get processes associated with
specific RX queues unless you make use of flow steering.
The 128-entry flow hash indirection table is part of Microsoft's
requirements for RSS, so most multiqueue hardware will let you do
limited flow steering that way.
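
Conceptually that table is just a hash-to-queue lookup, so "limited
flow steering" means rewriting entries in it. Something like this,
with the 128-entry size from the RSS requirement and everything else
illustrative:

#include <stdint.h>
#include <stdio.h>

#define INDIR_SIZE 128	/* table size required for RSS */

static uint8_t indir_table[INDIR_SIZE];

/* Default spread: table entries cycle over the available RX queues */
static void indir_init(unsigned int nqueues)
{
	for (unsigned int i = 0; i < INDIR_SIZE; i++)
		indir_table[i] = i % nqueues;
}

/* What the NIC effectively does per packet: hash the flow key, then
 * use the low bits of the hash to pick an RX queue from the table. */
static unsigned int rx_queue_for(uint32_t flow_hash)
{
	return indir_table[flow_hash % INDIR_SIZE];
}

int main(void)
{
	indir_init(4);
	/* "Limited flow steering": point one hash bucket at a chosen queue */
	indir_table[0x1234 % INDIR_SIZE] = 3;
	printf("flow 0x1234 -> rxq %u\n", rx_queue_for(0x1234));
	return 0;
}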
> Actually, as I read that back to myself, it sounds kind of good to me. It
> keeps all the policy for this in user space, and minimizes what we have to
> add to the kernel to make it happen (some process information in /proc and
> another udev event). I'd like to get some feedback before I start
> implementing this, but I think this could be done. What do you think?
I don't think it's a good idea to override the scheduler dynamically
like this.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.