netdev - Re: net: Automatic IRQ siloing for network devices

Message-ID: <1302915030.5282.778.camel@localhost>
Date:	Sat, 16 Apr 2011 01:50:30 +0100
From:	Ben Hutchings <bhutchings@...arflare.com>
To:	Neil Horman <nhorman@...driver.com>
Cc:	netdev@...r.kernel.org, davem@...emloft.net
Subject: Re: net: Automatic IRQ siloing for network devices

On Fri, 2011-04-15 at 23:54 +0100, Ben Hutchings wrote:
> On Fri, 2011-04-15 at 16:17 -0400, Neil Horman wrote:
> > Automatic IRQ siloing for network devices
> > 
> > At last year's netconf:
> > http://vger.kernel.org/netconf2010.html
> > 
> > Tom Herbert gave a talk in which he outlined some of the things we can do to
> > improve scalability and throughput in our network stack.
> > 
> > One of the big items on the slides was the notion of siloing irqs, which is the
> > practice of setting irq affinity to a cpu or cpu set that was 'close' to the
> > process that would be consuming data.  The idea was to ensure that a hard irq
> > for a nic (and its subsequent softirq) would execute on the same cpu as the
> > process consuming the data, increasing cache hit rates and speeding up overall
> > throughput.
> > 
> > I had taken an idea away from that talk, and have finally gotten around to
> > implementing it.  One of the problems with the above approach is that it's all
> > quite manual.  I.e. to properly enact this siloing, you have to do a few things
> > by hand:
> > 
> > 1) decide which process is the heaviest user of a given rx queue 
> > 2) restrict the cpus which that task will run on
> > 3) identify the irq which the rx queue in (1) maps to
> > 4) manually set the affinity for the irq in (3) to cpus which match the cpus in
> > (2)
> [...]
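
For illustration only, steps (2) and (4) of the quoted list could be sketched
roughly as below.  This is a minimal Python sketch, not anything from the
patch: it assumes the usual /proc/irq/<N>/smp_affinity interface, which takes
a hex CPU bitmask (comma-separated 32-bit groups on larger machines), and the
function names cpus_to_mask() and silo() are made up here.  Steps (1) and (3),
finding the heaviest user of a queue and the queue's irq, are not covered.

```python
import os
from pathlib import Path


def cpus_to_mask(cpus):
    """Return a hex CPU bitmask string for an iterable of CPU numbers.

    Masks wider than 32 bits are split into comma-separated groups of
    8 hex digits, most significant group first.
    """
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu
    hexstr = format(bits, "x")
    groups = []
    while hexstr:
        groups.append(hexstr[-8:])   # peel off the low 32 bits
        hexstr = hexstr[:-8]
    groups.reverse()
    return ",".join(groups)


def silo(pid, irq, cpus, proc_root="/proc"):
    """Hypothetical helper: pin a task and an irq to the same CPU set.

    Needs root and a Linux host; nothing here is called at import time.
    """
    cpus = set(cpus)
    os.sched_setaffinity(pid, cpus)                       # step (2)
    path = Path(proc_root) / "irq" / str(irq) / "smp_affinity"
    path.write_text(cpus_to_mask(cpus))                   # step (4)
```

For example, cpus_to_mask({0, 1}) gives "3", and a mask containing only
CPU 32 comes out as "1,00000000".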
> 
> This presumably works well with small numbers of flows and/or large
> numbers of queues.  You could scale it up somewhat by manipulating the
> device's flow hash indirection table, but that usually only has 128
> entries.  (Changing the indirection table is currently quite expensive,
> though that could be changed.)
[...]

Actually, I reckon you could do a more or less generic implementation of
accelerated RFS on top of a flow hash indirection table.  It would
require the drivers to provide a new function to update single table
entries, and some way to switch between automatic configuration by RFS
and manual configuration with ethtool.
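
To make the idea a bit more concrete, here is a toy Python model of it.  It is
only a sketch under assumptions: the class and method names (IndirectionTable,
set_entry, steer_flow) are hypothetical, the 128-entry size is just the usual
figure mentioned above, and no real driver or ethtool interface is being
described.  set_entry() stands in for the new per-entry driver function, and
the auto flag stands in for the switch between RFS-driven and manual
configuration.

```python
class IndirectionTable:
    """Toy model of a NIC flow hash indirection table."""

    def __init__(self, num_queues, size=128):
        self.size = size
        self.num_queues = num_queues
        # Default spread: entries assigned to queues round-robin.
        self.entries = [i % num_queues for i in range(size)]
        # False: table is owned by manual configuration.
        # True: table is updated automatically (RFS-style steering).
        self.auto = False

    def queue_for(self, flow_hash):
        """RX queue that a packet with this flow hash is delivered to."""
        return self.entries[flow_hash % self.size]

    def set_entry(self, index, queue):
        """Stand-in for a per-entry update hook provided by the driver."""
        if not 0 <= index < self.size:
            raise IndexError("table index out of range")
        if not 0 <= queue < self.num_queues:
            raise ValueError("no such queue")
        self.entries[index] = queue

    def steer_flow(self, flow_hash, queue):
        """Point the flow's table entry at the queue near its consumer.

        Returns False and leaves the table alone in manual mode.
        """
        if not self.auto:
            return False
        self.set_entry(flow_hash % self.size, queue)
        return True


table = IndirectionTable(num_queues=4)
table.auto = True
table.steer_flow(5, 3)       # flow with hash 5 now lands on queue 3
```

Note that every flow whose hash falls in the same table entry moves along
with the steered one (hash 133 also maps to entry 5 here), which is the
granularity limit of a small table.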

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
