Message-ID: <1284673961.2283.57.camel@achroite.uk.solarflarecom.com>
Date: Thu, 16 Sep 2010 22:52:41 +0100
From: Ben Hutchings <bhutchings@...arflare.com>
To: David Miller <davem@...emloft.net>
Cc: therbert@...gle.com, eric.dumazet@...il.com, shemminger@...tta.com,
netdev@...r.kernel.org
Subject: Re: [PATCH] xps-mq: Transmit Packet Steering for multiqueue

On Wed, 2010-09-01 at 18:32 -0700, David Miller wrote:
> From: Tom Herbert <therbert@...gle.com>
> Date: Wed, 1 Sep 2010 09:24:18 -0700
>
> > On Wed, Sep 1, 2010 at 8:54 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> >> 3) Eventually have a user-selectable selection (socket option, or
> >> system-wide, but one sysctl, not many bitmasks ;) ).
> >>
> > Right, but it would also be nice if a single sysctl could optimally
> > set up multiqueue, RSS, RPS, and all my interrupt affinities for me
> > ;-)
>
> It's becoming increasingly obvious to me that we need (somewhere,
> not necessarily the kernel) a complete data structure representing
> the NUMA, cache, cpu, device hierarchy.

And ideally a cheap way (not O(N^2)) to find the distance between 2 CPU
threads (not just nodes).
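
Something like the userspace sketch below is roughly what I mean for
the lookup side: build a per-CPU node table once on top of libnuma,
then answer queries in O(1) from the node distance matrix.  It only
resolves to node granularity (refining by shared caches would need the
sysfs topology files as well), and cpu_distance() and the table are
just illustrative names, not an existing interface.

/* Sketch: constant-time CPU-to-CPU distance lookup using libnuma.
 * The per-CPU node map is filled once at start-up so queries avoid
 * any O(N^2) rescanning.  Link with -lnuma. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

static int *cpu_to_node;	/* per-CPU NUMA node id, filled once */
static int ncpus;

static void build_cpu_node_map(void)
{
	int cpu;

	ncpus = numa_num_configured_cpus();
	cpu_to_node = calloc(ncpus, sizeof(*cpu_to_node));
	for (cpu = 0; cpu < ncpus; cpu++)
		cpu_to_node[cpu] = numa_node_of_cpu(cpu);
}

/* SLIT-style distance between the nodes of two CPU threads;
 * 10 means node-local.  Cache-level refinement is left out. */
static int cpu_distance(int cpu_a, int cpu_b)
{
	return numa_distance(cpu_to_node[cpu_a], cpu_to_node[cpu_b]);
}

int main(void)
{
	if (numa_available() < 0) {
		fprintf(stderr, "no NUMA support\n");
		return 1;
	}
	build_cpu_node_map();
	printf("distance(cpu0, cpu%d) = %d\n",
	       ncpus - 1, cpu_distance(0, ncpus - 1));
	free(cpu_to_node);
	return 0;
}
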
> And that can be used to tweak all of this stuff.
>
> The policy should probably be in userspace, we just need to provide
> the knobs in the kernel to tweak it however userspace wants.
>
> Userspace should be able to, for example, move a TX queue into a
> NUMA domain and have this invoke several side effects:
>
> 1) IRQs for that TX queue get rerouted to a cpu in the NUMA
> domain.
>
> 2) TX queue data structures in the driver get reallocated using
> memory in that NUMA domain.

I've actually done some work on an interface and implementation of
this, although I didn't include setting the IRQ affinity, as there has
been pushback whenever people have proposed letting drivers set it.
If drivers only did so as directed by the administrator, that might be
more acceptable though.

Unfortunately, in my limited testing on a 2-node system I didn't see
much of a performance improvement when the affinities were all lined
up.  I should try to get some time on a 4-node system.
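
For what it's worth, the driver side of (2) mostly comes down to
passing the chosen node into the allocators.  A minimal sketch, with
all structure and function names invented rather than taken from the
sfc driver:

/* Sketch of node-local allocation for a TX queue's software state.
 * txq->node is whatever the administrator assigned from userspace. */
#include <linux/slab.h>
#include <linux/netdevice.h>

struct example_tx_queue {
	struct sk_buff **skbs;	/* per-descriptor completion state */
	unsigned int entries;
	int node;		/* NUMA node chosen from userspace */
};

static int example_tx_queue_alloc(struct example_tx_queue *txq)
{
	/* kzalloc_node() keeps the hot per-packet state local to the
	 * CPUs that will run the TX path for this queue. */
	txq->skbs = kzalloc_node(txq->entries * sizeof(*txq->skbs),
				 GFP_KERNEL, txq->node);
	if (!txq->skbs)
		return -ENOMEM;
	return 0;
}

static void example_tx_queue_free(struct example_tx_queue *txq)
{
	kfree(txq->skbs);
	txq->skbs = NULL;
}
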
> 3) TX hashing is configured to use the set of cpus in the NUMA
> domain.
>
> It's a lot of tedious work and involves some delicate tasks figuring
> out where each of these things goes, but really then we'd solve all
> of this crap once and for all.

Right.
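
Some of the knobs for (1) and (3) already exist or are arriving with
Tom's patches.  A small userspace sketch of pinning one TX queue and
its IRQ to a cpumask follows; write_mask() is an invented helper, and
the xps_cpus file assumes the per-tx-queue sysfs interface from the
XPS series.

/* Sketch: steer one TX queue and its IRQ to a hex cpumask.
 * Usage: steer-txq DEV TXQ IRQ HEXMASK */
#include <stdio.h>
#include <stdlib.h>

static int write_mask(const char *path, const char *hexmask)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", hexmask);
	return fclose(f);
}

int main(int argc, char **argv)
{
	char path[256];
	const char *dev, *mask;
	int txq, irq;

	if (argc != 5) {
		fprintf(stderr, "usage: %s DEV TXQ IRQ HEXMASK\n", argv[0]);
		return 1;
	}
	dev = argv[1];
	txq = atoi(argv[2]);
	irq = atoi(argv[3]);
	mask = argv[4];

	/* (1) Route the queue's IRQ to CPUs in the chosen node. */
	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	if (write_mask(path, mask))
		return 1;

	/* (3) Have only those CPUs select this queue for transmit. */
	snprintf(path, sizeof(path),
		 "/sys/class/net/%s/queues/tx-%d/xps_cpus", dev, txq);
	return write_mask(path, mask) ? 1 : 0;
}
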
The other thing I've been working on lately, which sort of ties into
this, is hardware acceleration of Receive Flow Steering.  Multiqueue
NICs such as ours tend to have RX flow filters as well as hashing, so
why not use those to do a first level of steering?  We're going to do
some more internal testing and review, but I hope to send out a first
version of this next week.
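
To give a rough idea of the shape of it, the sketch below shows the
concept: the stack tells the driver which CPU is consuming a flow, and
the driver maps that CPU to an RX queue and programs a 5-tuple filter.
Every name here is a placeholder (including the 1:1 CPU-to-queue
policy and the filter-insert stub), since nothing has been posted yet.

/* Sketch of hardware-accelerated RFS: steer a flow to an RX queue
 * handled near the CPU that is consuming it. */
#include <linux/types.h>

struct example_flow_keys {
	__be32 saddr, daddr;
	__be16 sport, dport;
	u8 proto;
};

struct example_nic {
	unsigned int n_rx_queues;
	/* ... hardware filter table state ... */
};

/* Stub standing in for the hardware-specific filter programming:
 * write the 5-tuple and target queue into the NIC's filter table,
 * replacing any previous filter for the same flow. */
static int example_hw_filter_insert(struct example_nic *nic,
				    const struct example_flow_keys *keys,
				    u16 rxq)
{
	return 0;
}

/* Placeholder policy: assume RX queues were laid out 1:1 with CPUs
 * at probe time, so the mapping is a simple modulo. */
static u16 example_cpu_to_rxq(struct example_nic *nic, unsigned int cpu)
{
	return cpu % nic->n_rx_queues;
}

static int example_steer_flow(struct example_nic *nic,
			      const struct example_flow_keys *keys,
			      unsigned int target_cpu)
{
	return example_hw_filter_insert(nic, keys,
					example_cpu_to_rxq(nic, target_cpu));
}
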
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.