Message-ID: <20100901185627.239ad165@nehalam>
Date: Wed, 1 Sep 2010 18:56:27 -0700
From: Stephen Hemminger <shemminger@...tta.com>
To: David Miller <davem@...emloft.net>
Cc: therbert@...gle.com, eric.dumazet@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH] xps-mq: Transmit Packet Steering for multiqueue
On Wed, 01 Sep 2010 18:32:51 -0700 (PDT)
David Miller <davem@...emloft.net> wrote:
> From: Tom Herbert <therbert@...gle.com>
> Date: Wed, 1 Sep 2010 09:24:18 -0700
>
> > On Wed, Sep 1, 2010 at 8:54 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> >> 3) Eventually have a user-selectable policy (socket option, or
> >> system-wide, but one sysctl, not many bitmasks ;) ).
> >>
> > Right, but it would also be nice if a single sysctl could optimally
> > set up multiqueue, RSS, RPS, and all my interrupt affinities for me
> > ;-)
>
> It's becoming increasingly obvious to me that we need (somewhere,
> not necessarily the kernel) a complete data structure representing
> the NUMA, cache, CPU, device hierarchy.
>
> And that can be used to tweak all of this stuff.
>
> The policy should probably be in userspace, we just need to provide
> the knobs in the kernel to tweak it however userspace wants.
>
> Userspace should be able to, for example, move a TX queue into a
> NUMA domain and have this invoke several side effects:
>
> 1) IRQs for that TX queue get rerouted to a cpu in the NUMA
> domain.
>
> 2) TX queue data structures in the driver get reallocated using
> memory in that NUMA domain.
>
> 3) TX hashing is configured to use the set of cpus in the NUMA
> domain.
>
> It's a lot of tedious work, and it involves some delicate tasks in
> figuring out where each of these things goes, but then we'd really
> solve all of this crap once and for all.
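For concreteness, here is a minimal userspace sketch of the kind of
knob-twiddling David describes, using interfaces that already exist
plus the xps_cpus knob from the patch under discussion. The IRQ
number (42), device name (eth0), and mask (CPUs 0-3 as node 0) are
made-up example values:

/* Sketch: steer eth0's tx-0 queue into NUMA node 0. */
#include <stdio.h>

/* Write a hex CPU mask string to a sysfs/procfs file. */
static int write_mask(const char *path, const char *mask)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", mask);
	return fclose(f);
}

int main(void)
{
	/* Hypothetical: IRQ 42 serves eth0's tx-0 queue, and
	 * CPUs 0-3 (mask "0f") make up NUMA node 0. */
	const char *mask = "0f";

	/* 1) Reroute the queue's IRQ to CPUs in the NUMA domain. */
	write_mask("/proc/irq/42/smp_affinity", mask);

	/* 3) Restrict TX hashing for the queue to the same CPUs
	 * (the xps_cpus knob from this patch). */
	write_mask("/sys/class/net/eth0/queues/tx-0/xps_cpus", mask);

	/* 2) Reallocating the driver's TX rings in node-local memory
	 * has no generic knob today; it would need driver support. */
	return 0;
}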
Just to be contrarian :-) This same idea was tried before, when IBM
proposed a user-space NUMA API. It never got any traction: the concept
of "let's make the applications NUMA-aware" was never accepted, because
it is so hard to do right, and so fragile, that it was the wrong idea
to start with. The only people who can manage it are the engineers
tweaking a one-off database benchmark.
I would rather see a "good enough" policy in the kernel that works
for everything from a single-core embedded system to a 100-core
server environment. Forget the benchmarkers. The ideal solution
should work with a mix of traffic and adapt. Today an application
doesn't have to make a service-level agreement with the kernel
every time it opens a TCP socket.
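To make that concrete, a "good enough" zero-configuration default
could be as dumb as spreading TX queues across the online CPUs
round-robin. The helper below is purely illustrative (plain C, not
actual kernel code), with made-up queue and CPU counts:

/* Hypothetical default policy: queue q gets every CPU whose
 * index maps to it round-robin, so load spreads evenly with no
 * userspace tuning at all. */
#include <stdio.h>

static unsigned long default_tx_mask(int queue, int nr_queues, int nr_cpus)
{
	unsigned long mask = 0;
	int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (cpu % nr_queues == queue)
			mask |= 1UL << cpu;
	return mask;
}

int main(void)
{
	int nr_queues = 4, nr_cpus = 8, q;	/* example sizes */

	for (q = 0; q < nr_queues; q++)
		printf("tx-%d -> cpu mask %02lx\n",
		       q, default_tx_mask(q, nr_queues, nr_cpus));
	return 0;
}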
Doing it in userspace doesn't really help much: the APIs keep changing
and the focus fades (see irqbalance).