netdev - Re: [PATCH] xps-mq: Transmit Packet Steering for multiqueue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20100902064136.GA8633@bx9.net>
Date:	Wed, 1 Sep 2010 23:41:36 -0700
From:	Greg Lindahl <greg@...kko.com>
To:	Stephen Hemminger <shemminger@...tta.com>
Cc:	David Miller <davem@...emloft.net>, therbert@...gle.com,
	eric.dumazet@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH] xps-mq: Transmit Packet Steering for multiqueue

On Wed, Sep 01, 2010 at 06:56:27PM -0700, Stephen Hemminger wrote:

> Just to be contrarian :-) This same idea had started before when IBM
> proposed a user-space NUMA API.  It never got any traction, the concept
> of "lets make the applications NUMA aware" never got accepted because
> it is so hard to do right and fragile that it was the wrong idea
> to start with. The only people that can manage it are the engineers
> tweeking a one off database benchmark.

As an non-database user-space example, there are many applications
which know about the typical 'first touch' locality policy for pages
and use that to be NUMA-aware. Just about every OpenMP program ever
written does that; it's even fairly portable among OSes.

A second user-level example is MPI implementations such as OpenMPI.
Those guys run 1 process per core and they don't need to move around,
so getting process locked to a core and all the pages in the right
place is a nice win without the MPI programmer doing anything.

For kernel (but non-Ethernet) networking examples, HPC interconnects
typically go out of their way to ensure locality of kernel pages
related to a given core's workload.  Examples include Myrinet's
OpenMX+MPI and the InfiniPath InfiniBand adapater, whatever QLogic
renamed it to this week (TrueScale, I suppose.) How can you get ~ 1
microsecond messages if you've got a buffer in the wrong place?  Or
achieve extremely high messaging rates when you're waiting for remote
memory all the time?

> I would rather see a "good enough" policy in the kernel that works
> for everything from a single-core embedded system to a 100 core
> server environment.

I'd like a pony. Yes, it's challenging to directly aapply the above
networking example to Ethernet networking, but there's a pony in there
somewhere.

-- greg

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html