Message-ID: <1270209708.1989.30.camel@edumazet-laptop>
Date:	Fri, 02 Apr 2010 14:01:48 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Changli Gao <xiaosuo@...il.com>
Cc:	Tom Herbert <therbert@...gle.com>, davem@...emloft.net,
	netdev@...r.kernel.org
Subject: Re: [PATCH] rfs: Receive Flow Steering

On Friday, 02 April 2010 at 18:58 +0800, Changli Gao wrote:

> Yes, it is more complex. Some high-performance servers use the
> event-driven model, such as memcached, nginx and lighttpd. This model
> has high performance on UP without a doubt, and on SMP they usually
> use one individual epoll fd per core/CPU, with the acceptor
> dispatching work among these epoll fds. This programming model is
> popular, and it bypasses the system scheduler. I think the socket
> option SO_RPSCPU could help this kind of application work better, so
> why not do that? Compatibility with other Unixes isn't a good
> argument: high-performance applications always use lots of
> OS-specific features, for example epoll vs. kqueue, or TCP deferred
> accept vs. accept filters.
> 
> 

This dispatching in userland is a poor workaround, even if it is
popular (because people try to write portable applications): by the
time the application redistributes the work, the hard work is already
done, and the extra hop increases latencies and bus traffic.

For short units of work, that is too expensive.

If you really want to speed up memcached/DNS-server-like apps, you
might add a generic mechanism in the kernel to split the queues of an
_individual_ socket.

Aka multiqueue capabilities at the socket level. Combined with
multiqueue devices or RPS, this could be great.


That is, an application tells the kernel how many queues incoming UDP
frames for a given port can be dispatched to (the number of worker
threads). No more contention, and this can be done regardless of
RPS/RFS.

A UDP frame comes in and is stored on the appropriate sub-queue (the
mapping can be given by the current CPU number). Then wake up the
thread that is likely running on the same CPU.

Same for outgoing frames (answers). You might split the sk_wmem_alloc
accounting to make sure several CPUs can concurrently use the same UDP
socket to send their frames.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
