Date:	Mon, 19 Apr 2010 09:28:23 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Tom Herbert <therbert@...gle.com>
Cc:	davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: [PATCH RFC]: soreuseport: Bind multiple sockets to same port

On Sunday, 18 April 2010 at 23:33 -0700, Tom Herbert wrote:
> This is some work we've done to scale TCP listeners/UDP servers.  It
> might be apropos to some of the discussion on SO_REUSEADDR for UDP.
> ---
> This patch implements so_reuseport (SO_REUSEPORT socket option) for
> TCP and UDP.  For TCP, so_reuseport allows multiple listener sockets
> to be bound to the same port.  In the case of UDP, so_reuseport allows
> multiple sockets to bind to the same port.  To prevent port hijacking
> all sockets bound to the same port using so_reuseport must have the
> same uid.  Received packets are distributed to multiple sockets bound
> to the same port using a 4-tuple hash.
> 
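[Editor's note: a minimal userspace sketch of the semantics described
above.  SO_REUSEPORT must be set before bind() so that several sockets
(of the same uid) can share one port.  The helper name bind_shared()
is ours, and the fallback #define is an assumption, since the constant
was not yet in released userspace headers when this RFC was posted.

    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #ifndef SO_REUSEPORT
    #define SO_REUSEPORT 15         /* Linux value; assumed here */
    #endif

    /* Create a socket of the given type and bind it to port, with
     * SO_REUSEPORT enabled so other sockets may share the port. */
    int bind_shared(int type, unsigned short port)
    {
            int fd = socket(AF_INET, type, 0);
            int one = 1;
            struct sockaddr_in addr;

            if (fd < 0)
                    return -1;
            /* Must be set before bind(); every socket sharing the
             * port must set it, and all must have the same uid. */
            if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT,
                           &one, sizeof(one)) < 0)
                    goto fail;
            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(port);
            if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                    goto fail;
            return fd;
    fail:
            close(fd);
            return -1;
    }
]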
> The motivating case for so_reuseport in TCP would be something like
> a web server binding to port 80 running with multiple threads, where
> each thread might have its own listener socket.  This could be done
> as an alternative to other models: 1) have one listener thread which
> dispatches completed connections to workers. 2) accept on a single
> listener socket from multiple threads.  In case #1 the listener thread
> can easily become the bottleneck with high connection turn-over rate.
> In case #2, the proportion of connections accepted per thread tends
> to be uneven under high connection load (assuming a simple event loop:
> while (1) { accept(); process(); }), since wakeup does not promote
> fairness among the sockets.  We have seen the disproportion be as
> high as a 3:1 ratio between the thread accepting the most connections
> and the one accepting the fewest.  With so_reuseport the distribution
> is uniform.
> 
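[Editor's note: a sketch of the per-thread-listener model the
paragraph above motivates: every worker thread owns its own listener
on port 80, and the kernel's 4-tuple hash spreads connections across
them.  It reuses the hypothetical bind_shared() helper from the
previous sketch.

    #include <stddef.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int bind_shared(int type, unsigned short port);  /* see above */

    /* One of these runs per thread; no shared listener socket, so
     * no single accept queue or wakeup to fight over. */
    static void *http_worker(void *arg)
    {
            int lfd = bind_shared(SOCK_STREAM, 80);

            (void)arg;
            if (lfd < 0 || listen(lfd, 128) < 0)
                    return NULL;
            for (;;) {
                    int cfd = accept(lfd, NULL, NULL);

                    if (cfd < 0)
                            continue;
                    /* process(cfd) ... */
                    close(cfd);
            }
    }
]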
> The TCP implementation has a problem in that the request sockets for a
> listener are attached to a specific listener socket.  When a SYN is
> received, a listener socket is chosen and a request structure is
> created (SYN-RECV state).  If the subsequent ACK of the 3WHS is not
> steered by so_reuseport to the same listener socket, the connection
> state is not found (the connection is reset) and the request structure
> is orphaned.  This scenario would occur when the number of listener
> sockets bound to a port changes (new ones are added, or old ones
> closed).  We are looking for a solution to this, maybe allowing
> multiple sockets to share the same request table...
> 
> The motivating case for so_reuseport in UDP would be something like a
> DNS server.  An alternative would be to recv on the same socket from
> multiple threads.  As in the case of TCP, the load across these threads
> tends to be disproportionate, and we also see a lot of contention on
> the socket lock.  Note that SO_REUSEADDR already allows multiple UDP
> sockets to bind to the same port; however, there is no provision to
> prevent hijacking and nothing to distribute packets across all the
> sockets sharing the same bound port.  This patch does not change the
> semantics of SO_REUSEADDR, but provides a usable version of that
> functionality for unicast.
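[Editor's note: a matching sketch for the UDP/DNS case described
above.  Each worker thread binds its own SO_REUSEPORT socket to port
53 and runs a private receive loop, instead of all threads contending
on one socket's lock.  Again it leans on the hypothetical
bind_shared() helper from the first sketch.

    #include <netinet/in.h>
    #include <stddef.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    int bind_shared(int type, unsigned short port);  /* see above */

    /* One of these runs per thread; the kernel's 4-tuple hash
     * spreads queries across the sockets sharing port 53. */
    static void *dns_worker(void *arg)
    {
            char buf[512];
            struct sockaddr_in peer;
            socklen_t plen;
            int fd = bind_shared(SOCK_DGRAM, 53);

            (void)arg;
            if (fd < 0)
                    return NULL;
            for (;;) {
                    ssize_t n;

                    plen = sizeof(peer);
                    n = recvfrom(fd, buf, sizeof(buf), 0,
                                 (struct sockaddr *)&peer, &plen);
                    if (n < 0)
                            continue;
                    /* parse the query, sendto() the reply ... */
            }
    }
]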


Hmm...

I am wondering how scalable this thing is...

In fact, it is not.

We live in a world where 16-cpu machines are not uncommon right now.

A high-perf DNS server on such a machine would have 16 threads, and
probably 64 threads in two years.

I understand you want 16 UDP sockets to avoid lock contention, but
__udp4_lib_lookup() becomes a nightmare (it may already be one...)

My idea was to add a cpu lookup key.

thread0 would use a new setsockopt() option to bind a socket to a
virtual cpu0, then do its normal bind(port=53).

...

threadN would use a new setsockopt() option to bind a socket to a
virtual cpuN, then do its normal bind(port=53).

Each thread then does its normal worker loop.

Then, when receiving a frame on cpuN, we would automatically select the
right socket, because its lookup score is higher than the others'.
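[Editor's note: a purely hypothetical sketch of that proposal.
SO_BINDTOCPU is an invented name (and option number) standing in for
the suggested "bind this socket to virtual cpu N" setsockopt(); no
such option exists at the time of this mail.

    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define SO_BINDTOCPU 0          /* hypothetical option */

    /* threadN would call this with its own cpu number, then run its
     * normal worker loop; receive lookup on cpuN would score this
     * socket above the others and steer frames to it. */
    int bind_dns_socket_to_cpu(int cpu)
    {
            int fd = socket(AF_INET, SOCK_DGRAM, 0);
            struct sockaddr_in addr;

            if (fd < 0)
                    return -1;
            /* Tag the socket with a virtual cpu before the bind(). */
            if (setsockopt(fd, SOL_SOCKET, SO_BINDTOCPU,
                           &cpu, sizeof(cpu)) < 0)
                    goto fail;
            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(53);      /* the normal bind(port=53) */
            if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                    goto fail;
            return fd;
    fail:
            close(fd);
            return -1;
    }
]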


Another possibility would be to extend the socket structure to support
dynamically sized queues/locks.


