Date: Mon, 19 Apr 2010 09:28:23 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Tom Herbert <therbert@...gle.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: [PATCH RFC]: soreuseport: Bind multiple sockets to same port

On Sunday 18 April 2010 at 23:33 -0700, Tom Herbert wrote:
> This is some work we've done to scale TCP listeners/UDP servers. It
> might be apropos with some of the discussion on SO_REUSEADDR for UDP.
> ---
> This patch implements so_reuseport (the SO_REUSEPORT socket option) for
> TCP and UDP. For TCP, so_reuseport allows multiple listener sockets
> to be bound to the same port. In the case of UDP, so_reuseport allows
> multiple sockets to bind to the same port. To prevent port hijacking,
> all sockets bound to the same port using so_reuseport must have the
> same uid. Received packets are distributed across the multiple sockets
> bound to the same port using a 4-tuple hash.
>
> The motivating case for so_reuseport in TCP would be something like
> a web server binding to port 80 and running with multiple threads, where
> each thread might have its own listener socket. This could be done
> as an alternative to other models: 1) have one listener thread which
> dispatches completed connections to workers; 2) accept on a single
> listener socket from multiple threads. In case #1 the listener thread
> can easily become the bottleneck at a high connection turnover rate.
> In case #2, the proportion of connections accepted per thread tends
> to be uneven under high connection load (assuming a simple event loop:
> while (1) { accept(); process(); }), since wakeup does not promote
> fairness among the sockets. We have seen the disproportion to be as
> high as a 3:1 ratio between the thread accepting the most connections
> and the one accepting the fewest. With so_reuseport the distribution
> is uniform.
>
> The TCP implementation has a problem in that the request sockets for a
> listener are attached to a listener socket. If a SYN is received, a
> listener socket is chosen and a request structure is created (SYN-RECV
> state). If the subsequent ack in the 3WHS is not delivered to that same
> listener by so_reuseport, the connection state is not found (reset) and
> the request structure is orphaned. This scenario would occur when the
> number of listener sockets bound to a port changes (new ones are
> added, or old ones closed). We are looking for a solution to this,
> maybe allow multiple sockets to share the same request table...
>
> The motivating case for so_reuseport in UDP would be something like a
> DNS server. An alternative would be to recv on the same socket from
> multiple threads. As in the case of TCP, the load across these threads
> tends to be disproportionate, and we also see a lot of contention on
> the socket lock. Note that SO_REUSEADDR already allows multiple UDP
> sockets to bind to the same port; however, there is no provision to
> prevent hijacking and nothing to distribute packets across all the
> sockets sharing the same bound port. This patch does not change the
> semantics of SO_REUSEADDR, but provides a usable version of that
> functionality for unicast.
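(As an illustration of the per-thread listener model described above: a minimal sketch, assuming a kernel carrying the proposed so_reuseport support. The fallback value 15 for SO_REUSEPORT matches the constant in Linux's socket headers, and port 8080 stands in for port 80 so the sketch can run unprivileged; both are assumptions of this sketch, not part of the patch.)

/* Sketch: each worker thread opens its own listener on the same port;
 * the kernel spreads incoming connections across the listeners by a
 * hash of the 4-tuple. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_REUSEPORT
#define SO_REUSEPORT 15            /* value from Linux socket headers */
#endif

static void *worker(void *arg)
{
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    if (fd < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
        perror("socket/setsockopt");
        return NULL;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);   /* port 80 in the text */

    /* Every thread binds the same port; without so_reuseport the
     * second bind() would fail with EADDRINUSE. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        perror("bind/listen");
        return NULL;
    }

    for (;;) {
        int cfd = accept(fd, NULL, NULL);
        if (cfd >= 0)
            close(cfd);            /* process(cfd) would go here */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[4];
    int i;

    for (i = 0; i < 4; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    return 0;
}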
Hmm... I am wondering how scalable this thing is... In fact, it is not.

We live in a world where 16-CPU machines are not uncommon right now. A high-performance DNS server on such a machine would have 16 threads, and probably 64 threads in two years.

I understand you want 16 UDP sockets to avoid lock contention, but __udp4_lib_lookup() becomes a nightmare (it may already be...).

My idea was to add a cpu lookup key.

thread0 would use a new setsockopt() option to bind a socket to virtual cpu0, then do its normal bind(port=53)...
threadN would use a new setsockopt() option to bind a socket to virtual cpuN, then do its normal bind(port=53).

Each thread then does its normal worker loop.

Then, when receiving a frame on cpuN, we would automatically select the right socket, because its score would be higher than the others'.

Another possibility would be to extend the socket structure to allow dynamically sized queues/locks.
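(To make the cpu lookup key idea concrete: a hedged sketch of how one worker's socket setup might look. SO_BINDTOCPU is an invented name standing in for the "new setsockopt() option" proposed above; no such option exists, so that call is illustrative only and would fail on a real kernel.)

/* Hypothetical sketch: pin the worker thread to a CPU and tag its
 * socket with the same virtual CPU via an invented SO_BINDTOCPU
 * option, so the receive-side lookup could prefer this socket for
 * frames arriving on that CPU. */
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sched.h>
#include <string.h>
#include <sys/socket.h>

#define SO_BINDTOCPU 999  /* invented option, for illustration only */

static int dns_worker_socket(int cpu)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    cpu_set_t set;

    /* Pin the calling thread to its CPU... */
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof(set), &set);

    /* ...and tell the kernel which virtual CPU this socket serves
     * (hypothetical call; a real kernel returns ENOPROTOOPT here). */
    setsockopt(fd, SOL_SOCKET, SO_BINDTOCPU, &cpu, sizeof(cpu));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(53);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    return fd;  /* the thread then runs its normal recvfrom() loop */
}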