Message-ID: <1466461744.6850.29.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Mon, 20 Jun 2016 15:29:04 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jason Baron <jbaron@...mai.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] tcp: reduce cpu usage when SO_SNDBUF is set

On Mon, 2016-06-20 at 17:23 -0400, Jason Baron wrote:
> From: Jason Baron <jbaron@...mai.com>
>
> When SO_SNDBUF is set and we are under TCP memory pressure, the effective
> write buffer space can be much lower than what was set using SO_SNDBUF. For
> example, we may have set the buffer to 100kb, but we may only be able to
> write 10kb. In this scenario poll()/select()/epoll() are going to
> continuously return POLLOUT, followed by -EAGAIN from write(), resulting
> in a tight loop. Note that epoll in edge-triggered mode does not have
> this issue, since it only triggers once data has been ack'd. There is no
> issue here when SO_SNDBUF is not set, since the TCP layer will auto-tune
> sk->sndbuf.
>
> Introduce a new socket flag, SOCK_SHORT_WRITE, so that we can mark the
> socket when we have a short write due to memory pressure. By then testing
> for SOCK_SHORT_WRITE in tcp_poll(), we hold off POLLOUT until a
> non-zero amount of data has been ack'd. In a previous approach,
> http://marc.info/?l=linux-netdev&m=143930393211782&w=2, I had introduced a
> new field in 'struct sock' to solve this issue, but it's undesirable to add
> bloat to 'struct sock'. We could also address this issue by waiting for
> the buffer to become completely empty, but that may reduce throughput, since
> the write buffer would be empty while waiting for subsequent writes. This
> change brings us in line with the current epoll edge-trigger behavior for
> poll()/select() and epoll level-trigger.
>
> We guarantee that SOCK_SHORT_WRITE will eventually clear, since when we set
> SOCK_SHORT_WRITE, we make sure that sk_wmem_queued is non-zero and
> SOCK_NOSPACE is set as well (in sk_stream_wait_memory()).
>
> I tested this patch using 10,000 sockets, setting SO_SNDBUF on the
> server side to induce TCP memory pressure. A single server thread reduced
> its cpu usage from 100% to 19%, while maintaining the same level of
> throughput.
>
> Signed-off-by: Jason Baron <jbaron@...mai.com>
> ---

When was this bug added, and by which commit?

This looks serious, but your patch looks way too complex.

This SOCK_SHORT_WRITE bit, constantly cleared in sk_stream_write_space(), is
expensive, because it dirties sk_flags, which is not supposed to be written
often. That field is located in a read-mostly cache line.

I believe my suggestion was not to add a per-socket bit, but to have a global
state like tcp_memory_pressure (or to reuse it).

That is, when under global memory pressure, tcp_poll() should apply a strategy
of giving POLLOUT only to sockets having less than 4K (tcp_wmem[0]) of memory
consumed in their write queue.