Date:	Mon, 24 Nov 2008 12:24:53 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	"David S. Miller" <davem@...emloft.net>
CC:	Andi Kleen <andi@...stfloor.org>,
	Linux Netdev List <netdev@...r.kernel.org>,
	Corey Minyard <minyard@....org>,
	Christian Bell <christian@...i.com>
Subject: [PATCH] net: avoid a pair of dst_hold()/dst_release() in ip_append_data()

Eric Dumazet wrote:
> Andi Kleen wrote:
>> Eric Dumazet <dada1@...mosbay.com> writes:
>>
>>> tbench has a hard time incrementing and decrementing the route cache
>>> refcount shared by all communications on localhost.
>>
>> iirc there was a patch some time ago to use per CPU loopback devices 
>> to avoid this, but it was considered too much a benchmark hack.
>> As core counts increase it might stop being that though.
> 
> Well, you probably mention Stephen patch to avoid dirtying other contended
> cache lines (one napi structure per cpu)
> 
> Having multiple loopback dev would really be a hack I agree.
> 
>>
>>> In the real world, we also have this problem on RTP servers sending many UDP
>>> frames to mediagateways, especially big ones handling thousands of
>>> streams.
>>>
>>> Given that route entries are using RCU, can we perhaps avoid
>>> incrementing their refcount in the case of connected sockets?
>>
>> Normally they can be held over sleeps or queuing of skbs too, and RCU
>> doesn't handle that. To make it handle that you would need to define a
>> custom RCU period designed for this case, but this would probably be
>> tricky and fragile: in particular, even if you had an "any packet
>> queued" RCU method, I'm not sure it would be guaranteed to always
>> finish, because there is no fixed upper bound on the lifetime of a packet.
>>
>> The other issue is that on preemptible kernels you would need to
>> disable preemption for the whole time such a routing entry is held,
>> which could be potentially quite long.
>>
> 
> Well, in the case of UDP, we call ip_push_pending_frames() and this one
> increments the refcount (again). I was not considering avoiding
> the refcount hold we do when queuing an skb in the transmit
> queue, only the one held for a short period of time. Oh well,
> ip_append_data() might sleep, so this cannot work...
> 
> I agree that avoiding one refcount increment/decrement is probably
> not a huge gain, considering we *have* to do the increment,
> but when many cpus are using UDP send/receive in parallel, this might
> show a gain somehow.
> 
> So maybe we could make ip_append_data() (or its callers) a
> little bit smarter, avoiding the increment/decrement when possible.

Here is a patch to remove one dst_hold()/dst_release() pair
in UDP/RAW transmit path.

[PATCH] net: avoid a pair of dst_hold()/dst_release() in ip_append_data()

We can reduce pressure on the dst entry refcount, which slows down the
UDP transmit path on SMP machines. This pressure is visible on RTP
servers when delivering content to mediagateways, especially big ones
handling thousands of streams. Several cpus send UDP frames to the
same destination, hence use the same dst entry.

This patch lets ip_append_data() steal, when possible, the refcount
its callers had to take on the dst entry.

This doesn't avoid all refcounting, but it still gives speedups on SMP
in the UDP/RAW transmit path.

Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
---
 include/net/ip.h     |    2 +-
 net/ipv4/icmp.c      |    8 ++++----
 net/ipv4/ip_output.c |   11 ++++++++---
 net/ipv4/raw.c       |    2 +-
 net/ipv4/udp.c       |    2 +-
 5 files changed, 15 insertions(+), 10 deletions(-)

View attachment "ip_append_data.patch" of type "text/plain" (4257 bytes)
