netdev - Re: [RFC] Could we avoid touching dst->refcount in some cases ?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <492A7E85.3060502@cosmosbay.com>
Date:	Mon, 24 Nov 2008 11:14:29 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Andi Kleen <andi@...stfloor.org>
CC:	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [RFC] Could we avoid touching dst->refcount in some cases ?

Andi Kleen a écrit :
> Eric Dumazet <dada1@...mosbay.com> writes:
> 
>> tbench has hard time incrementing decrementing the route cache refcount
>> shared by all communications on localhost.
> 
> iirc there was a patch some time ago to use per CPU loopback devices to 
> avoid this, but it was considered too much a benchmark hack.
> As core counts increase it might stop being that though.

Well, you probably mention Stephen patch to avoid dirtying other contended
cache lines (one napi structure per cpu)

Having multiple loopback dev would really be a hack I agree.

> 
>> On real world, we also have this problem on RTP servers sending many UDP
>> frames to mediagateways, especially big ones handling thousand of streams.
>>
>> Given that route entries are using RCU, we probably can avoid incrementing
>> their refcount in case of connected sockets ?
> 
> Normally they can be hold over sleeps or queuing of skbs too, and RCU
> doesn't handle that. To make it handle that you would need to define a
> custom RCU period designed for this case, but this would be probably
> tricky and fragile: especially I'm not sure even if you had a "any
> packet queued" RCU method it be guaranteed to always finish 
> because there is no fixed upper livetime of a packet.
> 
> The other issue is that on preemptible kernels you would need to 
> disable preemption all the time such a routing entry is hold, which
> could be potentially quite long.
> 

Well, in case of UDP, we call ip_push_pending_frames() and this one
does the increment of refcount (again). I was not considering
avoiding the refcount hold we do when queing a skb in transmit
queue, only during a short period of time. Oh well, ip_append_data()
might sleep, so this cannot work...

I agree avoiding one refcount increment/decrement is probably
not a huge gain, considering we *have* to do the increment,
but when many cpus are using UDP send/receive in //, this might
show a gain somehow.

So maybe we could make ip_append_data() (or its callers) a
litle bit smarter, avoiding increment/decrement if possible.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html