[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1318064238.5276.2.camel@edumazet-laptop>
Date: Sat, 08 Oct 2011 10:57:18 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Johannes Berg <johannes@...solutions.net>
Cc: Richard Cochran <richardcochran@...il.com>,
David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [RFC] net: remove erroneous sk null assignment in timestamping
Le samedi 08 octobre 2011 à 10:16 +0200, Johannes Berg a écrit :
> I'm not terribly familiar with struct sock. Looking at it, I'm a bit
> confused by skb_orphan() -- it doesn't put the sock reference. So are
> sockets not refcounted for skbs in this way? They seem to use
> sock_wfree() which does a bit more than this it seems, and I don't see
> it using sk_refcnt anywhere so I'm a bit confused now.
Check following commit changelog to get some information on this.
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
Author: Eric Dumazet <eric.dumazet@...il.com>
Date: Thu Jun 11 02:55:43 2009 -0700
net: No more expensive sock_hold()/sock_put() on each tx
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.
This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.
We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.
We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.
As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.
skb_set_owner_w() doesnt call sock_hold() anymore
sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.
Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.
Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists