Message-ID: <20090416221505.GA5937@dhcp-1-124.tlv.redhat.com>
Date: Fri, 17 Apr 2009 01:15:05 +0300
From: "Michael S. Tsirkin" <m.s.tsirkin@...il.com>
To: Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
Rusty Russell <rusty@...tcorp.com.au>,
Max Krasnyansky <maxk@...lcomm.com>,
"Rafael J. Wysocki" <rjw@...k.pl>
Subject: Re: tun: performance regression in 2.6.30-rc1
On Fri, Apr 17, 2009 at 12:31:22AM +0300, Michael S. Tsirkin wrote:
> Hi,
> I have a simple test that sends 10K packets out of a tap device. Average time
> needed to send a packet has gone up from 2.6.29 to 2.6.30-rc1.
>
> 2.6.30-rc1:
>
> #sh runsend
> time per packet: 7570 ns
>
> 2.6.29:
>
> #git checkout v2.6.29 -- drivers/net/tun.c
> #make modules modules_install
> #rmmod tun
> #sh runsend
> time per packet: 6337 ns
>
> I note that before 2.6.29, all tun skbs would typically be linear,
> while in 2.6.30-rc1, skbs for packet size > 1 page would be paged.
> And I found this comment by Rusty (it appears in the comment for
> commit f42157cb568c1eb02eca7df4da67553a9edae24a):
>
> My original version of this patch always allocate paged skbs for big
> packets. But that made performance drop from 8.4 seconds to 8.8
> seconds on 1G lguest->Host TCP xmit. So now we only do that as a
> fallback.
>
> So just for fun, I did this:
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 37a5a04..1234d6b 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -520,7 +518,6 @@ static inline struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
> int err;
>
> /* Under a page? Don't bother with paged skb. */
> - if (prepad + len < PAGE_SIZE)
> linear = len;
>
> skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
>
> This makes all skbs linear in tun. And now:
>
> 2.6.30-rc1 made linear:
> #sh runsend
> time per packet: 6611 ns
>
> Two points of interest here:
> - It seems that linear skbs are generally faster.
> Would it make sense to make tun try to use linear skbs again,
> as it did before 2.6.29?
>
> - The new code seems to introduce some measurable overhead.
> My understanding is that its main motivation is memory
> accounting - would it make sense to create a faster code path
> for the default case where accounting is disabled?
Continuing with the investigation, commenting out
atomic_inc_not_zero and atomic_dec_and_test in tun_get/tun_put
gets us back most of the rest of the performance:
# sh runsend
time per packet: 6461 ns
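
For anyone not familiar with the two primitives named above, here is a rough
userspace sketch of the get/put pattern they implement, written with C11
atomics. It is not the code in drivers/net/tun.c: struct obj,
ref_get_not_zero and ref_put_and_test are hypothetical names made up for
illustration, standing in for the kernel's atomic_inc_not_zero and
atomic_dec_and_test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object (not struct tun_struct). */
struct obj {
	atomic_int refcnt;
};

/*
 * Analogue of atomic_inc_not_zero(): take a reference only if the
 * count has not already dropped to zero.
 */
static bool ref_get_not_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Analogue of atomic_dec_and_test(): drop a reference and report
 * whether the caller released the last one.
 */
static bool ref_put_and_test(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* catch a put without a matching get */
	return old == 1;
}
```

Each call is an atomic read-modify-write on the same counter, so a get/put
pair around every packet adds two such operations to the send path. That is
at least consistent with the difference measured above, though this sketch
does not by itself show where the time goes.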
I was wondering whether the socket reference counting,
which is done anyway, can be reused in some way.
Ideas?
--
MST