lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 23 May 2012 10:57:49 -0700
From:	Alexander Duyck <alexander.h.duyck@...el.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Kieran Mansley <kmansley@...arflare.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Ben Hutchings <bhutchings@...arflare.com>,
	netdev@...r.kernel.org
Subject: Re: TCPBacklogDrops during aggressive bursts of traffic

On 05/23/2012 10:24 AM, Eric Dumazet wrote:
> On Wed, 2012-05-23 at 09:58 -0700, Alexander Duyck wrote:
>
>> Right, but the problem is that in order to make this work the we are
>> dropping the padding for head and hoping to have room for shared info. 
>> This is going to kill performance for things like routing workloads
>> since the entire head is going to have to be copied over to make space
>> for NET_SKB_PAD. 
> Hey I said that one of the point I have to add to my patch. Please read
> it again ;)
I'm aware of that, but still it seems like we are getting ahead of
ourselves.  This fix is so specific to just the socket backlog case that
I think we are missing the fact that it is going to have huge
performance repercussions elsewhere.

> By the way, we can also add code doing the ksb->head upgrade to fragment
> again in case we need to add a tunnel header, instead of full copy.
>
> So maybe the NET_SKB_PAD is not really needed anymore.
>
> Anyway, a router host could use a different allocation strategy (going
> back to current one)
The thing I don't like is that we are adding extra memcpy calls to all
of these paths.  We cannot change the head without having to copy the
shared info and there is going to be a cost for that.  I would prefer to
only generate the shared info once and not have to relocate it every
time I want to run a router or tunnel.

>>  Also this assumes no RSC being enabled.  RSC is
>> normally enabled by default.  If it is turned on we are going to start
>> receiving full 2K buffers which will cause even more issues since there
>> wouldn't be any room for shared info in the 2K frame.
>>
> Hey his is one of the point I have to address, also mentioned.
>
> Its almost trivial to check len (if we have room for shared info, take
> it, if not allocate the head as before)
I know you mentioned this as well.  The thing is I would prefer not to
add code where we are branching in so many different directions on what
the header actually looks like.  It tends to open a lot of opportunities
for bugs when someone makes a change and doesn't take one of the
possible head and fragment combinations into account.

>> The way the driver is currently written probably provides the optimal
>> setup for truesize given the circumstances.
> It unfortunate the hardware has 1KB granularity.
Agreed, I would have preferred 512B granularity, but we are locked in at
1K for now..  :-/

> Problem is skb_try_coalesce() is not used when we store packets in
> socket backlog, and only used for TCP at this moment.
One piece of low hanging fruit that is available to help with some of
this is to drop the Rx header size for ixgbe to 256.  That should at
least cut the total truesize for the buffer and sk_buff to 960 or so
which is at least a step in the right direction.

Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ