Message-ID: <544042D5.5090501@redhat.com>
Date: Thu, 16 Oct 2014 15:12:37 -0700
From: Alexander Duyck <alexander.h.duyck@...hat.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: "Jiafei.Pan@...escale.com" <Jiafei.Pan@...escale.com>,
David Miller <davem@...emloft.net>,
"jkosina@...e.cz" <jkosina@...e.cz>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"LeoLi@...escale.com" <LeoLi@...escale.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
On 10/16/2014 02:40 PM, Eric Dumazet wrote:
> On Thu, 2014-10-16 at 11:20 -0700, Alexander Duyck wrote:
>
>> My concern would be that we are off by a factor of 2 and collapse the
>> TCP receive queue prematurely with this change.
> That is the opposite actually. We can consume 4K but we pretend we
> consume 2K in some worst cases.
The only case where we consume the full 4K but only list it as 2K should
be if we have memory from the wrong node and we want to flush it from
the descriptor queue. For all other cases we should be using the page
at least twice per buffer. So the first page that was assigned for
an Rx descriptor might be flushed, but after that reuse should take
hold and stay in place as long as the NAPI poll doesn't change NUMA nodes.
That should be no worse than the case where the remaining space in a
large page is not large enough to use as a buffer: you still use the
current size as your truesize and don't include the overhead of the
unused space in your calculation.
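Just to make the accounting concrete, the half-page reuse scheme I'm
describing looks roughly like the sketch below. This is purely an
illustration; the my_* names and MY_RX_BUFSZ are made up for the example
and not taken from any actual driver.

#include <linux/skbuff.h>

#define MY_RX_BUFSZ	2048	/* half of a 4K page */

struct my_rx_buffer {
	struct page *page;
	unsigned int page_offset;	/* 0 or MY_RX_BUFSZ */
};

static void my_add_rx_frag(struct my_rx_buffer *buf, struct sk_buff *skb,
			   unsigned int len)
{
	/* Charge the socket for the half page it was actually handed,
	 * so truesize stays at MY_RX_BUFSZ rather than the full page.
	 */
	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, buf->page,
			buf->page_offset, len, MY_RX_BUFSZ);

	/* Flip to the other half so the page gets used at least twice
	 * before it goes back to the page allocator.
	 */
	buf->page_offset ^= MY_RX_BUFSZ;
}

The only time that breaks down is the wrong-NUMA-node case above, where
the page is dropped after a single use.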
>> For example if you are
>> looking at a socket that is holding pages for a long period of time
>> there would be a good chance of it ending up with both halves of the
>> page. In this case is it fair to charge it for 8K of memory use when in
>> reality it is only using 4K?
> Its better to collapse too soon than too late.
>
> If you want to avoid collapses because one host has plenty of memory,
> all you need to do is increase tcp_rmem.
>
> Why are you referring to 8K ? PAGE_SIZE is 4K
With your change, the truesize for two half pages would be reported as 8K
instead of 4K if we were to hand off both halves of a page to the same socket.
The 2K value makes sense and is consistent with how we handle this in
other cases where we are partitioning pages for use as network buffers.
I think increasing this to 4K is just going to cause performance issues
as flows are going to get choked off prematurely for memory usage that
they aren't actually getting.
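Spelling out that arithmetic in a stand-alone example (nothing
driver-specific here, just the numbers):

#include <stdio.h>

int main(void)
{
	unsigned int page_size = 4096;	/* one physical 4K page */
	unsigned int frags = 2;		/* both halves land on one socket */

	/* current accounting: truesize = 2K per half-page fragment */
	printf("charged at 2K/frag:   %u bytes\n", frags * 2048);
	/* proposed accounting: truesize = 4K per half-page fragment */
	printf("charged at 4K/frag:   %u bytes\n", frags * 4096);
	printf("memory actually held: %u bytes\n", page_size);

	return 0;
}

The socket ends up charged 8K for 4K of real memory in the second case.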
Part of my hesitation is that I spent the last couple of years
explaining to our performance testing team and customers that they need
to adjust tcp_rmem with all of the changes that have been made to
truesize and the base network drivers, and I think I would prefer it if
I didn't have to go through another round of it. Then again I probably won't
have to anyway since I am not doing drivers for Intel any more, but
still my reaction to this kind of change is what it is.
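(For anyone following along, the knob in question is the
net.ipv4.tcp_rmem sysctl, e.g. something along the lines of

	net.ipv4.tcp_rmem = 4096 87380 16777216

for min/default/max receive buffer in bytes; those numbers are only an
illustrative setting, the right values depend on the workload.)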
Thanks,
Alex