Message-ID: <CALzJLG9JEorLwV48H3m0vQ1+VDnpYbKhiBoUw+WW-JFouJoj1g@mail.gmail.com>
Date: Sat, 21 Jan 2017 22:31:26 +0200
From: Saeed Mahameed <saeedm@....mellanox.co.il>
To: kernel netdev <netdev@...uer.com>
Cc: Tom Herbert <tom@...bertland.com>,
Saeed Mahameed <saeedm@...lanox.com>,
netdev <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>,
Davem <davem@...emloft.net>,
Eric Dumazet <eric.dumazet@...il.com>,
Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
On Sat, Jan 21, 2017 at 9:12 PM, kernel netdev <netdev@...uer.com> wrote:
>
>
> On 21 Jan 2017 at 7:10 PM, "Tom Herbert" <tom@...bertland.com> wrote:
>
> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed
> <saeedm@....mellanox.co.il> wrote:
>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet <eric.dumazet@...il.com>
>> wrote:
>>> From: Eric Dumazet <edumazet@...gle.com>
>>>
>>> A driver using dev_alloc_page() must not reuse a page allocated from
>>> emergency memory reserve.
>>>
>>> Otherwise all packets using this page will be immediately dropped,
>>> except for very specific sockets that have the SOCK_MEMALLOC bit set.
>>>
>>> This issue might be hard to debug, because only a fraction of received
>>> packets would be dropped.
>>
>> Hi Eric,
>>
>> When you say reuse, do you mean pointing to the same page from several SKBs?
>>
>> Because in our page cache implementation we don't reuse pages that
>> were already passed to the stack; we just keep them in the page cache
>> until the ref count drops back to one, and then we recycle them
>> (i.e. they will be re-used only when no one else is using them).
>>
> Saeed,
>
> Speaking of the mlx page cache, can we remove this, or at least make it
> optional to use? It is another example of complex functionality being
> put into drivers that makes things like backports more complicated and
Re complexity, I am not sure the mlx page cache is that complex; we just
wrap alloc_page/put_page with our own page cache calls. The whole page
cache implementation is roughly 200-300 LOC tops, all concentrated in
one place in the code.
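
Just to illustrate what I mean, here is a rough sketch of such a wrapper
(made-up names and sizes, not the actual mlx5e symbols; it only assumes
the stock dev_alloc_page()/put_page()/page_ref_count() helpers):

#include <linux/mm.h>		/* put_page(), page_ref_count() */
#include <linux/skbuff.h>	/* dev_alloc_page() */

#define RX_CACHE_SIZE 256	/* illustrative size */

struct rx_page_cache {
	struct page *ring[RX_CACHE_SIZE];
	u32 head;
	u32 tail;
};

/* Try to recycle a page we handed to the stack earlier; fall back to
 * the page allocator when the cache is empty or the page is still busy.
 */
static struct page *cache_get(struct rx_page_cache *c)
{
	if (c->head != c->tail) {
		struct page *page = c->ring[c->head];

		/* recycle only once the stack dropped its reference */
		if (page_ref_count(page) == 1) {
			c->head = (c->head + 1) % RX_CACHE_SIZE;
			return page;
		}
	}
	return dev_alloc_page();
}

/* Stash the page for later reuse instead of releasing it right away. */
static void cache_put(struct rx_page_cache *c, struct page *page)
{
	u32 next = (c->tail + 1) % RX_CACHE_SIZE;

	if (next == c->head) {	/* cache full, give the page back */
		put_page(page);
		return;
	}
	c->ring[c->tail] = page;
	c->tail = next;
}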
> provides at best some marginal value. In the case of the mlx5e cache
> code the results from pktgen really weren't very impressive in the
> first place. Also, the cache suffers from HOL blocking where we can
Well, with pktgen you won't notice a huge improvement, since the pages
are freed in the stack directly from our rx receive handler
(gro_receive); those pages go back to the page allocator and get
requested again immediately by the driver (no stress).
The real improvements are seen when you really stress the page
allocator with real use cases such as TCP/UDP with user applications,
where the pages are held longer than the driver needs them. In that
case the page cache plays an important role in reducing the stress on
the page allocator, since with those use cases we are juggling more
pages than pktgen ever could.
With more stress (#cores TCP streams) our humble page cache sometimes
can't keep up! It gets full fast enough and fails to recycle for a huge
percentage of the driver's page requests.
Before our own page cache we used dev_alloc_skb(), which used its own
cache ("page_frag_cache"), and that worked nicely enough. So I don't
really recommend removing the page cache until we have a generic RX
page cache solution for all device drivers.
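
(For reference, the old path roughly boiled down to the snippet below;
this is just an illustration of how the core carves RX buffers out of
the shared per-CPU page_frag_cache, not actual driver code:)

#include <linux/skbuff.h>	/* netdev_alloc_frag() */

static void *rx_alloc_buffer_legacy(unsigned int frag_sz)
{
	/* netdev_alloc_frag() hands out frag_sz bytes from a cached
	 * page, amortizing page allocator calls over many buffers;
	 * this is the same page_frag_cache that backs dev_alloc_skb().
	 */
	return netdev_alloc_frag(frag_sz);
}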
> block the whole cache due to an outstanding reference on just one page
> (something that you wouldn't see in pktgen but is likely to happen in
> real applications).
>
Re the HOL issue, we have some upcoming patches that would drastically
improve the HOL blocking issue (we will simply swap the HOL entry on
every sample).
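
Just to give the idea, one possible way to do that, reusing the
rx_page_cache sketch from above (again only my rough illustration, the
actual patches may look different):

/* Like cache_get(), but when the head-of-line page is still busy,
 * swap it with the newest entry (swap() from linux/kernel.h) so one
 * long-lived page no longer blocks the whole cache.
 */
static struct page *cache_get_swap_hol(struct rx_page_cache *c)
{
	if (c->head != c->tail) {
		struct page *page = c->ring[c->head];

		if (page_ref_count(page) == 1) {
			c->head = (c->head + 1) % RX_CACHE_SIZE;
			return page;
		}

		swap(c->ring[c->head],
		     c->ring[(c->tail + RX_CACHE_SIZE - 1) % RX_CACHE_SIZE]);
	}
	return dev_alloc_page();
}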
>
> (Sent from phone in car)
>
Drive Safe :) ..
> To Tom: have you measured the effect of this page cache before claiming
> it is ineffective?
>
> My previous measurements show approx 20% speedup on a UDP test with
> delivery to a remote CPU.
>
> Removing the cache would of course be a good use case for speeding up
> the page allocator (PCP), which Mel Gorman and I are working on. AFAIK a
> current order-0 page allocation costs 240 cycles. Mel has reduced it to
> 180, and without NUMA 150 cycles. And with bulking this can be amortized
> to 80 cycles.
>
Are you trying to say that we won't need the cache if you manage to
deliver those optimizations?
Can you compare those optimizations with the page_frag_cache from
dev_alloc_skb?
> --Jesper
>