Date:   Sun, 22 Jan 2017 10:09:23 -0800
From:   Tom Herbert <tom@...bertland.com>
To:     kernel netdev <netdev@...uer.com>
Cc:     Saeed Mahameed <saeedm@...lanox.com>,
        netdev <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...lanox.com>,
        Davem <davem@...emloft.net>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Saeed Mahameed <saeedm@....mellanox.co.il>,
        Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve

On Sat, Jan 21, 2017 at 11:12 AM, kernel netdev <netdev@...uer.com> wrote:
>
>
> On 21 Jan 2017 at 7:10 PM, "Tom Herbert" <tom@...bertland.com> wrote:
>
> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed
> <saeedm@....mellanox.co.il> wrote:
>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet <eric.dumazet@...il.com>
>> wrote:
>>> From: Eric Dumazet <edumazet@...gle.com>
>>>
>>> A driver using dev_alloc_page() must not reuse a page allocated from
>>> emergency memory reserve.
>>>
>>> Otherwise all packets using this page will be immediately dropped,
>>> unless for very specific sockets having SOCK_MEMALLOC bit set.
>>>
>>> This issue might be hard to debug, because only a fraction of received
>>> packets would be dropped.
>>
>> Hi Eric,
>>
>> When you say reuse, do you mean pointing to the same page from several SKBs?
>>
>> Because in our page cache implementation we don't reuse pages that have
>> already been passed to the stack; we just keep them in the page cache
>> until the ref count drops back to one, and then we recycle them (i.e.
>> they are re-used only when no one else is using them).
>>
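[Saeed's recycle rule (only hand a cached page back out once its refcount has
dropped back to one) can be sketched as the matching get path of the
illustrative cache above; page_ref_count() is the real helper, the rest is
hypothetical.

    static struct page *rx_cache_get(struct rx_page_cache *cache)
    {
            struct page *page;

            if (cache->head == cache->tail)
                    return NULL;                    /* cache empty */

            page = cache->pages[cache->head & (CACHE_SIZE - 1)];

            /* Recycle only when the driver holds the last reference,
             * i.e. the stack (and any application) has released the page.
             */
            if (page_ref_count(page) != 1)
                    return NULL;

            cache->head++;
            return page;
    }
]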
> Saeed,
>
> Speaking of the mlx page cache, can we remove it or at least make it
> optional? It is another example of complex functionality being put into
> drivers that makes things like backports more complicated and provides
> at best marginal value. In the case of the mlx5e cache code, the results
> from pktgen really weren't very impressive in the first place. Also, the
> cache suffers from HOL blocking, where we can block the whole cache due
> to an outstanding reference on just one page (something you wouldn't see
> in pktgen but that is likely to happen in real applications).
>
>
> (Sent from phone in car)
>
> To Tom: have you measured the effect of this page cache before claiming
> it is ineffective?

No, I have not. TBH, I have spent most of the past few weeks trying to
debug a backport of the code from 4.9 to 4.6. Until we have a working
backport, performance is immaterial for our purposes. Unfortunately, we
are seeing some issues: the checksum faults I described previously and
crashes on bad page refcounts, which are presumably being caused by the
logic in RX buffer processing. This is why I am now having to dissect
the code and try to disable things like the page cache that are not
essential functionality.

In any case, the HOL blocking issue is obvious from reading the code and
implies bimodal behavior; we don't need a test for that to know it's bad,
as we've seen the bad effects of it in many other contexts.
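[The head-of-line problem falls straight out of a FIFO cache like the sketch
above: the get path only ever examines the head slot, so a single page with an
outstanding reference stalls every page queued behind it and each allocation
falls back to the page allocator. A hedged illustration (rx_get_page() is
hypothetical; dev_alloc_page() is the real API):

    #include <linux/skbuff.h>       /* dev_alloc_page() */

    static struct page *rx_get_page(struct rx_page_cache *cache)
    {
            struct page *page = rx_cache_get(cache);

            if (page)
                    return page;    /* fast path: recycled page */

            /* Head page still referenced (or cache empty): take the slow
             * path.  If an application sits on that one head page, every
             * packet ends up here, which is the bimodal behavior above.
             */
            return dev_alloc_page();
    }
]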
>
> My previous measurements show approx 20% speedup on a UDP test with
> delivery to a remote CPU.
>
> Removing the cache would of course be a good use case for speeding up
> the page allocator (PCP), which Mel Gorman and I are working on. AFAIK a
> current order-0 page allocation costs 240 cycles. Mel has reduced that
> to 180, and without NUMA to 150 cycles. And with bulking this can be
> amortized to 80 cycles.
>
That would be great. If only I had a nickel for every time someone
started working on a driver and came to the conclusion that they needed
a custom memory allocator because the kernel allocator is so
inefficient!

Tom

> --Jesper
>
