lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7fc532c3-8780-4484-6932-6bfabecda189@oracle.com>
Date:   Tue, 3 Nov 2020 13:37:45 -0800
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     linux-mm@...ck.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        davem@...emloft.net, kuba@...nel.org, aruna.ramakrishna@...cle.com,
        bert.barbe@...cle.com, rama.nichanamatlu@...cle.com,
        venkat.x.venkatsubra@...cle.com, manjunath.b.patil@...cle.com,
        joe.jin@...cle.com, srinivas.eeda@...cle.com
Subject: Re: [PATCH 1/1] mm: avoid re-using pfmemalloc page in
 page_frag_alloc()



On 11/3/20 1:15 PM, Matthew Wilcox wrote:
> On Tue, Nov 03, 2020 at 12:57:33PM -0800, Dongli Zhang wrote:
>> On 11/3/20 12:35 PM, Matthew Wilcox wrote:
>>> On Tue, Nov 03, 2020 at 11:32:39AM -0800, Dongli Zhang wrote:
>>>> However, once kernel is not under memory pressure any longer (suppose large
>>>> amount of memory pages are just reclaimed), the page_frag_alloc() may still
>>>> re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a
>>>> result, the skb->pfmemalloc is always true unless page_frag_cache->va is
>>>> re-allocated, even the kernel is not under memory pressure any longer.
>>>> +	/*
>>>> +	 * Try to avoid re-using pfmemalloc page because kernel may already
>>>> +	 * run out of the memory pressure situation at any time.
>>>> +	 */
>>>> +	if (unlikely(nc->va && nc->pfmemalloc)) {
>>>> +		page = virt_to_page(nc->va);
>>>> +		__page_frag_cache_drain(page, nc->pagecnt_bias);
>>>> +		nc->va = NULL;
>>>> +	}
>>>
>>> I think this is the wrong way to solve this problem.  Instead, we should
>>> use up this page, but refuse to recycle it.  How about something like this (not even compile tested):
>>
>> Thank you very much for the feedback. Yes, the option is to use the same page
>> until it is used up (offset < 0). Instead of recycling it, the kernel free it
>> and allocate new one.
>>
>> This depends on whether we will tolerate the packet drop until this page is used up.
>>
>> For virtio-net, the payload (skb->data) is of size 128-byte. The padding and
>> alignment will finally make it as 512-byte.
>>
>> Therefore, for virtio-net, we will have at most 4096/512-1=7 packets dropped
>> before the page is used up.
> 
> My thinking is that if the kernel is under memory pressure then freeing
> the page and allocating a new one is likely to put even more strain
> on the memory allocator, so we want to do this "soon", rather than at
> each allocation.
> 
> Thanks for providing the numbers.  Do you think that dropping (up to)
> 7 packets is acceptable?>
> We could also do something like ...
> 
>         if (unlikely(nc->pfmemalloc)) {
>                 page = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
>                 if (page)
>                         nc->pfmemalloc = 0;
>                 put_page(page);
>         }
> 
> to test if the memory allocator has free pages at the moment.  Not sure
> whether that's a good idea or not -- hopefully you have a test environment
> set up where you can reproduce this condition on demand and determine
> which of these three approaches is best!
> 


>From mm's perspective, we expect to reduce the number of page allocation
(especially under memory pressure).

>From networking's perspective, we expect to reduce the number of skb drop.

That's why I CCed netdev folks (including David and Jakub), although the patch
is for mm/page_alloc.c. The page_frag_alloc() is primarily used by networking
and nvme-tcp.


Unfortunately, so far I do not have the env to reproduce. I reproduced with a
patch to fail page allocation and set nc->pfmemalloc on purpose.

>From mm's perspective, I think to use up the page is a good option. Indeed tt is
system administrator's duty to avoid memory pressure, in order to avoid the
extra packet drops.

Dongli Zhang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ