Message-ID: <58726a6b-5468-a6b4-7c26-371ef5d71ee2@gmail.com>
Date:   Thu, 22 Apr 2021 10:50:16 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     David Hildenbrand <david@...hat.com>,
        Michal Hocko <mhocko@...e.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>, Mel Gorman <mgorman@...e.de>,
        Minchan Kim <minchan@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>, l.stach@...gutronix.de,
        LKML <linux-kernel@...r.kernel.org>,
        Jaewon Kim <jaewon31.kim@...sung.com>,
        Michal Nazarewicz <mina86@...a86.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Oscar Salvador <OSalvador@...e.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: alloc_contig_range() with MIGRATE_MOVABLE performance regression
 since 4.9



On 4/22/2021 1:56 AM, David Hildenbrand wrote:
> On 22.04.21 09:49, Michal Hocko wrote:
>> Cc David and Oscar who are familiar with this code as well.
>>
>> On Wed 21-04-21 11:36:01, Florian Fainelli wrote:
>>> Hi all,
>>>
>>> I have been trying for the past few days to identify the source of a
>>> performance regression that we are seeing with the 5.4 kernel but not
>>> with the 4.9 kernel on ARM64. Testing something newer like 5.10 is a bit
>>> challenging at the moment but will happen eventually.
>>>
>>> What we are seeing is a ~3x increase in the time needed for
>>> alloc_contig_range() to allocate 1GB in blocks of 2MB pages. The system
>>> is idle at the time and there are no other contenders for memory other
>>> than the user-space programs already started (DHCP client, shell, etc.).
> 
> Hi,
> 
> If you can easily reproduce it, it might be worth just trying to bisect;
> that could be faster than manually poking around in the code.
> 
> Also, it would be worth having a look at the state of upstream Linux.
> Upstream Linux developers tend to not care about minor performance
> regressions on oldish kernels.

This is a big pain point here and I cannot agree more, but until we
bridge that gap, testing upstream is unfortunately not easy for me, and
neither is bisection :/

> 
> There has been work on improving exactly the situation you are
> describing -- a "fail fast" / "no retry" mode for alloc_contig_range().
> Maybe it tackles exactly this issue.
> 
> https://lkml.kernel.org/r/20210121175502.274391-3-minchan@kernel.org
> 
> Minchan is already on cc.

This patch does not appear to be helping; in fact, I had locally applied
this patch from way back:

https://lkml.org/lkml/2014/5/28/113

which effectively does the same thing unconditionally. Let me see if I can
showcase this problem on an x86 virtual machine operating in conditions
similar to ours; a rough sketch of the reproducer I have in mind is below.
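
To be concrete, something along these lines (hypothetical and untested:
start_pfn, MAX_RETRIES and the timing bookkeeping are placeholders, the
caller is assumed to be built into the kernel rather than a module, and
the freeing/timing of free_contig_range() is left out). It mirrors the
test quoted above: 1GB allocated as 512 contiguous 2MB chunks via
alloc_contig_range(..., MIGRATE_MOVABLE, ...), retrying on -EBUSY:

#include <linux/gfp.h>
#include <linux/ktime.h>
#include <linux/mm.h>
#include <linux/printk.h>
#include <linux/sizes.h>

#define CHUNK_PAGES	(SZ_2M >> PAGE_SHIFT)
#define MAX_RETRIES	5

/* Allocate nr_chunks 2MB chunks starting at start_pfn and time it. */
static int timed_contig_alloc(unsigned long start_pfn, unsigned long nr_chunks)
{
	unsigned long pfn = start_pfn;
	s64 total_us = 0;
	unsigned long i;

	for (i = 0; i < nr_chunks; i++, pfn += CHUNK_PAGES) {
		ktime_t t0 = ktime_get();
		int retries = 0;
		int ret;

		do {
			/*
			 * With the fail-fast series referenced above,
			 * __GFP_NORETRY could be ORed in here; plain
			 * GFP_KERNEL keeps the current behaviour.
			 * Passing MIGRATE_CMA instead (over a CMA region)
			 * is the variant that does not regress for us.
			 */
			ret = alloc_contig_range(pfn, pfn + CHUNK_PAGES,
						 MIGRATE_MOVABLE, GFP_KERNEL);
		} while (ret == -EBUSY && ++retries < MAX_RETRIES);

		total_us += ktime_us_delta(ktime_get(), t0);
		if (ret)
			return ret;
	}

	pr_info("allocated %lu x 2MB: total %lld us, per-chunk %lld us\n",
		nr_chunks, total_us, total_us / (s64)nr_chunks);
	return 0;
}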

> 
> (next time, please cc linux-mm on core-mm questions; maybe you tried,
> but ended up with linux-mmc :) )

Yes, that was the intent, thanks for correcting that.

> 
>>>
>>> I have tried playing with the compact_control structure settings but
>>> have not found anything that would bring us back to the performance of
>>> 4.9. More often than not, we see test_pages_isolated() returning a
>>> non-zero error code, which would explain the slowdown, since we have
>>> some logic that re-tries the allocation if alloc_contig_range() returns
>>> -EBUSY. If I remove the retry logic, however, we don't get -EBUSY and we
>>> get the results below:
>>>
>>> 4.9 shows this:
>>>
>>> [  457.537634] allocating: size: 1024MB avg: 59172 (us), max: 137306
>>> (us), min: 44859 (us), total: 591723 (us), pages: 512, per-page: 115
>>> (us)
>>> [  457.550222] freeing: size: 1024MB avg: 67397 (us), max: 151408 (us),
>>> min: 52630 (us), total: 673974 (us), pages: 512, per-page: 131 (us)
>>>
>>> 5.4 shows this:
>>>
>>> [  222.388758] allocating: size: 1024MB avg: 156739 (us), max: 157254
>>> (us), min: 155915 (us), total: 1567394 (us), pages: 512, per-page:
>>> 306 (us)
>>> [  222.401601] freeing: size: 1024MB avg: 209899 (us), max: 210085 (us),
>>> min: 209749 (us), total: 2098999 (us), pages: 512, per-page: 409 (us)
>>>
>>> This regression is not seen when MIGRATE_CMA is specified instead of
>>> MIGRATE_MOVABLE.
>>>
>>> A few characteristics that you should probably be aware of:
>>>
>>> - There is 4GB of populated memory, mapped into the CPU's address space
>>> starting at 0x4000_0000 (1GB); PAGE_SIZE is 4KB
>>>
>>> - There is a ZONE_DMA32 that starts at 0x4000_0000 and ends at
>>> 0xE480_0000; from there on we have a ZONE_MOVABLE which comprises
>>> 0xE480_0000 - 0xFDC0_0000 and another range spanning 0x1_0000_0000 -
>>> 0x1_4000_0000
>>>
>>> Attached is the kernel configuration.
>>>
>>> Thanks!
>>> -- 
>>> Florian
>>
>>
>>
> 
> 

-- 
Florian
