linux-kernel - Re: alloc_contig_range() with MIGRATE_MOVABLE performance regression since 4.9

Message-ID: <d2bf87c0-7a2d-d663-a0ac-99840c77cd44@redhat.com>
Date:   Mon, 17 May 2021 09:46:12 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Michal Hocko <mhocko@...e.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>, Mel Gorman <mgorman@...e.de>,
        Minchan Kim <minchan@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>, l.stach@...gutronix.de,
        LKML <linux-kernel@...r.kernel.org>,
        Jaewon Kim <jaewon31.kim@...sung.com>,
        Michal Nazarewicz <mina86@...a86.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Oscar Salvador <OSalvador@...e.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: alloc_contig_range() with MIGRATE_MOVABLE performance regression
 since 4.9

On 16.05.21 18:13, Florian Fainelli wrote:
> 
> 
> On 4/22/2021 12:31 PM, Florian Fainelli wrote:
>>> For
>>>
>>> https://lkml.kernel.org/r/20210121175502.274391-3-minchan@kernel.org
>>>
>>> to do its work you'll have to pass  __GFP_NORETRY to
>>> alloc_contig_range(). This requires CMA adaptions, from where we call
>>> alloc_contig_range().
>>
>> Yes, I did modify the alloc_contig_range() caller to pass GFP_KERNEL |
>> __GFP_NORETRY. I ran more iterations (1000) and the results are not
>> very conclusive: with __GFP_NORETRY the time per allocation was not
>> significantly better; in fact it was slightly worse, by 100us, than
>> without.
>>
>> My x86 VM with 1GB of DRAM, including 512MB in ZONE_MOVABLE, shows
>> identical numbers for both 4.9 and 5.4, so this must be something
>> specific to ARM64 and/or the code we added to create a ZONE_MOVABLE on
>> that architecture, since movablecore does not appear to have any effect
>> there, unlike on x86.
> 
> We tracked down the slowdowns to be caused by two major contributors:
> 
> - for a reason that we do not fully understand yet, the same cpufreq
> governor (conservative) did not slow down alloc_contig_range() on 4.9
> as much as it does on 5.4. Running tests with the performance cpufreq
> governor works a tad better, and the results are more consistent from
> run to run, with a smaller variation.

Interesting! So your CPU is down-clocking while performing (heavy) 
kernel work? Is that expected or are we mis-accounting kernel cpu time 
somehow when it comes to determining the CPU target frequency?
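
One way to check this while reproducing is to log the active governor
(and, with the same helper, scaling_cur_freq) before and after the
allocation loop. Below is a minimal user-space sketch, assuming the
standard cpufreq sysfs layout is present on the platform; the helper
names read_first_line() and read_cpu_governor() are made up for
illustration and are not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a (sysfs-style) text file into buf and strip
 * the trailing newline. Returns 0 on success, -1 on any failure.
 */
int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Hypothetical convenience wrapper: fetch the cpufreq governor of one
 * CPU. Assumes the usual sysfs path; returns -1 if cpufreq is not
 * exposed for that CPU.
 */
int read_cpu_governor(unsigned int cpu, char *buf, size_t len)
{
	char path[128];

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_governor",
		 cpu);
	return read_first_line(path, buf, len);
}
```

Calling read_cpu_governor(0, buf, sizeof(buf)) around the test run, and
read_first_line() on the scaling_cur_freq file of the same directory,
would show whether the CPU actually stays at a low frequency during the
allocations.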

> 
> - another large contributor to the slowdown was having enabled
> CONFIG_IRQSOFF_TRACER. After c3bc8fd637a9623f5c507bd18f9677effbddf584
> ("tracing: Centralize preemptirq tracepoints and unify their usage") we
> now prepare arguments for tracing even if we end up not using them
> because tracing is not enabled at runtime. Getting the caller function's return
> address is cheap on arm64 for level == 0, but getting the preceding
> caller involves doing a backtrace walk which is expensive (see
> arch/arm64/kernel/return_address.c).

Again, very interesting finding.
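
To illustrate the pattern described above (this is NOT the kernel code,
just a user-space sketch under stated assumptions): the cost comes from
computing the tracing arguments before checking whether tracing is
enabled. All names below (trace_enabled, expensive_parent_ip(),
hook_eager(), hook_lazy()) are hypothetical stand-ins, and the stack
walk is simulated with a counter rather than a real unwinder.

```c
#include <stdbool.h>
#include <stdint.h>

bool trace_enabled;            /* stand-in for the runtime tracing switch */
unsigned long walk_count;      /* number of simulated stack walks */
unsigned long events_emitted;  /* number of events actually recorded */

/* Stand-in for an expensive lookup of the caller's caller. */
uintptr_t expensive_parent_ip(void)
{
	walk_count++;
	return (uintptr_t)0x1234;  /* dummy value, illustration only */
}

/* Records an event only when tracing is enabled. */
void emit_event(uintptr_t ip, uintptr_t parent_ip)
{
	(void)ip;
	(void)parent_ip;
	if (trace_enabled)
		events_emitted++;
}

/* Eager variant: arguments are prepared unconditionally. */
void hook_eager(void)
{
	/* Level 0 is cheap: the return address is readily available. */
	uintptr_t ip = (uintptr_t)__builtin_return_address(0);
	/* Paid on every call, even when nothing is traced. */
	uintptr_t parent = expensive_parent_ip();

	emit_event(ip, parent);
}

/* Lazy variant: the expensive argument is computed only if needed. */
void hook_lazy(void)
{
	if (!trace_enabled)
		return;

	emit_event((uintptr_t)__builtin_return_address(0),
		   expensive_parent_ip());
}
```

With trace_enabled left false, every hook_eager() call still performs
the simulated walk, while hook_lazy() performs none. Whether the real
hooks can be restructured this way is a separate question.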

> 
> So with these two variables eliminated we are only about 2x slower on
> 5.4 than we were on 4.9, and this is acceptable for our use case. I would
> not say the case is closed but at least we understand it better. We now
> have 5.10 brought up to speed so any new investigation will be focused
> on that kernel.
> 

Thanks for the insight, please do let me know when you learn more. A 2x
slowdown is still quite a lot.

-- 
Thanks,

David / dhildenb