Message-ID: <d2bf87c0-7a2d-d663-a0ac-99840c77cd44@redhat.com>
Date: Mon, 17 May 2021 09:46:12 +0200
From: David Hildenbrand <david@...hat.com>
To: Florian Fainelli <f.fainelli@...il.com>,
Michal Hocko <mhocko@...e.com>
Cc: Vlastimil Babka <vbabka@...e.cz>, Mel Gorman <mgorman@...e.de>,
Minchan Kim <minchan@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, l.stach@...gutronix.de,
LKML <linux-kernel@...r.kernel.org>,
Jaewon Kim <jaewon31.kim@...sung.com>,
Michal Nazarewicz <mina86@...a86.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Oscar Salvador <OSalvador@...e.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: alloc_contig_range() with MIGRATE_MOVABLE performance regression
since 4.9
On 16.05.21 18:13, Florian Fainelli wrote:
>
>
> On 4/22/2021 12:31 PM, Florian Fainelli wrote:
>>> For
>>>
>>> https://lkml.kernel.org/r/20210121175502.274391-3-minchan@kernel.org
>>>
>>> to do its work you'll have to pass __GFP_NORETRY to
>>> alloc_contig_range(). This requires CMA adaptions, from where we call
>>> alloc_contig_range().
>>
>> Yes, I did modify the alloc_contig_range() caller to pass GFP_KERNEL |
>> __GFP_NORETRY. I did run more iterations (1000), and the results are
>> not very conclusive: with __GFP_NORETRY the time per allocation was
>> not significantly better; in fact, it was slightly worse, by 100us,
>> than without.
>>
>> My x86 VM with 1GB of DRAM, 512MB of which is in ZONE_MOVABLE,
>> shows identical numbers for both 4.9 and 5.4, so this must be something
>> specific to ARM64 and/or the code we added to create a ZONE_MOVABLE on
>> that architecture, since movablecore does not appear to have any effect
>> there, unlike on x86.
>
> We tracked down the slowdowns to be caused by two major contributors:
>
> - for a reason that we do not fully understand yet, the same cpufreq
> governor (conservative) did not slow down alloc_contig_range() on 4.9
> as much as it did on 5.4. Running the tests with the performance
> cpufreq governor works a tad better, and the results are more
> consistent from run to run, with smaller variation.
Interesting! So your CPU is down-clocking while performing (heavy)
kernel work? Is that expected, or are we somehow mis-accounting kernel
CPU time when it comes to determining the target CPU frequency?
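For reproducible measurements, one common approach is to pin the governor before each benchmark run. Below is a minimal C sketch, assuming the standard cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`). The helper names `governor_path()`, `set_governor()` and `get_governor()` are hypothetical, and the sysfs root is a parameter only so that the code can be exercised without root privileges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<root>/devices/system/cpu/cpu<N>/cpufreq/scaling_governor".
 * root is normally "/sys". Returns 0 on success, -1 if buf is too small. */
int governor_path(const char *root, int cpu, char *buf, size_t len)
{
	int n = snprintf(buf, len,
			 "%s/devices/system/cpu/cpu%d/cpufreq/scaling_governor",
			 root, cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a governor name such as "performance" or "conservative".
 * Returns 0 on success, -1 on any failure (writing to the real sysfs
 * requires root). Errors from the buffered write surface at fclose(). */
int set_governor(const char *root, int cpu, const char *gov)
{
	char path[256]; /* arbitrary size, ample for the sysfs path */
	if (governor_path(root, cpu, path, sizeof(path)))
		return -1;
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fputs(gov, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current governor into out, stripping the trailing newline,
 * so it can be restored after the benchmark. Returns 0 or -1. */
int get_governor(const char *root, int cpu, char *out, size_t len)
{
	char path[256];
	if (governor_path(root, cpu, path, sizeof(path)))
		return -1;
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	char *r = fgets(out, (int)len, f);
	fclose(f);
	if (!r)
		return -1;
	out[strcspn(out, "\n")] = '\0';
	return 0;
}
```

A typical run saves each online CPU's value with `get_governor("/sys", cpu, ...)`, calls `set_governor("/sys", cpu, "performance")`, benchmarks, and then restores the saved value. The `cpupower frequency-set -g performance` utility does the equivalent from a shell.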
>
> - another large contributor to the slowdown was having enabled
> CONFIG_IRQSOFF_TRACER. After c3bc8fd637a9623f5c507bd18f9677effbddf584
> ("tracing: Centralize preemptirq tracepoints and unify their usage") we
> now prepare arguments for tracing even if we end up not using them
> because tracing is not enabled at runtime. Getting the caller function's
> return address is cheap on arm64 for level == 0, but getting the
> caller's caller (level 1) involves a backtrace walk, which is expensive
> (see arch/arm64/kernel/return_address.c).
Again, very interesting finding.
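The cost described above comes from evaluating an expensive argument before anything checks whether it will be used. The following user-space illustration is not the kernel code. `slow_parent_ip()` is a hypothetical stand-in for a level-1 return-address walk that only counts its calls. `tracer_enabled` and `tracer_hook()` are hypothetical stand-ins for the runtime tracer state and the tracer entry point. The sketch contrasts computing the argument eagerly with checking the tracer state first.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of times the (simulated) expensive unwind has run. */
unsigned long unwind_calls;

/* Hypothetical stand-in for fetching the caller's caller via a stack
 * walk. Real unwinding is expensive; here we only count invocations. */
unsigned long slow_parent_ip(void)
{
	unwind_calls++;
	return 0xdeadbeefUL;
}

/* Hypothetical runtime switch: is the tracer actually active? */
bool tracer_enabled;

/* Hypothetical tracer hook: does nothing when tracing is disabled,
 * but by then its arguments have already been evaluated. */
void tracer_hook(unsigned long ip, unsigned long parent_ip)
{
	if (!tracer_enabled)
		return;
	(void)ip;
	(void)parent_ip;
	/* ... record the event ... */
}

/* Eager: the stack walk runs on every call, even with tracing off. */
void irqs_off_eager(unsigned long ip)
{
	tracer_hook(ip, slow_parent_ip());
}

/* Guarded: the stack walk runs only when the result will be used. */
void irqs_off_guarded(unsigned long ip)
{
	if (tracer_enabled)
		tracer_hook(ip, slow_parent_ip());
}
```

The guard has to sit before the argument is evaluated. A check inside the hook, as in `tracer_hook()`, comes too late to avoid the walk.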
>
> So with these two variables eliminated we are only about x2 slower on
> 5.4 than we were on 4.9 and this is acceptable for our use case. I would
> not say the case is closed but at least we understand it better. We now
> have 5.10 brought up to speed so any new investigation will be focused
> on that kernel.
>
Thanks for the insight; please do let me know when you learn more. A 2x
slowdown is still quite a lot.
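To quantify slowdowns like these across kernels and governor settings, per-allocation latency can be timed over many iterations and then summarized. Below is a minimal user-space sketch. It assumes the allocation is triggered through some hypothetical callback, for example an ioctl into a test driver that ends up in cma_alloc(). `alloc_cycle_fn`, `counting_cycle()`, `time_cycles()`, `summarize()` and `slowdown_ratio()` are illustrative names only.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Latency summary for one run, in microseconds. */
struct lat_stats {
	double mean_us;
	double min_us;
	double max_us;
};

/* Hypothetical hook performing one allocate/free cycle; 0 on success. */
typedef int (*alloc_cycle_fn)(void *ctx);

/* Placeholder cycle that only counts calls; timing it measures the
 * harness overhead. ctx must point to a size_t counter. */
int counting_cycle(void *ctx)
{
	++*(size_t *)ctx;
	return 0;
}

static double elapsed_us(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) * 1e6 +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e3;
}

/* Time n calls of fn, storing each latency in samples[0..n-1].
 * Returns 0, or the first nonzero error returned by fn. */
int time_cycles(alloc_cycle_fn fn, void *ctx, size_t n, double *samples)
{
	for (size_t i = 0; i < n; i++) {
		struct timespec t0, t1;
		clock_gettime(CLOCK_MONOTONIC, &t0);
		int err = fn(ctx);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		if (err)
			return err;
		samples[i] = elapsed_us(&t0, &t1);
	}
	return 0;
}

/* Mean, min and max of n > 0 samples; max - min gives the spread. */
void summarize(const double *samples, size_t n, struct lat_stats *out)
{
	double sum = 0.0;
	assert(n > 0);
	out->min_us = out->max_us = samples[0];
	for (size_t i = 0; i < n; i++) {
		sum += samples[i];
		if (samples[i] < out->min_us)
			out->min_us = samples[i];
		if (samples[i] > out->max_us)
			out->max_us = samples[i];
	}
	out->mean_us = sum / (double)n;
}

/* Ratio of mean latencies: 2.0 means cur is twice as slow as base. */
double slowdown_ratio(const struct lat_stats *base, const struct lat_stats *cur)
{
	assert(base->mean_us > 0.0);
	return cur->mean_us / base->mean_us;
}
```

Summarize the baseline run (e.g. 4.9) and the candidate run (e.g. 5.4) separately. A `slowdown_ratio()` of 2.0 corresponds to the x2 figure discussed above. Comparing `min_us` and `max_us` between runs shows the run-to-run spread that changed with the cpufreq governor.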
--
Thanks,
David / dhildenb