Message-ID: <67d39f78-8eed-f49a-b3b0-18f77f9821cd@leemhuis.info>
Date: Sun, 19 Jun 2022 14:14:03 +0200
From: Thorsten Leemhuis <regressions@...mhuis.info>
To: Michael Larabel <Michael@...haelLarabel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Marcelo Tosatti <mtosatti@...hat.com>
Cc: Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Minchan Kim <minchan@...nel.org>,
Matthew Wilcox <willy@...radead.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Nicolas Saenz Julienne <nsaenzju@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
"Paul E. McKenney" <paulmck@...nel.org>,
Stefan Wahren <stefan.wahren@...e.com>,
"regressions@...ts.linux.dev" <regressions@...ts.linux.dev>
Subject: Re: [patch v5] mm: lru_cache_disable: replace work queue
synchronization with synchronize_rcu
Hi, this is your Linux kernel regression tracker.
On 29.05.22 02:48, Michael Larabel wrote:
> On 5/28/22 17:54, Michael Larabel wrote:
>> On 5/28/22 16:18, Andrew Morton wrote:
>>> On Thu, 28 Apr 2022 15:00:11 -0300 Marcelo Tosatti
>>> <mtosatti@...hat.com> wrote:
>>>> On Thu, Mar 31, 2022 at 03:52:45PM +0200, Borislav Petkov wrote:
>>>>> On Thu, Mar 10, 2022 at 10:22:12AM -0300, Marcelo Tosatti wrote:
>>>>> Someone pointed me at this:
>>>>> https://www.phoronix.com/scan.php?page=news_item&px=Linux-518-Stress-NUMA-Goes-Boom
>>>>>
>>>>> which says this one causes a performance regression with stress-ng's
>>>>> NUMA test...
>>>>
>>>> This is probably do_migrate_pages that is taking too long due to
>>>> synchronize_rcu().
>>>>
>>>> Switching to synchronize_rcu_expedited() should probably fix it...
>>>> Can you give it a try, please?
>>> I guess not.
>>>
>>> Is anyone else able to demonstrate a stress-ng performance regression
>>> due to ff042f4a9b0508? And if so, are they able to try Marcelo's
>>> one-liner?
>>
>> Apologies I don't believe I got the email previously (or if it ended
>> up in spam or otherwise overlooked) so just noticed this thread now...
>>
>> I have the system around and will work on verifying it can reproduce
>> still and can then test the patch, should be able to get it tomorrow.
>>
>> Thanks and sorry about the delay.
>
> Had a chance to look at it today still. I was able to reproduce the
> regression still on that 5950X system going from v5.17 to v5.18 (using
> newer stress-ng benchmark and other system changes since the prior
> tests). Confirmed it also still showed slower as of today's Git.
>
> I can confirm with Marcelo's patch below that the stress-ng NUMA
> performance is back to the v5.17 level of performance (actually, faster)
> and certainly not like what I was seeing on v5.18 or Git to this point.
>
> So all seems to be good with that one-liner for the stress-ng NUMA test
> case. All the system details and results for those interested is
> documented @ https://openbenchmarking.org/result/2205284-PTS-NUMAREGR17
> but basically amounts to:
>
> Stress-NG 0.14
> Test: NUMA
> Bogo Ops/s > Higher Is Better
> v5.17: 412.88
> v5.18: 49.33
> 20220528 Git: 49.66
> 20220528 Git + sched-rcu-exped patch: 468.81
>
> Apologies again about the delay / not seeing the email thread earlier.
> Thanks,
>
> Michael
>
> Tested-by: Michael Larabel <Michael@...haelLarabel.com>
Andrew, is there a reason why this patch afaics isn't mainlined yet and
has been lingering in linux-next for so long? Michael confirmed three
weeks ago that this patch fixes a regression, and a few days later
Stefan confirmed that his problem was solved as well:
https://lore.kernel.org/regressions/79bb603e-37cb-d1dd-1e12-7ce28d7cfdae@i2se.com/
Reminder: unless there are good reasons, it shouldn't take this long,
for reasons explained in
https://docs.kernel.org/process/handling-regressions.html
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
>>>> diff --git a/mm/swap.c b/mm/swap.c
>>>> index bceff0cb559c..04a8bbf9817a 100644
>>>> --- a/mm/swap.c
>>>> +++ b/mm/swap.c
>>>> @@ -879,7 +879,7 @@ void lru_cache_disable(void)
>>>> * lru_disable_count = 0 will have exited the critical
>>>> * section when synchronize_rcu() returns.
>>>> */
>>>> - synchronize_rcu();
>>>> + synchronize_rcu_expedited();
>>>> #ifdef CONFIG_SMP
>>>> __lru_add_drain_all(true);
>>>> #else
>>>>
>>>>