linux-kernel - Re: [patch v5] mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu

Message-ID: <67d39f78-8eed-f49a-b3b0-18f77f9821cd@leemhuis.info>
Date:   Sun, 19 Jun 2022 14:14:03 +0200
From:   Thorsten Leemhuis <regressions@...mhuis.info>
To:     Michael Larabel <Michael@...haelLarabel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Marcelo Tosatti <mtosatti@...hat.com>
Cc:     Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Minchan Kim <minchan@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Nicolas Saenz Julienne <nsaenzju@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Stefan Wahren <stefan.wahren@...e.com>,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>
Subject: Re: [patch v5] mm: lru_cache_disable: replace work queue
 synchronization with synchronize_rcu

Hi, this is your Linux kernel regression tracker.

On 29.05.22 02:48, Michael Larabel wrote:
> On 5/28/22 17:54, Michael Larabel wrote:
>> On 5/28/22 16:18, Andrew Morton wrote:
>>> On Thu, 28 Apr 2022 15:00:11 -0300 Marcelo Tosatti
>>> <mtosatti@...hat.com> wrote:
>>>> On Thu, Mar 31, 2022 at 03:52:45PM +0200, Borislav Petkov wrote:
>>>>> On Thu, Mar 10, 2022 at 10:22:12AM -0300, Marcelo Tosatti wrote:
>>>>> Someone pointed me at this:
>>>>> https://www.phoronix.com/scan.php?page=news_item&px=Linux-518-Stress-NUMA-Goes-Boom
>>>>>
>>>>> which says this one causes a performance regression with stress-ng's
>>>>> NUMA test...
>>>>
>>>> This is probably do_migrate_pages that is taking too long due to
>>>> synchronize_rcu().
>>>>
>>>> Switching to synchronize_rcu_expedited() should probably fix it...
>>>> Can you give it a try, please?
>>> I guess not.
>>>
>>> Is anyone else able to demonstrate a stress-ng performance regression
>>> due to ff042f4a9b0508?  And if so, are they able to try Marcelo's
>>> one-liner?
>>
>> Apologies, I don't believe I got the email previously (or it ended
>> up in spam or was otherwise overlooked), so I just noticed this thread now...
>>
>> I still have the system around and will work on verifying that it
>> still reproduces, then test the patch; I should be able to get to it tomorrow.
>>
>> Thanks and sorry about the delay.
> 
> Had a chance to look at it today after all. I was still able to
> reproduce the regression on that 5950X system going from v5.17 to
> v5.18 (using a newer stress-ng benchmark and other system changes
> since the prior tests). I confirmed it is also still slower as of
> today's Git.
> 
> I can confirm that with Marcelo's patch below, stress-ng NUMA
> performance is back to the v5.17 level (actually, faster) and
> certainly not like what I was seeing on v5.18 or Git up to this point.
> 
> So all seems to be good with that one-liner for the stress-ng NUMA test
> case. All the system details and results, for those interested, are
> documented at https://openbenchmarking.org/result/2205284-PTS-NUMAREGR17
> but basically amount to:
> 
>     Stress-NG 0.14
>     Test: NUMA
>     Bogo Ops/s > Higher Is Better
>     v5.17: 412.88
>     v5.18: 49.33
>     20220528 Git: 49.66
>     20220528 Git + sched-rcu-exped patch: 468.81
> 
> Apologies again about the delay / not seeing the email thread earlier.
>
> Thanks,
> 
> Michael
> 
> Tested-by: Michael Larabel <Michael@...haelLarabel.com>
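
As a quick sanity check on the Bogo Ops/s figures quoted above, the ratios can be computed directly. The C sketch below is an editorial illustration, not part of the thread: the helper names (speedup, percent_change, print_summary) are hypothetical, and the four constants are simply copied from Michael's results. On these inputs, v5.17 delivers about 8.4 times the throughput of v5.18 (a drop of roughly 88%), while the patched Git tree is about 13.5% faster than v5.17.

```c
#include <assert.h>
#include <stdio.h>

/* Stress-NG 0.14 NUMA results quoted above, in Bogo Ops/s (higher is better). */
static const double v5_17 = 412.88;
static const double v5_18 = 49.33;
static const double git_20220528 = 49.66;
static const double git_patched = 468.81; /* Git + synchronize_rcu_expedited() */

/* How many times higher throughput 'a' is than throughput 'b'. */
static double speedup(double a, double b)
{
	assert(a > 0.0 && b > 0.0); /* throughput figures must be positive */
	return a / b;
}

/* Relative change of 'after' versus 'before', in percent. */
static double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Print the three comparisons discussed in the thread. */
static void print_summary(void)
{
	printf("v5.17 vs v5.18:        %.2fx\n", speedup(v5_17, v5_18));
	printf("v5.18 vs v5.17:        %+.1f%%\n", percent_change(v5_17, v5_18));
	printf("patched Git vs v5.17:  %+.1f%%\n", percent_change(v5_17, git_patched));
}
```

Calling print_summary() from a main() prints 8.37x, -88.1% and +13.5% for these inputs; the unpatched Git figure (git_20220528) is within about 1% of v5.18, matching Michael's observation that the regression was still present.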

Andrew, is there a reason why this patch afaics isn't mainlined yet and
has been lingering in linux-next for so long? Michael confirmed three
weeks ago that this patch fixes a regression, and a few days later Stefan
confirmed that it solved his problem as well:
https://lore.kernel.org/regressions/79bb603e-37cb-d1dd-1e12-7ce28d7cfdae@i2se.com/

Reminder: unless there are good reasons, it shouldn't take this long to
mainline a fix for a regression, for the reasons explained in
https://docs.kernel.org/process/handling-regressions.html

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply; it's in everyone's interest to set the public record straight.

>>>> diff --git a/mm/swap.c b/mm/swap.c
>>>> index bceff0cb559c..04a8bbf9817a 100644
>>>> --- a/mm/swap.c
>>>> +++ b/mm/swap.c
>>>> @@ -879,7 +879,7 @@ void lru_cache_disable(void)
>>>>        * lru_disable_count = 0 will have exited the critical
>>>>        * section when synchronize_rcu() returns.
>>>>        */
>>>> -    synchronize_rcu();
>>>> +    synchronize_rcu_expedited();
>>>>   #ifdef CONFIG_SMP
>>>>       __lru_add_drain_all(true);
>>>>   #else
>>>>
>>>>
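
The comment in the quoted hunk describes the ordering the patch relies on: raise a disable counter, then wait until every CPU that may still be inside a critical section that sampled the old value has left it, and only then drain. The user-space C sketch below illustrates that ordering under loudly stated assumptions: all names are hypothetical, C11 sequentially consistent atomics stand in for the kernel's primitives, and a busy-wait on a count of active readers is a crude stand-in for a grace-period wait such as synchronize_rcu(). It is not how RCU or mm/swap.c is actually implemented; it only shows why every disable call pays for that wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space analogue, NOT kernel code. The spin loop in
 * sketch_cache_disable() is a crude stand-in for an RCU grace period.
 * All atomic operations use the default memory_order_seq_cst.
 */
static atomic_int sketch_disable_count;  /* plays the role of lru_disable_count */
static atomic_int sketch_readers_inside; /* readers currently in a critical section */

/* Reader fast path: returns true if per-CPU batching is currently allowed. */
static bool sketch_try_batch(void)
{
	bool allowed;

	atomic_fetch_add(&sketch_readers_inside, 1);       /* enter critical section */
	allowed = atomic_load(&sketch_disable_count) == 0; /* sample the counter */
	/* ...a real fast path would only batch work here if 'allowed'... */
	atomic_fetch_sub(&sketch_readers_inside, 1);       /* leave critical section */
	return allowed;
}

/* Writer: disable batching, then wait out readers that may have seen 0. */
static void sketch_cache_disable(void)
{
	atomic_fetch_add(&sketch_disable_count, 1);
	/*
	 * Under seq_cst, any reader that sampled a count of 0 incremented
	 * readers_inside before the increment above, so this loop cannot
	 * finish while such a reader is still inside its critical section.
	 */
	while (atomic_load(&sketch_readers_inside) != 0)
		; /* busy-wait: stand-in for the grace-period wait */
	/* ...from here on it would be safe to drain per-CPU batches... */
}

/* Writer: re-enable batching. */
static void sketch_cache_enable(void)
{
	atomic_fetch_sub(&sketch_disable_count, 1);
}
```

Called concurrently with readers, sketch_cache_disable() returns only once no reader that might have sampled the old counter value is still inside. In the kernel that waiting is what synchronize_rcu() provides; the quoted one-liner swaps in synchronize_rcu_expedited(), which shortens the wait at the cost of extra work on other CPUs, and that shorter wait is what restored the stress-ng NUMA numbers in Michael's test.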