Date:   Fri, 31 Dec 2021 15:24:11 +0100
From:   Thorsten Leemhuis <regressions@...mhuis.info>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        Mark Brown <broonie@...nel.org>,
        Michal Hocko <mhocko@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Alexey Avramov <hakavlad@...ox.lv>,
        Rik van Riel <riel@...riel.com>,
        Mike Galbraith <efault@....de>,
        Darrick Wong <djwong@...nel.org>, regressions@...ts.linux.dev,
        Linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v4 1/1] mm: vmscan: Reduce throttling due to a failure to
 make progress

On 30.12.21 00:45, Andrew Morton wrote:
> On Tue, 28 Dec 2021 11:04:18 +0100 Thorsten Leemhuis <regressions@...mhuis.info> wrote:
> 
>> Hi, this is your Linux kernel regression tracker speaking.
>>
>> On 02.12.21 16:06, Mel Gorman wrote:
>>> Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
>>> problems due to reclaim throttling for excessive lengths of time.
>>> In Alexey's case, a memory hog that should go OOM quickly stalls for
>>> several minutes before stalling. In Mike and Darrick's cases, a small
>>> memcg environment stalled excessively even though the system had enough
>>> memory overall.
>>
>> Just wondering: this patch afaics is now in -mm and  Linux next for
>> nearly two weeks. Is that intentional? I had expected it to be mainlined
>> with the batch of patches Andrew mailed to Linus last week, but it
>> wasn't among them.
> 
> I have it queued for 5.17-rc1.
> 
> There is still time to squeeze it into 5.16, just, with a cc:stable. 
> 
> Alternatively we could merge it into 5.17-rc1 with a cc:stable, so it
> will trickle back with less risk to the 5.17 release.
> 
> What do people think?

CCing Linus, to make sure he's aware of this.

Maybe I'm totally missing something, but I'm a bit confused by what you
wrote, as the regression afaik was introduced between v5.15..v5.16-rc1.
So I assume this is what you meant:

```
I have it queued for 5.17-rc1.

There is still time to squeeze it into 5.16.

Alternatively we could merge it into 5.17-rc1 with a cc:stable, so it
will trickle back with less risk to the 5.16 release.

What do people think?
```

I'll leave the individual risk evaluation of the patch to others. If the
fix is risky, waiting for 5.17 is fine for me.

But hmmm, one remark regarding the "could merge it into 5.17-rc1 with a
cc:stable" idea: is that really "less risk", as you stated?

If we get it into rc8 (which is still possible, even if a bit hard due
to the new year festivities), it will get at least one week of testing.

If the fix waits for the next merge window, it all depends on how the
timing works out. But it's easy to picture a worst case: the fix is
only merged on the Friday evening before Linus releases 5.17-rc1, makes
it into a stable-rc right afterwards (say a day or two after 5.17-rc1 is
out), and goes from there into a 5.16.y release on Thursday. That IMHO
would mean fewer days of testing in the end (and there is a weekend in
this period as well).

Waiting obviously will also mean that users of 5.16 and 5.16.y will
likely have to face this regression for at least two and a half weeks,
unless you send the fix early and Greg backports it before rc1 (which he
afaics does if there are good reasons). Yes, it's `just` a performance
regression, so it might not stop anyone from running Linux 5.16 -- but
it's one that three people separately reported in the 5.16 devel cycle,
so others will likely encounter it as well if we leave it unfixed in
5.16. This will likely annoy some people, especially if they invest time
in bisecting it, only to find out that the fourth iteration of the fix
for the regression has been available since December the 2nd.

Ciao, Thorsten
