Message-ID: <20240822104323.GA315708@cmpxchg.org>
Date: Thu, 22 Aug 2024 12:43:23 +0200
From: Johannes Weiner <hannes@...xchg.org>
To: Usama Arif <usamaarif642@...il.com>
Cc: akpm@...ux-foundation.org, riel@...riel.com, zhaoyang.huang@...soc.com,
	yuzhao@...gle.com, david@...hat.com, leitao@...ian.org,
	huangzhaoyang@...il.com, bharata@....com, willy@...radead.org,
	vbabka@...e.cz, linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH] Revert "mm: skip CMA pages when they are not available"

On Wed, Aug 21, 2024 at 03:53:21PM -0400, Usama Arif wrote:
> From 1aae7f04a5cb203ea2c3ede7973dd9eddbbd7a8b Mon Sep 17 00:00:00 2001
> From: Usama Arif <usamaarif642@...il.com>
> Date: Wed, 21 Aug 2024 20:26:07 +0100
> Subject: [PATCH] Revert "mm: skip CMA pages when they are not available"
> 
> This reverts commit 5da226dbfce3a2f44978c2c7cf88166e69a6788b.
> 
> lruvec->lru_lock is highly contended and is held when calling
> isolate_lru_folios. If the lru has a large number of CMA folios
> consecutively, while the allocation type requested is not
> MIGRATE_MOVABLE, isolate_lru_folios can hold the lock for a very long
> time while it skips those. For FIO workload, ~150million order=0
> folios were skipped to isolate a few ZONE_DMA folios [1].
> This can cause lockups [1] and high memory pressure for extended periods
> of time [2].
> 
> [1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@mail.gmail.com/
> [2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@gmail.com/
> 
> Signed-off-by: Usama Arif <usamaarif642@...il.com>

Acked-by: Johannes Weiner <hannes@...xchg.org>

I think this is the right move for now, until there is a robust
solution for the original issue.

But should b7108d66318abf3e060c7839eabcba52e9461568 be reverted along
with it? From its changelog:

    No observable issue without this patch on MGLRU, but logically it make
    sense to skip the CMA page reclaim when those pages can't be satisfied for
    the current allocation context.

Presumably it has the same risk/reward profile as it does on
conventional reclaim, with long skip runs while holding the
lruvec->lru_lock.
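
To make the concern concrete, here is a rough userspace model of the
skip loop. It is only a sketch, not kernel code: model_folio,
scan_result and model_isolate are made-up names, and it assumes that
skipped folios do not count against the scan budget, so a long run of
CMA folios lengthens the walk rather than ending it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio on an LRU list; not a kernel type. */
struct model_folio {
	bool is_cma;
};

struct scan_result {
	size_t taken;	/* folios accepted for reclaim */
	size_t skipped;	/* CMA folios passed over */
	size_t visited;	/* total iterations; in the kernel, all under the lock */
};

/*
 * Model one isolation pass, walking the list from the tail.
 * Assumption: only accepted folios count toward nr_to_scan, so the
 * loop keeps going through any number of skipped CMA folios.
 */
static struct scan_result model_isolate(const struct model_folio *lru,
					size_t len, size_t nr_to_scan,
					bool movable_alloc)
{
	struct scan_result r = {0, 0, 0};
	size_t i = len;

	while (r.taken < nr_to_scan && i > 0) {
		const struct model_folio *f = &lru[--i];

		r.visited++;
		if (f->is_cma && !movable_alloc) {
			/* Not usable for this allocation: skip, budget untouched. */
			r.skipped++;
			continue;
		}
		r.taken++;
	}
	return r;
}
```

In this model, the number of iterations for a non-movable request
grows with the length of the CMA run at the tail of the list, while a
movable request stops as soon as nr_to_scan folios have been taken.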
