Message-ID: <1d490ab5-5cf8-4c16-65d0-37a62999fcd5@google.com>
Date: Thu, 5 Sep 2024 01:46:19 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Andrew Morton <akpm@...ux-foundation.org>, 
    Usama Arif <usamaarif642@...il.com>, Yu Zhao <yuzhao@...gle.com>
cc: linux-mm@...ck.org, hannes@...xchg.org, riel@...riel.com, 
    shakeel.butt@...ux.dev, roman.gushchin@...ux.dev, david@...hat.com, 
    npache@...hat.com, baohua@...nel.org, ryan.roberts@....com, 
    rppt@...nel.org, willy@...radead.org, cerasuolodomenico@...il.com, 
    ryncsn@...il.com, corbet@....net, linux-kernel@...r.kernel.org, 
    linux-doc@...r.kernel.org, kernel-team@...a.com, 
    Shuang Zhai <zhais@...gle.com>
Subject: Re: [PATCH v5 1/6] mm: free zapped tail pages when splitting isolated
 thp

On Fri, 30 Aug 2024, Usama Arif wrote:

> From: Yu Zhao <yuzhao@...gle.com>
> 
> If a tail page has only two references left, one inherited from the
> isolation of its head and the other from lru_add_page_tail() which we
> are about to drop, it means this tail page was concurrently zapped.
> Then we can safely free it and save page reclaim or migration the
> trouble of trying it.
> 
> Signed-off-by: Yu Zhao <yuzhao@...gle.com>
> Tested-by: Shuang Zhai <zhais@...gle.com>
> Acked-by: Johannes Weiner <hannes@...xchg.org>
> Signed-off-by: Usama Arif <usamaarif642@...il.com>

I'm sorry, but I think this patch (just this 1/6) needs to be dropped:
it is only an optimization, and unless a persuasive performance case
can be made to extend it, it ought to go (perhaps revisited later).

The problem I kept hitting was that all my work, requiring compaction and
reclaim, got (killably) stuck in or repeatedly calling reclaim_throttle():
because nr_isolated_anon had grown high - and remained high even when the
load had all been killed.

Bisection led to the 2/6 (remap to shared zeropage), but I'd say this 1/6
is the one to blame. I was intending to send this patch to "fix" it:

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3295,6 +3295,8 @@ static void __split_huge_page(struct pag
 			folio_clear_active(new_folio);
 			folio_clear_unevictable(new_folio);
 			list_del(&new_folio->lru);
+			node_stat_sub_folio(folio, NR_ISOLATED_ANON +
+						folio_is_file_lru(folio));
 			if (!folio_batch_add(&free_folios, new_folio)) {
 				mem_cgroup_uncharge_folios(&free_folios);
 				free_unref_folios(&free_folios);

And that ran nicely, until I terminated the run and did
grep nr_isolated /proc/sys/vm/stat_refresh /proc/vmstat
at the end: stat_refresh kindly left a pr_warn in dmesg to say
nr_isolated_anon -334013737

My patch is not good enough. IIUC, some split_huge_pagers (reclaim?)
know how many pages they isolated and incremented the stats by, and
decrement by that same number at the end; whereas other split_huge_pagers
(migration?) decrement one by one as they go through the list afterwards.
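
The mismatch between the two accounting styles can be illustrated with a
toy userspace model. This is only a sketch, not kernel code: nr_isolated
stands in for the nr_isolated_anon counter, and batch_caller(),
per_page_caller() and split_and_free_tails() are hypothetical names
invented for the illustration. It assumes the batch-style caller adds
and later subtracts a fixed nr_taken, while the per-page caller
subtracts once for each page still on its list after the split.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for the per-node nr_isolated_anon counter. */
long nr_isolated;

/*
 * Hypothetical split step: 'freed' tail pages are freed during the
 * split. With fix_applied set, the split also subtracts them from the
 * counter, as the "fix" patch above does.
 */
void split_and_free_tails(long freed, int fix_applied)
{
	if (fix_applied)
		nr_isolated -= freed;
}

/* Batch style: add nr_taken up front, subtract the same nr_taken at the end. */
long batch_caller(long nr_taken, long freed, int fix_applied)
{
	assert(freed <= nr_taken);
	nr_isolated = 0;
	nr_isolated += nr_taken;
	split_and_free_tails(freed, fix_applied);
	nr_isolated -= nr_taken;
	return nr_isolated;
}

/* Per-page style: subtract one for each page still on the list afterwards. */
long per_page_caller(long nr_taken, long freed, int fix_applied)
{
	assert(freed <= nr_taken);
	nr_isolated = 0;
	nr_isolated += nr_taken;
	split_and_free_tails(freed, fix_applied);
	for (long i = 0; i < nr_taken - freed; i++)
		nr_isolated -= 1;
	return nr_isolated;
}
```

In this model the per-page caller leaves the counter too high by the
number of freed tails unless the split subtracts them, while the batch
caller ends up negative by the same amount once the split does subtract
them. No single placement of the subtraction inside the split suits both
styles.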

I've run out of time (I'm about to take a break): I gave up researching
who needs what, and was already feeling this optimization does too much
second guessing of what's needed (and its array of VM_WARN_ON_ONCE_FOLIOs
rather admits to that).

And I don't think it's as simple as moving the node_stat_sub_folio()
into 2/6 where the zero pte is substituted: that would probably handle
the vast majority of cases, but aren't there others which pass the
folio_ref_freeze(new_folio, 2) test - the title's zapped tail pages,
or racily truncated now that the folio has been unlocked, for example?
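
For readers unfamiliar with the freeze test: as far as I understand it,
a refcount freeze succeeds only when the count equals the expected value
exactly, and then sets it to zero. The sketch below is a userspace
analogue using C11 atomics, not the kernel implementation, and
toy_ref_freeze() is a hypothetical name. It shows that the test sees only
the count, not the reason the other references went away.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy analogue of a refcount freeze: succeed only if the count is
 * exactly 'expected', and in that case atomically set it to 0.
 * On failure the count is left unchanged.
 */
bool toy_ref_freeze(atomic_int *refcount, int expected)
{
	return atomic_compare_exchange_strong(refcount, &expected, 0);
}
```

A page whose count has dropped to 2 passes the test whether it was
zapped, truncated, or about to be remapped to the zero page, which is
why accounting tied to only one of those paths would miss the others.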

Hugh
