Message-ID: <f747223e-042f-40f4-841c-1c8019dc8510@redhat.com>
Date: Tue, 5 Nov 2024 09:42:14 +0100
From: David Hildenbrand <david@...hat.com>
To: John Hubbard <jhubbard@...dia.com>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
 Vivek Kasireddy <vivek.kasireddy@...el.com>, Dave Airlie
 <airlied@...hat.com>, Gerd Hoffmann <kraxel@...hat.com>,
 Matthew Wilcox <willy@...radead.org>, Christoph Hellwig <hch@...radead.org>,
 Jason Gunthorpe <jgg@...dia.com>, Peter Xu <peterx@...hat.com>,
 Arnd Bergmann <arnd@...db.de>, Daniel Vetter <daniel.vetter@...ll.ch>,
 Dongwon Kim <dongwon.kim@...el.com>, Hugh Dickins <hughd@...gle.com>,
 Junxiao Chang <junxiao.chang@...el.com>,
 Mike Kravetz <mike.kravetz@...cle.com>, Oscar Salvador <osalvador@...e.de>,
 linux-stable@...r.kernel.org
Subject: Re: [PATCH v2 0/1] mm/gup: avoid an unnecessary allocation call for
 FOLL_LONGTERM cases

On 05.11.24 04:29, John Hubbard wrote:
> This applies to today's mm-hotfixes-unstable (only). In order to test
> this, my earlier patch is a prerequisite: commit 255231c75dcd ("mm/gup:
> stop leaking pinned pages in low memory conditions").
> 
> Changes since v1 [1]:
> 
> 1) Completely different implementation: instead of changing the
> allocator from kmalloc() to kvmalloc(), just avoid allocations entirely.
> 
> Note that David's original suggestion [2] included something that I've
> left out for now, mostly because it's a pre-existing question and
> deserves its own patch. But also, I don't understand it yet, either.

Yeah, I was only adding it because I stumbled over it. It might not be a 
problem, because we simply "skip" if we find a folio that was already 
isolated (possibly by us). What might happen is that we unnecessarily 
drain the LRU.

__collapse_huge_page_isolate() scans the compound_pagelist list before 
try-locking and isolating. But it also just "fails" instead of retrying 
forever.

Imagine the page tables looking like the following (e.g., COW in a 
MAP_PRIVATE file mapping that supports large folios)

		      ------ F0P2 was replaced by a new (small) folio
		     |
[ F0P0 ] [ F0P1 ] [ F1P0 ] [ F0P3 ]

F0P0: Folio 0, page 0

Assume we try pinning that range and end up in 
collect_longterm_unpinnable_folios() with:

F0, F0, F1, F0


Assume F0 and F1 are not long-term pinnable.

i = 0: We isolate F0
i = 1: We see that it is the same F0 and skip
i = 2: We isolate F1
i = 3: We see !folio_test_lru() and do a lru_add_drain_all() to then
        fail folio_isolate_lru()

So the drain in i=3 could be avoided by scanning the list to check 
whether we already isolated that folio. It works better than I 
originally thought.

-- 
Cheers,

David / dhildenb

