Message-ID: <20250919083006.21171-1-roypat@amazon.co.uk>
Date: Fri, 19 Sep 2025 08:30:08 +0000
From: "Roy, Patrick" <roypat@...zon.co.uk>
To: "hughd@...gle.com" <hughd@...gle.com>
CC: "Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>,
"agordeev@...ux.ibm.com" <agordeev@...ux.ibm.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "alex@...ti.fr"
<alex@...ti.fr>, "andrii@...nel.org" <andrii@...nel.org>, "anna@...nel.org"
<anna@...nel.org>, "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
"ast@...nel.org" <ast@...nel.org>, "axelrasmussen@...gle.com"
<axelrasmussen@...gle.com>, "borntraeger@...ux.ibm.com"
<borntraeger@...ux.ibm.com>, "bp@...en8.de" <bp@...en8.de>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>, "brauner@...nel.org"
<brauner@...nel.org>, "catalin.marinas@....com" <catalin.marinas@....com>,
"chenhuacai@...nel.org" <chenhuacai@...nel.org>, "corbet@....net"
<corbet@....net>, "daniel@...earbox.net" <daniel@...earbox.net>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"david@...hat.com" <david@...hat.com>, "derekmn@...zon.co.uk"
<derekmn@...zon.co.uk>, "devel@...ts.orangefs.org"
<devel@...ts.orangefs.org>, "eddyz87@...il.com" <eddyz87@...il.com>,
"gerald.schaefer@...ux.ibm.com" <gerald.schaefer@...ux.ibm.com>,
"gor@...ux.ibm.com" <gor@...ux.ibm.com>, "hannes@...xchg.org"
<hannes@...xchg.org>, "haoluo@...gle.com" <haoluo@...gle.com>,
"hca@...ux.ibm.com" <hca@...ux.ibm.com>, "hpa@...or.com" <hpa@...or.com>,
"hubcap@...ibond.com" <hubcap@...ibond.com>, "jack@...e.cz" <jack@...e.cz>,
"Thomson, Jack" <jackabt@...zon.co.uk>, "jannh@...gle.com"
<jannh@...gle.com>, "jgg@...pe.ca" <jgg@...pe.ca>, "jhubbard@...dia.com"
<jhubbard@...dia.com>, "joey.gouly@....com" <joey.gouly@....com>,
"john.fastabend@...il.com" <john.fastabend@...il.com>, "jolsa@...nel.org"
<jolsa@...nel.org>, "Kalyazin, Nikita" <kalyazin@...zon.co.uk>,
"kernel@...0n.name" <kernel@...0n.name>, "kpsingh@...nel.org"
<kpsingh@...nel.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-doc@...r.kernel.org"
<linux-doc@...r.kernel.org>, "linux-fsdevel@...r.kernel.org"
<linux-fsdevel@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-kselftest@...r.kernel.org"
<linux-kselftest@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
"linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
"loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>,
"lorenzo.stoakes@...cle.com" <lorenzo.stoakes@...cle.com>, "luto@...nel.org"
<luto@...nel.org>, "martin.lau@...ux.dev" <martin.lau@...ux.dev>,
"martin@...ibond.com" <martin@...ibond.com>, "maz@...nel.org"
<maz@...nel.org>, "mhocko@...e.com" <mhocko@...e.com>, "mingo@...hat.com"
<mingo@...hat.com>, "oliver.upton@...ux.dev" <oliver.upton@...ux.dev>,
"palmer@...belt.com" <palmer@...belt.com>, "paul.walmsley@...ive.com"
<paul.walmsley@...ive.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"peterx@...hat.com" <peterx@...hat.com>, "peterz@...radead.org"
<peterz@...radead.org>, "pfalcato@...e.de" <pfalcato@...e.de>,
"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Roy, Patrick"
<roypat@...zon.co.uk>, "rppt@...nel.org" <rppt@...nel.org>, "sdf@...ichev.me"
<sdf@...ichev.me>, "seanjc@...gle.com" <seanjc@...gle.com>,
"shakeel.butt@...ux.dev" <shakeel.butt@...ux.dev>, "shuah@...nel.org"
<shuah@...nel.org>, "song@...nel.org" <song@...nel.org>, "surenb@...gle.com"
<surenb@...gle.com>, "suzuki.poulose@....com" <suzuki.poulose@....com>,
"svens@...ux.ibm.com" <svens@...ux.ibm.com>, "tglx@...utronix.de"
<tglx@...utronix.de>, "trondmy@...nel.org" <trondmy@...nel.org>,
"vbabka@...e.cz" <vbabka@...e.cz>, "viro@...iv.linux.org.uk"
<viro@...iv.linux.org.uk>, "weixugc@...gle.com" <weixugc@...gle.com>,
"will@...nel.org" <will@...nel.org>, "willy@...radead.org"
<willy@...radead.org>, "x86@...nel.org" <x86@...nel.org>, "Cali, Marco"
<xmarcalx@...zon.co.uk>, "yonghong.song@...ux.dev" <yonghong.song@...ux.dev>,
"yuanchu@...gle.com" <yuanchu@...gle.com>, "yuzenghui@...wei.com"
<yuzenghui@...wei.com>, "zhengqi.arch@...edance.com"
<zhengqi.arch@...edance.com>
Subject: Re: [PATCH v6 01/11] filemap: Pass address_space mapping to
->free_folio()
Hi Hugh!
On Tue, 2025-09-16 at 07:23 +0100, Hugh Dickins wrote:
> On Fri, 12 Sep 2025, Roy, Patrick wrote:
>
>> From: Elliot Berman <quic_eberman@...cinc.com>
>>
>> When guest_memfd removes memory from the host kernel's direct map,
>> direct map entries must be restored before the memory is freed again. To
>> do so, ->free_folio() needs to know whether a gmem folio was direct map
>> removed in the first place though. While possible to keep track of this
>> information on each individual folio (e.g. via page flags), direct map
>> removal is an all-or-nothing property of the entire guest_memfd, so it
>> is less error prone to just check the flag stored in the gmem inode's
>> private data. However, by the time ->free_folio() is called,
>> folio->mapping might be cleared. To still allow access to the address
>> space from which the folio was just removed, pass it in as an additional
>> argument to ->free_folio, as the mapping is well-known to all callers.
>>
>> Link: https://lore.kernel.org/all/15f665b4-2d33-41ca-ac50-fafe24ade32f@redhat.com/
>> Suggested-by: David Hildenbrand <david@...hat.com>
>> Acked-by: David Hildenbrand <david@...hat.com>
>> Signed-off-by: Elliot Berman <quic_eberman@...cinc.com>
>> [patrick: rewrite shortlog for new usecase]
>> Signed-off-by: Patrick Roy <roypat@...zon.co.uk>
>> ---
>> Documentation/filesystems/locking.rst | 2 +-
>> fs/nfs/dir.c | 11 ++++++-----
>> fs/orangefs/inode.c | 3 ++-
>> include/linux/fs.h | 2 +-
>> mm/filemap.c | 9 +++++----
>> mm/secretmem.c | 3 ++-
>> mm/vmscan.c | 4 ++--
>> virt/kvm/guest_memfd.c | 3 ++-
>> 8 files changed, 21 insertions(+), 16 deletions(-)
>>
>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>> index aa287ccdac2f..74c97287ec40 100644
>> --- a/Documentation/filesystems/locking.rst
>> +++ b/Documentation/filesystems/locking.rst
>> @@ -262,7 +262,7 @@ prototypes::
>> sector_t (*bmap)(struct address_space *, sector_t);
>> void (*invalidate_folio) (struct folio *, size_t start, size_t len);
>> bool (*release_folio)(struct folio *, gfp_t);
>> - void (*free_folio)(struct folio *);
>> + void (*free_folio)(struct address_space *, struct folio *);
>> int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
>> int (*migrate_folio)(struct address_space *, struct folio *dst,
>> struct folio *src, enum migrate_mode);
>
> Beware, that is against the intent of free_folio().
>
> Since its 2.6.37 origin in 6072d13c4293 ("Call the filesystem back
> whenever a page is removed from the page cache"), freepage() or
> free_folio() has intentionally NOT taken a struct address_space *mapping,
> because that structure may already be freed by the time free_folio() is
> called, if the last folio holding it has now been freed.
>
> Maybe something has changed since then, or maybe it happens to be safe
> just in the context in which you want to use it; but it is against the
> principle of free_folio(). (Maybe an rcu_read_lock() could be added
> in __remove_mapping() to make it safe nowadays? maybe not welcome.)
>
> See Documentation/filesystems/vfs.rst:
> free_folio is called once the folio is no longer visible in the
> page cache in order to allow the cleanup of any private data.
> Since it may be called by the memory reclaimer, it should not
> assume that the original address_space mapping still exists, and
> it should not block.
>
> Hugh
Thanks for pointing this out! I think I can make do without this patch
by storing the direct map state in a bit directly on the folio (in
yesterday's upstream guest_memfd call, we talked about using
->private, which guest_memfd isn't using for anything yet). Will do that
for the next iteration.
Best,
Patrick