lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e30627e5-f30f-4494-934c-58e4a427a476@redhat.com>
Date: Wed, 17 Sep 2025 16:52:40 +0200
From: David Hildenbrand <david@...hat.com>
To: Hugh Dickins <hughd@...gle.com>, "Roy, Patrick" <roypat@...zon.co.uk>
Cc: "Thomson, Jack" <jackabt@...zon.co.uk>,
 "Kalyazin, Nikita" <kalyazin@...zon.co.uk>,
 "Cali, Marco" <xmarcalx@...zon.co.uk>,
 "derekmn@...zon.co.uk" <derekmn@...zon.co.uk>,
 Elliot Berman <quic_eberman@...cinc.com>,
 "willy@...radead.org" <willy@...radead.org>, "corbet@....net"
 <corbet@....net>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
 "maz@...nel.org" <maz@...nel.org>,
 "oliver.upton@...ux.dev" <oliver.upton@...ux.dev>,
 "joey.gouly@....com" <joey.gouly@....com>,
 "suzuki.poulose@....com" <suzuki.poulose@....com>,
 "yuzenghui@...wei.com" <yuzenghui@...wei.com>,
 "catalin.marinas@....com" <catalin.marinas@....com>,
 "will@...nel.org" <will@...nel.org>,
 "chenhuacai@...nel.org" <chenhuacai@...nel.org>,
 "kernel@...0n.name" <kernel@...0n.name>,
 "paul.walmsley@...ive.com" <paul.walmsley@...ive.com>,
 "palmer@...belt.com" <palmer@...belt.com>,
 "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
 "alex@...ti.fr" <alex@...ti.fr>,
 "agordeev@...ux.ibm.com" <agordeev@...ux.ibm.com>,
 "gerald.schaefer@...ux.ibm.com" <gerald.schaefer@...ux.ibm.com>,
 "hca@...ux.ibm.com" <hca@...ux.ibm.com>,
 "gor@...ux.ibm.com" <gor@...ux.ibm.com>,
 "borntraeger@...ux.ibm.com" <borntraeger@...ux.ibm.com>,
 "svens@...ux.ibm.com" <svens@...ux.ibm.com>,
 "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
 "luto@...nel.org" <luto@...nel.org>,
 "peterz@...radead.org" <peterz@...radead.org>,
 "tglx@...utronix.de" <tglx@...utronix.de>,
 "mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
 "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
 "trondmy@...nel.org" <trondmy@...nel.org>, "anna@...nel.org"
 <anna@...nel.org>, "hubcap@...ibond.com" <hubcap@...ibond.com>,
 "martin@...ibond.com" <martin@...ibond.com>,
 "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
 "brauner@...nel.org" <brauner@...nel.org>, "jack@...e.cz" <jack@...e.cz>,
 "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
 "lorenzo.stoakes@...cle.com" <lorenzo.stoakes@...cle.com>,
 "Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>,
 "vbabka@...e.cz" <vbabka@...e.cz>, "rppt@...nel.org" <rppt@...nel.org>,
 "surenb@...gle.com" <surenb@...gle.com>, "mhocko@...e.com"
 <mhocko@...e.com>, "ast@...nel.org" <ast@...nel.org>,
 "daniel@...earbox.net" <daniel@...earbox.net>,
 "andrii@...nel.org" <andrii@...nel.org>,
 "martin.lau@...ux.dev" <martin.lau@...ux.dev>,
 "eddyz87@...il.com" <eddyz87@...il.com>, "song@...nel.org"
 <song@...nel.org>, "yonghong.song@...ux.dev" <yonghong.song@...ux.dev>,
 "john.fastabend@...il.com" <john.fastabend@...il.com>,
 "kpsingh@...nel.org" <kpsingh@...nel.org>, "sdf@...ichev.me"
 <sdf@...ichev.me>, "haoluo@...gle.com" <haoluo@...gle.com>,
 "jolsa@...nel.org" <jolsa@...nel.org>, "jgg@...pe.ca" <jgg@...pe.ca>,
 "jhubbard@...dia.com" <jhubbard@...dia.com>,
 "peterx@...hat.com" <peterx@...hat.com>, "jannh@...gle.com"
 <jannh@...gle.com>, "pfalcato@...e.de" <pfalcato@...e.de>,
 "axelrasmussen@...gle.com" <axelrasmussen@...gle.com>,
 "yuanchu@...gle.com" <yuanchu@...gle.com>,
 "weixugc@...gle.com" <weixugc@...gle.com>,
 "hannes@...xchg.org" <hannes@...xchg.org>,
 "zhengqi.arch@...edance.com" <zhengqi.arch@...edance.com>,
 "shakeel.butt@...ux.dev" <shakeel.butt@...ux.dev>,
 "shuah@...nel.org" <shuah@...nel.org>, "seanjc@...gle.com"
 <seanjc@...gle.com>,
 "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
 "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
 "linux-arm-kernel@...ts.infradead.org"
 <linux-arm-kernel@...ts.infradead.org>,
 "kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
 "loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>,
 "linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
 "linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
 "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
 "devel@...ts.orangefs.org" <devel@...ts.orangefs.org>,
 "linux-mm@...ck.org" <linux-mm@...ck.org>,
 "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
 "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH v6 01/11] filemap: Pass address_space mapping to
 ->free_folio()

On 16.09.25 08:23, Hugh Dickins wrote:
> On Fri, 12 Sep 2025, Roy, Patrick wrote:
> 
>> From: Elliot Berman <quic_eberman@...cinc.com>
>>
>> When guest_memfd removes memory from the host kernel's direct map,
>> direct map entries must be restored before the memory is freed again. To
>> do so, ->free_folio() needs to know whether a gmem folio was direct map
>> removed in the first place though. While possible to keep track of this
>> information on each individual folio (e.g. via page flags), direct map
>> removal is an all-or-nothing property of the entire guest_memfd, so it
>> is less error prone to just check the flag stored in the gmem inode's
>> private data.  However, by the time ->free_folio() is called,
>> folio->mapping might be cleared. To still allow access to the address
>> space from which the folio was just removed, pass it in as an additional
>> argument to ->free_folio, as the mapping is well-known to all callers.
>>
>> Link: https://lore.kernel.org/all/15f665b4-2d33-41ca-ac50-fafe24ade32f@redhat.com/
>> Suggested-by: David Hildenbrand <david@...hat.com>
>> Acked-by: David Hildenbrand <david@...hat.com>
>> Signed-off-by: Elliot Berman <quic_eberman@...cinc.com>
>> [patrick: rewrite shortlog for new usecase]
>> Signed-off-by: Patrick Roy <roypat@...zon.co.uk>
>> ---
>>   Documentation/filesystems/locking.rst |  2 +-
>>   fs/nfs/dir.c                          | 11 ++++++-----
>>   fs/orangefs/inode.c                   |  3 ++-
>>   include/linux/fs.h                    |  2 +-
>>   mm/filemap.c                          |  9 +++++----
>>   mm/secretmem.c                        |  3 ++-
>>   mm/vmscan.c                           |  4 ++--
>>   virt/kvm/guest_memfd.c                |  3 ++-
>>   8 files changed, 21 insertions(+), 16 deletions(-)
>>
>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>> index aa287ccdac2f..74c97287ec40 100644
>> --- a/Documentation/filesystems/locking.rst
>> +++ b/Documentation/filesystems/locking.rst
>> @@ -262,7 +262,7 @@ prototypes::
>>   	sector_t (*bmap)(struct address_space *, sector_t);
>>   	void (*invalidate_folio) (struct folio *, size_t start, size_t len);
>>   	bool (*release_folio)(struct folio *, gfp_t);
>> -	void (*free_folio)(struct folio *);
>> +	void (*free_folio)(struct address_space *, struct folio *);
>>   	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
>>   	int (*migrate_folio)(struct address_space *, struct folio *dst,
>>   			struct folio *src, enum migrate_mode);
> 
> Beware, that is against the intent of free_folio().
> 
> Since its 2.6.37 origin in 6072d13c4293 ("Call the filesystem back
> whenever a page is removed from the page cache"), freepage() or
> free_folio() has intentionally NOT taken a struct address_space *mapping,
> because that structure may already be freed by the time free_folio() is
> called, if the last folio holding it has now been freed.

Thanks for noticing that Hugh, very good point!

> 
> Maybe something has changed since then, or maybe it happens to be safe
> just in the context in which you want to use it; but it is against the
> principle of free_folio().  (Maybe an rcu_read_lock() could be added
> in __remove_mapping() to make it safe nowadays? maybe not welcome.)

Let me dig into the callers:


1) filemap_free_folio()

filemap_free_folio() looks up the callback through 
mapping->a_ops->free_folio. Nothing happens in-between that lookup and 
the callback so we should be good.


2) replace_page_cache_folio()

replace_page_cache_folio() similarly looks up the callback through
mapping->a_ops->free_folio. We do some operations afterwards, but 
essentially store the new folio in the page cache and remove the old one.

The only caller is fuse_try_move_folio(), and IIUC both folios are 
locked, preventing concurrent truncation and the mapping going away.


3) __remove_mapping()

__remove_mapping() also looks up the callback through 
mapping->a_ops->free_folio.

Before we call free_folio() we remove the folio from the pagecache 
(__filemap_remove_folio) to then drop locks and call free_folio().

We're only holding the folio lock at that point.

So yes I agree, truncate_inode_pages_final() could be racing with
__remove_mapping().c That's probably exactly what the docs describe 
regarding reclaim.


rcu_read_lock() should indeed work, or some other mechanism that keeps 
truncate_inode_pages_final() from succeeding in this racy situation.

Alternatively I guess we would have to use another callback.

-- 
Cheers

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ