Message-Id: <AF891D9F-C006-411C-BC4C-3787622AB189@dilger.ca>
Date: Thu, 23 Oct 2025 09:48:58 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Dave Chinner <david@...morbit.com>
Cc: Kiryl Shutsemau <kirill@...temov.name>,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>,
Hugh Dickins <hughd@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>,
Rik van Riel <riel@...riel.com>,
Harry Yoo <harry.yoo@...cle.com>,
Johannes Weiner <hannes@...xchg.org>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Darrick J. Wong" <djwong@...nel.org>,
linux-mm <linux-mm@...ck.org>,
linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC, PATCH 0/2] Large folios vs. SIGBUS semantics
> On Oct 23, 2025, at 5:38 AM, Dave Chinner <david@...morbit.com> wrote:
>
> On Tue, Oct 21, 2025 at 07:16:26AM +0100, Kiryl Shutsemau wrote:
>> On Tue, Oct 21, 2025 at 10:28:02AM +1100, Dave Chinner wrote:
>>> In critical paths like truncate, correctness and safety come first.
>>> Performance is only a secondary consideration. The overlap of
>>> mmap() and truncate() is an area where we have had many, many bugs
>>> and, at minimum, the current POSIX behaviour largely shields us from
>>> serious stale data exposure events when those bugs (inevitably)
>>> occur.
>>
>> How do you prevent writes via GUP racing with truncate()?
>>
>> Something like this:
>>
>>         CPU0                            CPU1
>> fd = open("file")
>> p = mmap(fd)
>> whatever_syscall(p)
>>   get_user_pages(p, &page)
>>                                         truncate("file");
>>   <write to page>
>>   put_page(page);
>
> Forget about truncate, go look at the comment above
> writable_file_mapping_allowed() about using GUP this way.
>
> i.e. file-backed mmap/GUP is a known broken anti-pattern. We've
> spent the past 15+ years telling people that it is unfixably broken
> and they will crash their kernel or corrupt their data if they do
> this.
>
> This is not supported functionality because real world production
> use ends up exposing problems with sync and background writeback
> races, truncate races, fallocate() races, writes into holes, writes
> into preallocated regions, writes over shared extents that require
> copy-on-write, etc, etc, ad nauseam.
>
> If anyone is using filebacked mappings like this, then when it
> breaks they get to keep all the broken pieces to themselves.

Should truncate("file") return ETXTBSY in this case, so that users
and applications know this doesn't work/isn't safe? Unfortunately,
today's application developers barely even know how IO is done, so
there is little chance that they would understand subtleties like this.

Cheers, Andreas

>> The GUP can pin a page in the middle of a large folio well beyond the
>> truncation point. The folio will not be split on truncation due to the
>> elevated pin.
>>
>> I don't think this issue can be fundamentally fixed as long as we allow
>> GUP for file-backed memory.
>
> Yup, but that's the least of the problems with GUP on file-backed
> pages...
>
>> If the filesystem side cannot handle a non-zeroed tail of a large folio,
>> this SIGBUS semantics only hides the issue instead of addressing it.
>
> The objections raised have not related to whether a filesystem
> "cannot handle" this case or not. The concerns are about a change of
> behaviour in a well known, widely documented API, as well as the
> significant increase in surface area of potential data exposure it
> would enable should there be Yet Another Truncate Bug Again Once
> More.
>
> -Dave.
> --
> Dave Chinner
> david@...morbit.com
>
Cheers, Andreas
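
For anyone who wants to see the "current POSIX behaviour" referred to above
from userspace, here is a minimal sketch, not kernel code and not part of
the RFC. It assumes a Unix system with fork() and a filesystem that follows
the POSIX rule that touching a mapped page lying wholly beyond EOF raises
SIGBUS. The function name signal_after_truncate is made up for this
illustration. It does not exercise GUP or large folios; it only shows the
plain mmap()/truncate interaction.

```python
import mmap
import os
import signal
import tempfile


def signal_after_truncate():
    """Return the signal that kills a child which touches a shared file
    mapping after the file was truncated to zero, or None if it survives."""
    page = mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * page)
        f.flush()
        # Shared, read/write mapping of the single page in the file.
        m = mmap.mmap(f.fileno(), page)
        pid = os.fork()
        if pid == 0:
            # Child: truncate the file under the mapping, then read the
            # first byte of a page that is now entirely beyond EOF.
            os.ftruncate(f.fileno(), 0)
            m[0]
            os._exit(0)  # only reached if the access did not fault
        _, status = os.waitpid(pid, 0)
        m.close()
        if os.WIFSIGNALED(status):
            return signal.Signals(os.WTERMSIG(status))
        return None


if __name__ == "__main__":
    print(signal_after_truncate())
```

On a typical Linux filesystem I would expect this to report SIGBUS for the
child. The access runs in a forked child so that the fault does not take
down the process doing the measuring.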