Message-ID: <CAKYAXd8R=mPVR_ezDHRZqiKL9n-i5QRuDZnaK+poipBtCJtE=g@mail.gmail.com>
Date: Tue, 20 Jan 2026 14:11:24 +0900
From: Namjae Jeon <linkinjeon@...nel.org>
To: Christoph Hellwig <hch@....de>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, tytso@....edu,
willy@...radead.org, jack@...e.cz, djwong@...nel.org, josef@...icpanda.com,
sandeen@...deen.net, rgoldwyn@...e.com, xiang@...nel.org, dsterba@...e.com,
pali@...nel.org, ebiggers@...nel.org, neil@...wn.name, amir73il@...il.com,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
iamjoonsoo.kim@....com, cheol.lee@....com, jay.sim@....com, gunho.lee@....com,
Hyunchul Lee <hyc.lee@...il.com>
Subject: Re: [PATCH v5 06/14] ntfs: update file operations
On Mon, Jan 19, 2026 at 4:10 PM Christoph Hellwig <hch@....de> wrote:
>
> On Sun, Jan 18, 2026 at 01:56:55PM +0900, Namjae Jeon wrote:
> > > Talking about helpers, why does iomap_seek_hole/iomap_seek_data
> > > not work for ntfs?
> >
> > Regarding iomap_seek_hole/iomap_seek_data, the default iomap
> > implementation treats IOMAP_UNWRITTEN extents as holes unless they
> > have dirty pages in the page cache. However, in the ntfs iomap_begin handler, the
> > region between initialized_size and i_size (EOF) is mapped as
> > IOMAP_UNWRITTEN. Since NTFS requires any pre-allocated regions before
> > initialized_size to be physically zeroed, NTFS must treat all
> > pre-allocated regions as DATA.
>
> What do you need IOMAP_UNWRITTEN for in that case? If the blocks have
> been zeroed on-disk, they are IOMAP_MAPPED by the usual iomap standards.
> If you need special treatment, it might be worth adding a separate
> IOMAP_PREZEROED with clearly defined semantics instead of overloading
> IOMAP_UNWRITTEN.

By modifying iomap_begin, it seems possible to implement it using
iomap_seek_hole/data without introducing a new IOMAP_xxx type. My
previous explanation was insufficient, so let me provide a more
detailed clarification.

The concept of an unwritten extent in NTFS is slightly different from
that of other filesystems. NTFS conceptually manages only a single
continuous unwritten region, which is strictly defined based on
initialized_size.

                              File offset

0                                          initialized_size i_size(EOF)
-----------------------------------------------------------------------
|       #0        |           #1           |            #2            |
| Actual data     | pre-allocated          | pre-allocated            |
| (user written   | (within initialized)   | (initialized_size ~ EOF) |
|  completed)     |                        |                          |
-----------------------------------------------------------------------
      MAPPED                MAPPED                  UNWRITTEN

* Region #1: must be zero-initialized by the filesystem.
* Region #2: does not need to be initialized.

Since NTFS does not support multiple unwritten extents, all
pre-allocated regions must, in principle, be treated as DATA, not
HOLE. However, in the current implementation, region #2 is mapped as
IOMAP_UNWRITTEN, so iomap_seek_data incorrectly interprets this region
as a hole. It would be better to map region #2 as IOMAP_MAPPED for the
seek operation.
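
To make the effect on seeking concrete, here is a minimal userspace sketch
of the layout above. It is only a model under stated assumptions, not kernel
code: the enum, the struct, and the model_* helpers are hypothetical names
invented for illustration, the file is assumed to be fully pre-allocated up
to i_size, and the page cache check that the real iomap seek helpers perform
on unwritten extents is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the extent types discussed above.
 * These are NOT the kernel's IOMAP_* definitions. */
enum ext_type { EXT_MAPPED, EXT_UNWRITTEN };

/* Simplified file: fully pre-allocated from 0 up to i_size. */
struct ntfs_file_model {
	int64_t initialized_size;
	int64_t i_size;
};

/* Regions #0 and #1 (below initialized_size) are always MAPPED.
 * Region #2 is UNWRITTEN in the current patch (tail_mapped == false)
 * or MAPPED in the proposed change (tail_mapped == true). */
static inline enum ext_type model_map(const struct ntfs_file_model *f,
				      int64_t pos, bool tail_mapped)
{
	assert(f->initialized_size >= 0 && f->initialized_size <= f->i_size);
	assert(pos >= 0 && pos < f->i_size);
	if (pos < f->initialized_size)
		return EXT_MAPPED;
	return tail_mapped ? EXT_MAPPED : EXT_UNWRITTEN;
}

/* Model of SEEK_HOLE: an unwritten extent counts as a hole.
 * Returns -1 for an out-of-range offset, i_size if no hole is found. */
static inline int64_t model_seek_hole(const struct ntfs_file_model *f,
				      int64_t pos, bool tail_mapped)
{
	if (pos < 0 || pos >= f->i_size)
		return -1;
	if (pos < f->initialized_size)
		pos = f->initialized_size;	/* skip the MAPPED part */
	if (pos < f->i_size && model_map(f, pos, tail_mapped) == EXT_UNWRITTEN)
		return pos;
	return f->i_size;			/* implicit hole at EOF */
}

/* Model of SEEK_DATA: returns -1 if no data exists at or after pos. */
static inline int64_t model_seek_data(const struct ntfs_file_model *f,
				      int64_t pos, bool tail_mapped)
{
	if (pos < 0 || pos >= f->i_size)
		return -1;
	if (model_map(f, pos, tail_mapped) == EXT_MAPPED)
		return pos;
	return -1;	/* the unwritten tail runs to EOF, so no data follows */
}
```

With initialized_size = 4096 and i_size = 16384, the model reports a hole
at 4096 when region #2 is UNWRITTEN, and no hole before EOF when it is
MAPPED, which is the behavior described above.
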
>
> >
> > >
> > > > + file_accessed(iocb->ki_filp);
> > > > + ret = iomap_dio_rw(iocb, to, &ntfs_read_iomap_ops, NULL, IOMAP_DIO_PARTIAL,
> > >
> > > Why do you need IOMAP_DIO_PARTIAL? That's mostly a workaround
> > > for "interesting" locking in btrfs and gfs2. If ntfs has similar
> > > issues, it would be helpful to add a comment here. Also maybe fix
> > > the overly long line.
> > Regarding the use of IOMAP_DIO_PARTIAL, I was not aware that it was a
> > workaround for specific locking issues in some filesystems. I
> > incorrectly assumed it was a flag to enable partial success when a DIO
> > request exceeds the actual data length. I will remove this flag and
> > fix it.
>
> It only does short I/O for -EFAULT, which only happens if the nofault
> flag on the iov_iter is set. See the big comment in
> btrfs_direct_write, where that field is set, for the explanation.
Okay.
>
> > > What is the reason to do the expansion here instead of in the iomap_begin
> > > handler when we know we are committed to writing to the range?
> > We can probably move it to iomap_begin(). Let me check it.
>
> If it works better here that's also fine, just document it as it looks
> a bit unusual. Handling the cleanup on failures might be a bit easier
> if it is done in the iomap loop, though.
Okay. Thanks!
>