Message-ID: <CAKYAXd9+P6ekYnbXuoG95Nt5-H6bie6cSm4N-9RFDN3E+smJ+g@mail.gmail.com>
Date: Sun, 18 Jan 2026 14:00:09 +0900
From: Namjae Jeon <linkinjeon@...nel.org>
To: Christoph Hellwig <hch@....de>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, tytso@....edu,
willy@...radead.org, jack@...e.cz, djwong@...nel.org, josef@...icpanda.com,
sandeen@...deen.net, rgoldwyn@...e.com, xiang@...nel.org, dsterba@...e.com,
pali@...nel.org, ebiggers@...nel.org, neil@...wn.name, amir73il@...il.com,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
iamjoonsoo.kim@....com, cheol.lee@....com, jay.sim@....com, gunho.lee@....com,
Hyunchul Lee <hyc.lee@...il.com>
Subject: Re: [PATCH v5 07/14] ntfs: update iomap and address space operations
On Fri, Jan 16, 2026 at 6:17 PM Christoph Hellwig <hch@....de> wrote:
>
> First, a high-level comment on the code structure:
>
> I'd expect the mft handling (ntfs_write_mft_block, ntfs_mft_writepage
> and ntfs_bmap which is only used by that and could use a comment
> update) to be in mft.c and not aops.c. I'm not sure if it's feasible,
> but separate aops for the MFT (that's the master file table IIRC) would
> probably be a good idea as well.
Okay, I will move them to mft.c and look into separate aops for the
MFT.
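A separate aops for the MFT could look roughly like this (a sketch;
which other callbacks the MFT inode would share with the regular
metadata path is still open):

/* mft.c: give the MFT inode its own aops so that aops.c does not
 * have to special-case MFT writeback. */
static const struct address_space_operations ntfs_mft_aops = {
	.writepages	= ntfs_mft_writepages,
	/* remaining callbacks as for the other metadata inodes */
};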
>
> Similarly, ntfs_dev_read/write feel a bit out of place here.
Okay.
>
> > +void ntfs_bio_end_io(struct bio *bio)
> > {
> > + if (bio->bi_private)
> > + folio_put((struct folio *)bio->bi_private);
> > + bio_put(bio);
> > +}
>
> This function confuses me. In general end_io handlers should not
> need to drop a folio reference. For the normal buffered I/O path,
> the folio is locked for reads, and has the writeback bit set for
> writes, so this is not needed. When doing I/O in a private folio,
> the caller usually has a reference as it needs to do something with
> it. What is the reason for the special pattern here? A somewhat
> more descriptive name and a comment would help to describe why
> it's done this way.
The reason for this pattern is to prevent a race between metadata I/O
and inode eviction (e.g., during umount). ni->folio holds MFT record
blocks (e.g., one 4KB folio containing four 1KB MFT records). When an
MFT record is written to disk via submit_bio(), a concurrent umount
could evict the inode, and ntfs_evict_big_inode() would call
folio_put(ni->folio). If that happens before the I/O completes, the
folio could be freed while the bio still references it, leading to
data corruption or a use-after-free. To prevent this, I take an extra
folio reference with folio_get() before submit_bio() and drop it in
ntfs_bio_end_io(). I will add a comment explaining this; see the
sketch below.
> Also no need to cast a void pointer.
Okay, I will remove the cast as well.
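Something like this, with a more descriptive name (the name and the
comment wording are just suggestions):

void ntfs_bio_end_io_put_folio(struct bio *bio)
{
	/*
	 * An extra folio reference was taken before submit_bio() so
	 * that a concurrent inode eviction (e.g. on umount) cannot
	 * free ni->folio while metadata I/O is in flight; drop it
	 * here. bi_private is a void pointer, so no cast is needed.
	 */
	if (bio->bi_private)
		folio_put(bio->bi_private);
	bio_put(bio);
}

and on the submission side:

	/* Pin the folio across the I/O; dropped in the end_io handler. */
	folio_get(ni->folio);
	bio->bi_private = ni->folio;
	bio->bi_end_io = ntfs_bio_end_io_put_folio;
	submit_bio(bio);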
>
>
> > + if (bio &&
> > + (bio_end_sector(bio) >> (vol->cluster_size_bits - 9)) !=
> > + lcn) {
> > + flush_dcache_folio(folio);
> > + bio->bi_end_io = ntfs_bio_end_io;
> > + submit_bio(bio);
>
> If the MFT is what I think it is, the flush_dcache_folio here is not
> needed, as the folio can't ever be mapped into userspace.
Okay. I will remove it.
>
>
> > +static void ntfs_readahead(struct readahead_control *rac)
> > +{
> > + struct address_space *mapping = rac->mapping;
> > + struct inode *inode = mapping->host;
> > + struct ntfs_inode *ni = NTFS_I(inode);
> >
> > + if (!NInoNonResident(ni) || NInoCompressed(ni)) {
> > + /* No readahead for resident and compressed. */
> > + return;
> > + }
>
> Not supporting readahead for compressed inodes is a bit weird, as
> they should benefit most from operating on larger ranges. Not really
> a blocker, but probably something worth addressing over time.
I agree that compressed inodes would benefit significantly from
readahead due to the nature of compression units. However,
implementing readahead for the compressed path is currently complex
and requires more careful design. I will keep it as a high-priority
follow-up once the driver is upstream.
Thanks!
>
> > +static int ntfs_mft_writepage(struct folio *folio, struct writeback_control *wbc)
> > +{
>
> Instead of having a ntfs_mft_writepage, maybe implement a
> ntfs_mft_writepages that includes the writeback_iter() loop? And then
> move the folio_zero_segment into ntfs_write_mft_block so that one
> intermediate layer can go away?
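Okay. A rough sketch of that (assuming ntfs_write_mft_block() takes
the locked folio together with wbc, absorbs the folio_zero_segment(),
and unlocks the folio when done):

static int ntfs_mft_writepages(struct address_space *mapping,
		struct writeback_control *wbc)
{
	struct folio *folio = NULL;
	int error;

	while ((folio = writeback_iter(mapping, wbc, folio, &error)))
		error = ntfs_write_mft_block(folio, wbc);
	return error;
}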
>
> > +int ntfs_dev_read(struct super_block *sb, void *buf, loff_t start, loff_t size)
>
> Do you want the data for ntfs_dev_read/write cached in the bdev
> mapping? If not, simply using bdev_rw_virt might be easier. If you
> want them cached, maybe add a comment why.
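I will check whether anything relies on the cached copy in the bdev
mapping. If not, ntfs_dev_read() could become a thin wrapper (a
sketch, assuming start and size stay sector-aligned and buf is not
vmalloc'ed):

int ntfs_dev_read(struct super_block *sb, void *buf, loff_t start,
		loff_t size)
{
	return bdev_rw_virt(sb->s_bdev, start >> SECTOR_SHIFT, buf,
			size, REQ_OP_READ);
}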