Message-ID: <aMmdCd_0acZHtqUN@google.com>
Date: Tue, 16 Sep 2025 17:23:21 +0000
From: Jaegeuk Kim <jaegeuk@...nel.org>
To: hanqi <hanqi@...o.com>
Cc: Chao Yu <chao@...nel.org>, axboe@...nel.dk,
linux-f2fs-devel@...ts.sourceforge.net,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] f2fs: f2fs support uncached buffer I/O read and write
On 09/16, hanqi wrote:
> Yes, for most storage devices, since disk performance is much lower
> than memory performance, uncached buffered I/O writes in f2fs do not
> bring significant performance benefits. Their advantage might only
> become apparent in scenarios where disk performance exceeds that of
> memory.
>
> Therefore, I also agree that f2fs should support uncached buffered
> I/O for reads first, as Chao suggested.
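>
> For context, userspace opts into uncached buffered I/O per call via the
> RWF_DONTCACHE flag to preadv2()/pwritev2(). A minimal read example (my
> own sketch, assuming the Linux 6.14+ uapi definition of the flag; not
> taken from the patch):
>
>     #define _GNU_SOURCE
>     #include <fcntl.h>
>     #include <stdio.h>
>     #include <sys/uio.h>
>     #include <unistd.h>
>
>     #ifndef RWF_DONTCACHE
>     #define RWF_DONTCACHE 0x00000080 /* per Linux 6.14 uapi headers */
>     #endif
>
>     int main(void)
>     {
>             char buf[8192];
>             struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
>             int fd = open("testfile", O_RDONLY);
>
>             if (fd < 0) {
>                     perror("open");
>                     return 1;
>             }
>             /* read through the page cache, but drop the folios once used */
>             if (preadv2(fd, &iov, 1, 0, RWF_DONTCACHE) < 0)
>                     perror("preadv2"); /* EOPNOTSUPP without FOP_DONTCACHE */
>             close(fd);
>             return 0;
>     }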
Thanks, could you please post a patch and get some comments back?
>
> Thanks,
>
> On 2025/9/16 11:13, Chao Yu wrote:
> > On 9/12/25 07:53, Jaegeuk Kim wrote:
> > > Given the performance data and implementation overhead, I'm also questioning
> > > whether we really need to support this for writes or not. Can we get some common
> > > sense of usage models?
> > It seems the uncached write implementation affects performance a lot, so I
> > don't see a good reason to merge it for now.
> >
> > I think we can enable the uncached read functionality first and return
> > -EOPNOTSUPP for uncached writes; meanwhile, let's see whether any good use
> > case for uncached writes shows up.
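> >
> > The VFS already returns -EOPNOTSUPP for RWF_DONTCACHE when a filesystem
> > does not advertise FOP_DONTCACHE, so for read-only support the write
> > path would need its own check. Something along these lines (a sketch
> > only, not a tested patch; IOCB_DONTCACHE is the kiocb flag set for
> > RWF_DONTCACHE) could do it:
> >
> >     static ssize_t f2fs_file_write_iter(struct kiocb *iocb,
> >                                         struct iov_iter *from)
> >     {
> >             /* uncached reads are supported; writes are not wired up */
> >             if (iocb->ki_flags & IOCB_DONTCACHE)
> >                     return -EOPNOTSUPP;
> >             /* ... existing f2fs write path ... */
> >     }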
> >
> > Thanks,
> >
> > > On 08/28, Qi Han wrote:
> > > > In [1], we added uncached buffered I/O read support to f2fs. Now,
> > > > let's move forward and enable uncached buffered I/O write support
> > > > in f2fs as well.
> > > >
> > > > In f2fs_write_end_io, completion handling is deferred to a separate
> > > > asynchronous workqueue, which performs the page drop operation for
> > > > bios that contain pages of type FGP_DONTCACHE.
> > > >
> > > > The following patch was developed and tested on the v6.17-rc3 branch.
> > > > My local testing results are as follows, along with some issues observed:
> > > > 1) Write performance degradation. Uncached buffered I/O writes are slower
> > > > than normal buffered writes because uncached I/O triggers a sync operation
> > > > for each I/O after the data is written to memory, in order to drop pages
> > > > promptly at end_io. I assume this impact might be less visible on
> > > > high-performance storage devices such as PCIe 6.0 SSDs.
> > > > - f2fs_file_write_iter
> > > > - f2fs_buffered_write_iter
> > > > - generic_write_sync
> > > > - filemap_fdatawrite_range_kick
> > > > 2) As expected, page cache usage does not significantly increase during writes.
> > > > 3) The kswapd0 memory reclaim thread remains mostly idle, but additional
> > > > asynchronous work overhead is introduced, e.g.:
> > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ ARGS
> > > > 19650 root 0 -20 0 0 0 I 7.0 0.0 0:00.21 [kworker/u33:3-f2fs_post_write_wq]
> > > > 95 root 0 -20 0 0 0 I 6.6 0.0 0:02.08 [kworker/u33:0-f2fs_post_write_wq]
> > > > 19653 root 0 -20 0 0 0 I 4.6 0.0 0:01.25 [kworker/u33:6-f2fs_post_write_wq]
> > > > 19652 root 0 -20 0 0 0 I 4.3 0.0 0:00.92 [kworker/u33:5-f2fs_post_write_wq]
> > > > 19613 root 0 -20 0 0 0 I 4.3 0.0 0:00.99 [kworker/u33:1-f2fs_post_write_wq]
> > > > 19651 root 0 -20 0 0 0 I 3.6 0.0 0:00.98 [kworker/u33:4-f2fs_post_write_wq]
> > > > 19654 root 0 -20 0 0 0 I 3.0 0.0 0:01.05 [kworker/u33:7-f2fs_post_write_wq]
> > > > 19655 root 0 -20 0 0 0 I 2.3 0.0 0:01.18 [kworker/u33:8-f2fs_post_write_wq]
> > > >
> > > > From these results on my test device, introducing uncached buffered I/O
> > > > writes in f2fs seems to bring more drawbacks than benefits. Do we really
> > > > need to support uncached buffered I/O writes in f2fs?
> > > >
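> > > > The runs below issue 8KB buffered writes in a loop; an equivalent
> > > > minimal reproduction (my own sketch, not the actual test tool) would
> > > > look like:
> > > >
> > > >     #define _GNU_SOURCE
> > > >     #include <fcntl.h>
> > > >     #include <stdio.h>
> > > >     #include <string.h>
> > > >     #include <sys/uio.h>
> > > >     #include <unistd.h>
> > > >
> > > >     #ifndef RWF_DONTCACHE
> > > >     #define RWF_DONTCACHE 0x00000080
> > > >     #endif
> > > >
> > > >     int main(void)
> > > >     {
> > > >             static char buf[8192];
> > > >             struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
> > > >             int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
> > > >             off_t off;
> > > >
> > > >             if (fd < 0) {
> > > >                     perror("open");
> > > >                     return 1;
> > > >             }
> > > >             memset(buf, 'a', sizeof(buf));
> > > >             /* with uncached 1, each write also kicks writeback so the
> > > >              * folios can be dropped at end_io instead of lingering */
> > > >             for (off = 0; off < (1 << 20); off += sizeof(buf)) {
> > > >                     if (pwritev2(fd, &iov, 1, off, RWF_DONTCACHE) < 0) {
> > > >                             perror("pwritev2");
> > > >                             break;
> > > >                     }
> > > >             }
> > > >             close(fd);
> > > >             return 0;
> > > >     }
> > > >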
> > > > Write test data without using uncached buffer I/O:
> > > > Starting 1 threads
> > > > pid: 17609
> > > > writing bs 8192, uncached 0
> > > > 1s: 753MB/sec, MB=753
> > > > 2s: 792MB/sec, MB=1546
> > > > 3s: 430MB/sec, MB=1978
> > > > 4s: 661MB/sec, MB=2636
> > > > 5s: 900MB/sec, MB=3542
> > > > 6s: 769MB/sec, MB=4308
> > > > 7s: 808MB/sec, MB=5113
> > > > 8s: 766MB/sec, MB=5884
> > > > 9s: 654MB/sec, MB=6539
> > > > 10s: 456MB/sec, MB=6995
> > > > 11s: 797MB/sec, MB=7793
> > > > 12s: 770MB/sec, MB=8563
> > > > 13s: 778MB/sec, MB=9341
> > > > 14s: 726MB/sec, MB=10077
> > > > 15s: 736MB/sec, MB=10803
> > > > 16s: 798MB/sec, MB=11602
> > > > 17s: 728MB/sec, MB=12330
> > > > 18s: 749MB/sec, MB=13080
> > > > 19s: 777MB/sec, MB=13857
> > > > 20s: 688MB/sec, MB=14395
> > > >
> > > > 19:29:34 UID PID %usr %system %guest %wait %CPU CPU Command
> > > > 19:29:35 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:29:36 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:29:37 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:29:38 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:29:39 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:29:40 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:29:41 0 94 0.00 2.00 0.00 0.00 2.00 0 kswapd0
> > > > 19:29:42 0 94 0.00 59.00 0.00 0.00 59.00 7 kswapd0
> > > > 19:29:43 0 94 0.00 45.00 0.00 0.00 45.00 7 kswapd0
> > > > 19:29:44 0 94 0.00 36.00 0.00 0.00 36.00 0 kswapd0
> > > > 19:29:45 0 94 0.00 27.00 0.00 1.00 27.00 0 kswapd0
> > > > 19:29:46 0 94 0.00 26.00 0.00 0.00 26.00 2 kswapd0
> > > > 19:29:47 0 94 0.00 57.00 0.00 0.00 57.00 7 kswapd0
> > > > 19:29:48 0 94 0.00 41.00 0.00 0.00 41.00 7 kswapd0
> > > > 19:29:49 0 94 0.00 38.00 0.00 0.00 38.00 7 kswapd0
> > > > 19:29:50 0 94 0.00 47.00 0.00 0.00 47.00 7 kswapd0
> > > > 19:29:51 0 94 0.00 43.00 0.00 1.00 43.00 7 kswapd0
> > > > 19:29:52 0 94 0.00 36.00 0.00 0.00 36.00 7 kswapd0
> > > > 19:29:53 0 94 0.00 39.00 0.00 0.00 39.00 2 kswapd0
> > > > 19:29:54 0 94 0.00 46.00 0.00 0.00 46.00 7 kswapd0
> > > > 19:29:55 0 94 0.00 43.00 0.00 0.00 43.00 7 kswapd0
> > > > 19:29:56 0 94 0.00 39.00 0.00 0.00 39.00 7 kswapd0
> > > > 19:29:57 0 94 0.00 29.00 0.00 1.00 29.00 1 kswapd0
> > > > 19:29:58 0 94 0.00 17.00 0.00 0.00 17.00 4 kswapd0
> > > >
> > > > 19:29:33 kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
> > > > 19:29:34 4464588 6742648 4420876 38.12 6156 2032600 179730872 743.27 1863412 1822544 4
> > > > 19:29:35 4462572 6740784 4422752 38.13 6156 2032752 179739004 743.30 1863460 1823584 16
> > > > 19:29:36 4381512 6740856 4422420 38.13 6156 2114144 179746508 743.33 1863476 1905384 81404
> > > > 19:29:37 3619456 6741840 4421588 38.12 6156 2877032 179746652 743.33 1863536 2668896 592584
> > > > 19:29:38 2848184 6740720 4422472 38.13 6164 3646188 179746652 743.33 1863600 3438520 815692
> > > > 19:29:39 2436336 6739452 4423720 38.14 6164 4056772 179746652 743.33 1863604 3849164 357096
> > > > 19:29:40 1712660 6737700 4425140 38.15 6164 4779020 179746604 743.33 1863612 4571124 343716
> > > > 19:29:41 810664 6738020 4425004 38.15 6164 5681152 179746604 743.33 1863612 5473444 297268
> > > > 19:29:42 673756 6779120 4373200 37.71 5656 5869928 179746604 743.33 1902852 5589452 269032
> > > > 19:29:43 688480 6782024 4371012 37.69 5648 5856940 179750048 743.34 1926336 5542004 279344
> > > > 19:29:44 688956 6789028 4364260 37.63 5584 5863272 179750048 743.34 1941608 5518808 300096
> > > > 19:29:45 740768 6804560 4348772 37.49 5524 5827248 179750000 743.34 1954084 5452844 123120
> > > > 19:29:46 697936 6810612 4342768 37.44 5524 5876048 179750048 743.34 1962020 5483944 274908
> > > > 19:29:47 734504 6818716 4334156 37.37 5512 5849188 179750000 743.34 1978120 5426796 274504
> > > > 19:29:48 771696 6828316 4324180 37.28 5504 5820948 179762260 743.39 2006732 5354152 305388
> > > > 19:29:49 691944 6838812 4313108 37.19 5476 5912444 179749952 743.34 2021720 5418996 296852
> > > > 19:29:50 679392 6844496 4306892 37.13 5452 5931356 179749952 743.34 1982772 5463040 233600
> > > > 19:29:51 768528 6868080 4284224 36.94 5412 5865704 176317452 729.15 1990220 5359012 343160
> > > > 19:29:52 717880 6893940 4259968 36.73 5400 5942368 176317404 729.15 1965624 5444140 304856
> > > > 19:29:53 712408 6902660 4251268 36.65 5372 5956584 176318376 729.15 1969192 5442132 290224
> > > > 19:29:54 707184 6917512 4236160 36.52 5344 5976944 176318568 729.15 1968716 5443620 336948
> > > > 19:29:55 703172 6921608 4232332 36.49 5292 5984836 176318568 729.15 1979788 5429484 328716
> > > > 19:29:56 733256 6933020 4220864 36.39 5212 5966340 176318568 729.15 1983636 5396256 300008
> > > > 19:29:57 723308 6936340 4217280 36.36 5120 5979816 176318568 729.15 1987088 5394360 508792
> > > > 19:29:58 732148 6942972 4210680 36.30 5108 5977656 176311064 729.12 1990400 5379884 214936
> > > >
> > > > Write test data after using uncached buffer I/O:
> > > > Starting 1 threads
> > > > pid: 17742
> > > > writing bs 8192, uncached 1
> > > > 1s: 433MB/sec, MB=433
> > > > 2s: 195MB/sec, MB=628
> > > > 3s: 209MB/sec, MB=836
> > > > 4s: 54MB/sec, MB=883
> > > > 5s: 277MB/sec, MB=1169
> > > > 6s: 141MB/sec, MB=1311
> > > > 7s: 185MB/sec, MB=1495
> > > > 8s: 134MB/sec, MB=1631
> > > > 9s: 201MB/sec, MB=1834
> > > > 10s: 283MB/sec, MB=2114
> > > > 11s: 223MB/sec, MB=2339
> > > > 12s: 164MB/sec, MB=2506
> > > > 13s: 155MB/sec, MB=2657
> > > > 14s: 132MB/sec, MB=2792
> > > > 15s: 186MB/sec, MB=2965
> > > > 16s: 218MB/sec, MB=3198
> > > > 17s: 220MB/sec, MB=3412
> > > > 18s: 191MB/sec, MB=3606
> > > > 19s: 214MB/sec, MB=3828
> > > > 20s: 257MB/sec, MB=4085
> > > >
> > > > 19:31:31 UID PID %usr %system %guest %wait %CPU CPU Command
> > > > 19:31:32 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:33 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:34 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:35 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:36 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:37 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:38 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:39 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:40 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:41 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:42 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:43 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:44 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:45 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:46 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:47 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:48 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > > 19:31:49 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> > > >
> > > > 19:31:31 kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
> > > > 19:31:32 4816812 6928788 4225812 36.43 5148 1879676 176322636 729.17 1920900 1336548 285748
> > > > 19:31:33 4781880 6889428 4265592 36.78 5148 1874860 176322636 729.17 1920920 1332268 279028
> > > > 19:31:34 4758972 6822588 4332376 37.35 5148 1830984 176322636 729.17 1920920 1288976 233040
> > > > 19:31:35 4850248 6766480 4387840 37.83 5148 1684244 176322636 729.17 1920920 1142408 90508
> > > > 19:31:36 4644176 6741676 4413256 38.05 5148 1864900 176322636 729.17 1920920 1323452 269380
> > > > 19:31:37 4637900 6681480 4473436 38.57 5148 1810996 176322588 729.17 1920920 1269612 217632
> > > > 19:31:38 4502108 6595508 4559500 39.31 5148 1860724 176322492 729.17 1920920 1319588 267760
> > > > 19:31:39 4498844 6551068 4603928 39.69 5148 1819528 176322492 729.17 1920920 1278440 226496
> > > > 19:31:40 4498812 6587396 4567340 39.38 5148 1856116 176322492 729.17 1920920 1314800 263292
> > > > 19:31:41 4656784 6706252 4448372 38.35 5148 1817112 176322492 729.17 1920920 1275704 224600
> > > > 19:31:42 4635032 6673328 4481436 38.64 5148 1805816 176322492 729.17 1920920 1264548 213436
> > > > 19:31:43 4636852 6679736 4474884 38.58 5148 1810548 176322492 729.17 1920932 1269796 218276
> > > > 19:31:44 4654740 6669104 4485544 38.67 5148 1782000 176322444 729.17 1920932 1241552 189880
> > > > 19:31:45 4821604 6693156 4461848 38.47 5148 1638864 176322444 729.17 1920932 1098784 31076
> > > > 19:31:46 4707548 6728796 4426400 38.16 5148 1788368 176322444 729.17 1920932 1248936 196596
> > > > 19:31:47 4683996 6747632 4407348 38.00 5148 1830968 176322444 729.17 1920932 1291396 239636
> > > > 19:31:48 4694648 6773808 4381320 37.78 5148 1846376 176322624 729.17 1920944 1307576 254800
> > > > 19:31:49 4663784 6730212 4424776 38.15 5148 1833784 176322772 729.17 1920948 1295156 242200
> > > >
> > > > [1]
> > > > https://lore.kernel.org/lkml/20250725075310.1614614-1-hanqi@vivo.com/
> > > >
> > > > Signed-off-by: Qi Han <hanqi@...o.com>
> > > > ---
> > > > fs/f2fs/data.c | 178 ++++++++++++++++++++++++++++++++++------------
> > > > fs/f2fs/f2fs.h | 5 ++
> > > > fs/f2fs/file.c | 2 +-
> > > > fs/f2fs/iostat.c | 8 ++-
> > > > fs/f2fs/iostat.h | 4 +-
> > > > fs/f2fs/segment.c | 2 +-
> > > > fs/f2fs/super.c | 16 ++++-
> > > > 7 files changed, 161 insertions(+), 54 deletions(-)
> > > >
> > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > index 7961e0ddfca3..4eeb2b36473d 100644
> > > > --- a/fs/f2fs/data.c
> > > > +++ b/fs/f2fs/data.c
> > > > @@ -30,8 +30,10 @@
> > > > #define NUM_PREALLOC_POST_READ_CTXS 128
> > > > static struct kmem_cache *bio_post_read_ctx_cache;
> > > > +static struct kmem_cache *bio_post_write_ctx_cache;
> > > > static struct kmem_cache *bio_entry_slab;
> > > > static mempool_t *bio_post_read_ctx_pool;
> > > > +static mempool_t *bio_post_write_ctx_pool;
> > > > static struct bio_set f2fs_bioset;
> > > > #define F2FS_BIO_POOL_SIZE NR_CURSEG_TYPE
> > > > @@ -120,6 +122,12 @@ struct bio_post_read_ctx {
> > > > block_t fs_blkaddr;
> > > > };
> > > > +struct bio_post_write_ctx {
> > > > + struct bio *bio;
> > > > + struct f2fs_sb_info *sbi;
> > > > + struct work_struct work;
> > > > +};
> > > > +
> > > > /*
> > > > * Update and unlock a bio's pages, and free the bio.
> > > > *
> > > > @@ -159,6 +167,56 @@ static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
> > > > bio_put(bio);
> > > > }
> > > > +static void f2fs_finish_write_bio(struct f2fs_sb_info *sbi, struct bio *bio)
> > > > +{
> > > > + struct folio_iter fi;
> > > > + struct bio_post_write_ctx *write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
> > > > +
> > > > + bio_for_each_folio_all(fi, bio) {
> > > > + struct folio *folio = fi.folio;
> > > > + enum count_type type;
> > > > +
> > > > + if (fscrypt_is_bounce_folio(folio)) {
> > > > + struct folio *io_folio = folio;
> > > > +
> > > > + folio = fscrypt_pagecache_folio(io_folio);
> > > > + fscrypt_free_bounce_page(&io_folio->page);
> > > > + }
> > > > +
> > > > +#ifdef CONFIG_F2FS_FS_COMPRESSION
> > > > + if (f2fs_is_compressed_page(folio)) {
> > > > + f2fs_compress_write_end_io(bio, folio);
> > > > + continue;
> > > > + }
> > > > +#endif
> > > > +
> > > > + type = WB_DATA_TYPE(folio, false);
> > > > +
> > > > + if (unlikely(bio->bi_status != BLK_STS_OK)) {
> > > > + mapping_set_error(folio->mapping, -EIO);
> > > > + if (type == F2FS_WB_CP_DATA)
> > > > + f2fs_stop_checkpoint(sbi, true,
> > > > + STOP_CP_REASON_WRITE_FAIL);
> > > > + }
> > > > +
> > > > + f2fs_bug_on(sbi, is_node_folio(folio) &&
> > > > + folio->index != nid_of_node(folio));
> > > > +
> > > > + dec_page_count(sbi, type);
> > > > + if (f2fs_in_warm_node_list(sbi, folio))
> > > > + f2fs_del_fsync_node_entry(sbi, folio);
> > > > + folio_clear_f2fs_gcing(folio);
> > > > + folio_end_writeback(folio);
> > > > + }
> > > > + if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
> > > > + wq_has_sleeper(&sbi->cp_wait))
> > > > + wake_up(&sbi->cp_wait);
> > > > +
> > > > + if (write_ctx)
> > > > + mempool_free(write_ctx, bio_post_write_ctx_pool);
> > > > + bio_put(bio);
> > > > +}
> > > > +
> > > > static void f2fs_verify_bio(struct work_struct *work)
> > > > {
> > > > struct bio_post_read_ctx *ctx =
> > > > @@ -314,58 +372,32 @@ static void f2fs_read_end_io(struct bio *bio)
> > > > f2fs_verify_and_finish_bio(bio, intask);
> > > > }
> > > > +static void f2fs_finish_write_bio_async_work(struct work_struct *work)
> > > > +{
> > > > + struct bio_post_write_ctx *ctx =
> > > > + container_of(work, struct bio_post_write_ctx, work);
> > > > +
> > > > + f2fs_finish_write_bio(ctx->sbi, ctx->bio);
> > > > +}
> > > > +
> > > > static void f2fs_write_end_io(struct bio *bio)
> > > > {
> > > > - struct f2fs_sb_info *sbi;
> > > > - struct folio_iter fi;
> > > > + struct f2fs_sb_info *sbi = F2FS_F_SB(bio_first_folio_all(bio));
> > > > + struct bio_post_write_ctx *write_ctx;
> > > > iostat_update_and_unbind_ctx(bio);
> > > > - sbi = bio->bi_private;
> > > > if (time_to_inject(sbi, FAULT_WRITE_IO))
> > > > bio->bi_status = BLK_STS_IOERR;
> > > > - bio_for_each_folio_all(fi, bio) {
> > > > - struct folio *folio = fi.folio;
> > > > - enum count_type type;
> > > > -
> > > > - if (fscrypt_is_bounce_folio(folio)) {
> > > > - struct folio *io_folio = folio;
> > > > -
> > > > - folio = fscrypt_pagecache_folio(io_folio);
> > > > - fscrypt_free_bounce_page(&io_folio->page);
> > > > - }
> > > > -
> > > > -#ifdef CONFIG_F2FS_FS_COMPRESSION
> > > > - if (f2fs_is_compressed_page(folio)) {
> > > > - f2fs_compress_write_end_io(bio, folio);
> > > > - continue;
> > > > - }
> > > > -#endif
> > > > -
> > > > - type = WB_DATA_TYPE(folio, false);
> > > > -
> > > > - if (unlikely(bio->bi_status != BLK_STS_OK)) {
> > > > - mapping_set_error(folio->mapping, -EIO);
> > > > - if (type == F2FS_WB_CP_DATA)
> > > > - f2fs_stop_checkpoint(sbi, true,
> > > > - STOP_CP_REASON_WRITE_FAIL);
> > > > - }
> > > > -
> > > > - f2fs_bug_on(sbi, is_node_folio(folio) &&
> > > > - folio->index != nid_of_node(folio));
> > > > -
> > > > - dec_page_count(sbi, type);
> > > > - if (f2fs_in_warm_node_list(sbi, folio))
> > > > - f2fs_del_fsync_node_entry(sbi, folio);
> > > > - folio_clear_f2fs_gcing(folio);
> > > > - folio_end_writeback(folio);
> > > > + write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
> > > > + if (write_ctx) {
> > > > + INIT_WORK(&write_ctx->work, f2fs_finish_write_bio_async_work);
> > > > + queue_work(write_ctx->sbi->post_write_wq, &write_ctx->work);
> > > > + return;
> > > > }
> > > > - if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
> > > > - wq_has_sleeper(&sbi->cp_wait))
> > > > - wake_up(&sbi->cp_wait);
> > > > - bio_put(bio);
> > > > + f2fs_finish_write_bio(sbi, bio);
> > > > }
> > > > #ifdef CONFIG_BLK_DEV_ZONED
> > > > @@ -467,11 +499,10 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
> > > > bio->bi_private = NULL;
> > > > } else {
> > > > bio->bi_end_io = f2fs_write_end_io;
> > > > - bio->bi_private = sbi;
> > > > + bio->bi_private = NULL;
> > > > bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi,
> > > > fio->type, fio->temp);
> > > > }
> > > > - iostat_alloc_and_bind_ctx(sbi, bio, NULL);
> > > > if (fio->io_wbc)
> > > > wbc_init_bio(fio->io_wbc, bio);
> > > > @@ -701,6 +732,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
> > > > /* Allocate a new bio */
> > > > bio = __bio_alloc(fio, 1);
> > > > + iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
> > > > f2fs_set_bio_crypt_ctx(bio, fio_folio->mapping->host,
> > > > fio_folio->index, fio, GFP_NOIO);
> > > > @@ -899,6 +931,8 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
> > > > alloc_new:
> > > > if (!bio) {
> > > > bio = __bio_alloc(fio, BIO_MAX_VECS);
> > > > + iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
> > > > +
> > > > f2fs_set_bio_crypt_ctx(bio, folio->mapping->host,
> > > > folio->index, fio, GFP_NOIO);
> > > > @@ -948,6 +982,7 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
> > > > struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp;
> > > > struct folio *bio_folio;
> > > > enum count_type type;
> > > > + struct bio_post_write_ctx *write_ctx = NULL;
> > > > f2fs_bug_on(sbi, is_read_io(fio->op));
> > > > @@ -1001,6 +1036,13 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
> > > > f2fs_set_bio_crypt_ctx(io->bio, fio_inode(fio),
> > > > bio_folio->index, fio, GFP_NOIO);
> > > > io->fio = *fio;
> > > > +
> > > > + if (folio_test_dropbehind(bio_folio)) {
> > > > + write_ctx = mempool_alloc(bio_post_write_ctx_pool, GFP_NOFS);
> > > > + write_ctx->bio = io->bio;
> > > > + write_ctx->sbi = sbi;
> > > > + }
> > > > + iostat_alloc_and_bind_ctx(fio->sbi, io->bio, NULL, write_ctx);
> > > > }
> > > > if (!bio_add_folio(io->bio, bio_folio, folio_size(bio_folio), 0)) {
> > > > @@ -1077,7 +1119,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
> > > > ctx->decompression_attempted = false;
> > > > bio->bi_private = ctx;
> > > > }
> > > > - iostat_alloc_and_bind_ctx(sbi, bio, ctx);
> > > > + iostat_alloc_and_bind_ctx(sbi, bio, ctx, NULL);
> > > > return bio;
> > > > }
> > > > @@ -3540,6 +3582,7 @@ static int f2fs_write_begin(const struct kiocb *iocb,
> > > > bool use_cow = false;
> > > > block_t blkaddr = NULL_ADDR;
> > > > int err = 0;
> > > > + fgf_t fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT;
> > > > trace_f2fs_write_begin(inode, pos, len);
> > > > @@ -3582,12 +3625,13 @@ static int f2fs_write_begin(const struct kiocb *iocb,
> > > > #endif
> > > > repeat:
> > > > + if (iocb && iocb->ki_flags & IOCB_DONTCACHE)
> > > > + fgp |= FGP_DONTCACHE;
> > > > /*
> > > > * Do not use FGP_STABLE to avoid deadlock.
> > > > * Will wait that below with our IO control.
> > > > */
> > > > - folio = __filemap_get_folio(mapping, index,
> > > > - FGP_LOCK | FGP_WRITE | FGP_CREAT, GFP_NOFS);
> > > > + folio = __filemap_get_folio(mapping, index, fgp, GFP_NOFS);
> > > > if (IS_ERR(folio)) {
> > > > err = PTR_ERR(folio);
> > > > goto fail;
> > > > @@ -4127,12 +4171,38 @@ int __init f2fs_init_post_read_processing(void)
> > > > return -ENOMEM;
> > > > }
> > > > +int __init f2fs_init_post_write_processing(void)
> > > > +{
> > > > + bio_post_write_ctx_cache =
> > > > + kmem_cache_create("f2fs_bio_post_write_ctx",
> > > > + sizeof(struct bio_post_write_ctx), 0, 0, NULL);
> > > > + if (!bio_post_write_ctx_cache)
> > > > + goto fail;
> > > > + bio_post_write_ctx_pool =
> > > > + mempool_create_slab_pool(NUM_PREALLOC_POST_READ_CTXS,
> > > > + bio_post_write_ctx_cache);
> > > > + if (!bio_post_write_ctx_pool)
> > > > + goto fail_free_cache;
> > > > + return 0;
> > > > +
> > > > +fail_free_cache:
> > > > + kmem_cache_destroy(bio_post_write_ctx_cache);
> > > > +fail:
> > > > + return -ENOMEM;
> > > > +}
> > > > +
> > > > void f2fs_destroy_post_read_processing(void)
> > > > {
> > > > mempool_destroy(bio_post_read_ctx_pool);
> > > > kmem_cache_destroy(bio_post_read_ctx_cache);
> > > > }
> > > > +void f2fs_destroy_post_write_processing(void)
> > > > +{
> > > > + mempool_destroy(bio_post_write_ctx_pool);
> > > > + kmem_cache_destroy(bio_post_write_ctx_cache);
> > > > +}
> > > > +
> > > > int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
> > > > {
> > > > if (!f2fs_sb_has_encrypt(sbi) &&
> > > > @@ -4146,12 +4216,26 @@ int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
> > > > return sbi->post_read_wq ? 0 : -ENOMEM;
> > > > }
> > > > +int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi)
> > > > +{
> > > > + sbi->post_write_wq = alloc_workqueue("f2fs_post_write_wq",
> > > > + WQ_UNBOUND | WQ_HIGHPRI,
> > > > + num_online_cpus());
> > > > + return sbi->post_write_wq ? 0 : -ENOMEM;
> > > > +}
> > > > +
> > > > void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi)
> > > > {
> > > > if (sbi->post_read_wq)
> > > > destroy_workqueue(sbi->post_read_wq);
> > > > }
> > > > +void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi)
> > > > +{
> > > > + if (sbi->post_write_wq)
> > > > + destroy_workqueue(sbi->post_write_wq);
> > > > +}
> > > > +
> > > > int __init f2fs_init_bio_entry_cache(void)
> > > > {
> > > > bio_entry_slab = f2fs_kmem_cache_create("f2fs_bio_entry_slab",
> > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > > index 46be7560548c..fe3f81876b23 100644
> > > > --- a/fs/f2fs/f2fs.h
> > > > +++ b/fs/f2fs/f2fs.h
> > > > @@ -1812,6 +1812,7 @@ struct f2fs_sb_info {
> > > > /* Precomputed FS UUID checksum for seeding other checksums */
> > > > __u32 s_chksum_seed;
> > > > + struct workqueue_struct *post_write_wq;
> > > > struct workqueue_struct *post_read_wq; /* post read workqueue */
> > > > /*
> > > > @@ -4023,9 +4024,13 @@ bool f2fs_release_folio(struct folio *folio, gfp_t wait);
> > > > bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
> > > > void f2fs_clear_page_cache_dirty_tag(struct folio *folio);
> > > > int f2fs_init_post_read_processing(void);
> > > > +int f2fs_init_post_write_processing(void);
> > > > void f2fs_destroy_post_read_processing(void);
> > > > +void f2fs_destroy_post_write_processing(void);
> > > > int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
> > > > +int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi);
> > > > void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
> > > > +void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi);
> > > > extern const struct iomap_ops f2fs_iomap_ops;
> > > > /*
> > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > index 42faaed6a02d..8aa6a4fd52e8 100644
> > > > --- a/fs/f2fs/file.c
> > > > +++ b/fs/f2fs/file.c
> > > > @@ -5443,5 +5443,5 @@ const struct file_operations f2fs_file_operations = {
> > > > .splice_read = f2fs_file_splice_read,
> > > > .splice_write = iter_file_splice_write,
> > > > .fadvise = f2fs_file_fadvise,
> > > > - .fop_flags = FOP_BUFFER_RASYNC,
> > > > + .fop_flags = FOP_BUFFER_RASYNC | FOP_DONTCACHE,
> > > > };
> > > > diff --git a/fs/f2fs/iostat.c b/fs/f2fs/iostat.c
> > > > index f8703038e1d8..b2e6ce80c68d 100644
> > > > --- a/fs/f2fs/iostat.c
> > > > +++ b/fs/f2fs/iostat.c
> > > > @@ -245,7 +245,7 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
> > > > if (op_is_write(bio_op(bio))) {
> > > > lat_type = bio->bi_opf & REQ_SYNC ?
> > > > WRITE_SYNC_IO : WRITE_ASYNC_IO;
> > > > - bio->bi_private = iostat_ctx->sbi;
> > > > + bio->bi_private = iostat_ctx->post_write_ctx;
> > > > } else {
> > > > lat_type = READ_IO;
> > > > bio->bi_private = iostat_ctx->post_read_ctx;
> > > > @@ -256,7 +256,8 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
> > > > }
> > > > void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> > > > - struct bio *bio, struct bio_post_read_ctx *ctx)
> > > > + struct bio *bio, struct bio_post_read_ctx *read_ctx,
> > > > + struct bio_post_write_ctx *write_ctx)
> > > > {
> > > > struct bio_iostat_ctx *iostat_ctx;
> > > > /* Due to the mempool, this never fails. */
> > > > @@ -264,7 +265,8 @@ void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> > > > iostat_ctx->sbi = sbi;
> > > > iostat_ctx->submit_ts = 0;
> > > > iostat_ctx->type = 0;
> > > > - iostat_ctx->post_read_ctx = ctx;
> > > > + iostat_ctx->post_read_ctx = read_ctx;
> > > > + iostat_ctx->post_write_ctx = write_ctx;
> > > > bio->bi_private = iostat_ctx;
> > > > }
> > > > diff --git a/fs/f2fs/iostat.h b/fs/f2fs/iostat.h
> > > > index eb99d05cf272..a358909bb5e8 100644
> > > > --- a/fs/f2fs/iostat.h
> > > > +++ b/fs/f2fs/iostat.h
> > > > @@ -40,6 +40,7 @@ struct bio_iostat_ctx {
> > > > unsigned long submit_ts;
> > > > enum page_type type;
> > > > struct bio_post_read_ctx *post_read_ctx;
> > > > + struct bio_post_write_ctx *post_write_ctx;
> > > > };
> > > > static inline void iostat_update_submit_ctx(struct bio *bio,
> > > > @@ -60,7 +61,8 @@ static inline struct bio_post_read_ctx *get_post_read_ctx(struct bio *bio)
> > > > extern void iostat_update_and_unbind_ctx(struct bio *bio);
> > > > extern void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> > > > - struct bio *bio, struct bio_post_read_ctx *ctx);
> > > > + struct bio *bio, struct bio_post_read_ctx *read_ctx,
> > > > + struct bio_post_write_ctx *write_ctx);
> > > > extern int f2fs_init_iostat_processing(void);
> > > > extern void f2fs_destroy_iostat_processing(void);
> > > > extern int f2fs_init_iostat(struct f2fs_sb_info *sbi);
> > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > index cc82d42ef14c..8501008e42b2 100644
> > > > --- a/fs/f2fs/segment.c
> > > > +++ b/fs/f2fs/segment.c
> > > > @@ -3856,7 +3856,7 @@ int f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct folio *folio,
> > > > f2fs_inode_chksum_set(sbi, folio);
> > > > }
> > > > - if (fio) {
> > > > + if (fio && !folio_test_dropbehind(folio)) {
> > > > struct f2fs_bio_info *io;
> > > > INIT_LIST_HEAD(&fio->list);
> > > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > > > index e16c4e2830c2..110dfe073aee 100644
> > > > --- a/fs/f2fs/super.c
> > > > +++ b/fs/f2fs/super.c
> > > > @@ -1963,6 +1963,7 @@ static void f2fs_put_super(struct super_block *sb)
> > > > flush_work(&sbi->s_error_work);
> > > > f2fs_destroy_post_read_wq(sbi);
> > > > + f2fs_destroy_post_write_wq(sbi);
> > > > kvfree(sbi->ckpt);
> > > > @@ -4959,6 +4960,12 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
> > > > goto free_devices;
> > > > }
> > > > + err = f2fs_init_post_write_wq(sbi);
> > > > + if (err) {
> > > > + f2fs_err(sbi, "Failed to initialize post write workqueue");
> > > > + goto free_devices;
> > > > + }
> > > > +
> > > > sbi->total_valid_node_count =
> > > > le32_to_cpu(sbi->ckpt->valid_node_count);
> > > > percpu_counter_set(&sbi->total_valid_inode_count,
> > > > @@ -5240,6 +5247,7 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
> > > > /* flush s_error_work before sbi destroy */
> > > > flush_work(&sbi->s_error_work);
> > > > f2fs_destroy_post_read_wq(sbi);
> > > > + f2fs_destroy_post_write_wq(sbi);
> > > > free_devices:
> > > > destroy_device_list(sbi);
> > > > kvfree(sbi->ckpt);
> > > > @@ -5435,9 +5443,12 @@ static int __init init_f2fs_fs(void)
> > > > err = f2fs_init_post_read_processing();
> > > > if (err)
> > > > goto free_root_stats;
> > > > - err = f2fs_init_iostat_processing();
> > > > + err = f2fs_init_post_write_processing();
> > > > if (err)
> > > > goto free_post_read;
> > > > + err = f2fs_init_iostat_processing();
> > > > + if (err)
> > > > + goto free_post_write;
> > > > err = f2fs_init_bio_entry_cache();
> > > > if (err)
> > > > goto free_iostat;
> > > > @@ -5469,6 +5480,8 @@ static int __init init_f2fs_fs(void)
> > > > f2fs_destroy_bio_entry_cache();
> > > > free_iostat:
> > > > f2fs_destroy_iostat_processing();
> > > > +free_post_write:
> > > > + f2fs_destroy_post_write_processing();
> > > > free_post_read:
> > > > f2fs_destroy_post_read_processing();
> > > > free_root_stats:
> > > > @@ -5504,6 +5517,7 @@ static void __exit exit_f2fs_fs(void)
> > > > f2fs_destroy_bio_entry_cache();
> > > > f2fs_destroy_iostat_processing();
> > > > f2fs_destroy_post_read_processing();
> > > > + f2fs_destroy_post_write_processing();
> > > > f2fs_destroy_root_stats();
> > > > f2fs_exit_shrinker();
> > > > f2fs_exit_sysfs();
> > > > --
> > > > 2.50.0