Message-ID: <aMNg-dNDsWo2BemN@google.com>
Date: Thu, 11 Sep 2025 23:53:29 +0000
From: Jaegeuk Kim <jaegeuk@...nel.org>
To: Qi Han <hanqi@...o.com>
Cc: chao@...nel.org, linux-f2fs-devel@...ts.sourceforge.net,
linux-kernel@...r.kernel.org, axboe@...nel.dk
Subject: Re: [RFC PATCH] f2fs: f2fs support uncached buffer I/O read and write
Given the performance data and the implementation overhead, I also question
whether we really need to support this for writes. Can we first get a shared
understanding of the expected usage models?
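
For a rough sense of scale, the 20-second totals quoted below (14395 MB for the
normal buffered run, 4085 MB for the uncached run) work out to about 720 MB/s
and 204 MB/s on average, roughly a 3.5x gap. A minimal sketch of that
arithmetic follows; the helper names are hypothetical and only illustrate the
calculation, they are not part of the patch or of the test tool.

```c
#include <assert.h>

/* Hypothetical helper: mean throughput in MB/s over a whole run. */
static double avg_mb_per_sec(unsigned long total_mb, unsigned int seconds)
{
	assert(seconds > 0);	/* guard against division by zero */
	return (double)total_mb / seconds;
}

/* Hypothetical helper: ratio of buffered to uncached data written. */
static double buffered_vs_uncached(unsigned long buffered_mb,
				   unsigned long uncached_mb)
{
	assert(uncached_mb > 0);
	return (double)buffered_mb / uncached_mb;
}
```

Calling avg_mb_per_sec(14395, 20) and avg_mb_per_sec(4085, 20) gives the two
averages, and buffered_vs_uncached(14395, 4085) gives the ratio between them.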
On 08/28, Qi Han wrote:
> In the link [1], we adapted uncached buffer I/O read support in f2fs.
> Now, let's move forward to enabling uncached buffer I/O write support
> in f2fs.
>
> In f2fs_write_end_io, completion handling for bios that carry dropbehind
> folios (those allocated with FGP_DONTCACHE) is deferred to a separate
> asynchronous workqueue, which performs the page drop operation.
>
> The following patch is developed and tested based on the v6.17-rc3 branch.
> My local testing results are as follows, along with some issues observed:
> 1) Write performance degradation. Uncached buffer I/O write is slower than
> normal buffered write because uncached I/O triggers a sync operation for
> each I/O after data is written to memory, in order to drop pages promptly
> at end_io. I assume this impact might be less visible on high-performance
> storage devices such as PCIe 6.0 SSDs.
> - f2fs_file_write_iter
>   - f2fs_buffered_write_iter
>   - generic_write_sync
>     - filemap_fdatawrite_range_kick
> 2) As expected, page cache usage does not significantly increase during writes.
> 3) The kswapd0 memory reclaim thread remains mostly idle, but additional
> asynchronous work overhead is introduced, e.g.:
> PID USER PR NI VIRT RES SHR S[%CPU] %MEM TIME+ ARGS
> 19650 root 0 -20 0 0 0 I 7.0 0.0 0:00.21 [kworker/u33:3-f2fs_post_write_wq]
> 95 root 0 -20 0 0 0 I 6.6 0.0 0:02.08 [kworker/u33:0-f2fs_post_write_wq]
> 19653 root 0 -20 0 0 0 I 4.6 0.0 0:01.25 [kworker/u33:6-f2fs_post_write_wq]
> 19652 root 0 -20 0 0 0 I 4.3 0.0 0:00.92 [kworker/u33:5-f2fs_post_write_wq]
> 19613 root 0 -20 0 0 0 I 4.3 0.0 0:00.99 [kworker/u33:1-f2fs_post_write_wq]
> 19651 root 0 -20 0 0 0 I 3.6 0.0 0:00.98 [kworker/u33:4-f2fs_post_write_wq]
> 19654 root 0 -20 0 0 0 I 3.0 0.0 0:01.05 [kworker/u33:7-f2fs_post_write_wq]
> 19655 root 0 -20 0 0 0 I 2.3 0.0 0:01.18 [kworker/u33:8-f2fs_post_write_wq]
>
> From these results on my test device, introducing uncached buffer I/O write on
> f2fs seems to bring more drawbacks than benefits. Do we really need to support
> uncached buffer I/O write in f2fs?
>
> Write test data without using uncached buffer I/O:
> Starting 1 threads
> pid: 17609
> writing bs 8192, uncached 0
> 1s: 753MB/sec, MB=753
> 2s: 792MB/sec, MB=1546
> 3s: 430MB/sec, MB=1978
> 4s: 661MB/sec, MB=2636
> 5s: 900MB/sec, MB=3542
> 6s: 769MB/sec, MB=4308
> 7s: 808MB/sec, MB=5113
> 8s: 766MB/sec, MB=5884
> 9s: 654MB/sec, MB=6539
> 10s: 456MB/sec, MB=6995
> 11s: 797MB/sec, MB=7793
> 12s: 770MB/sec, MB=8563
> 13s: 778MB/sec, MB=9341
> 14s: 726MB/sec, MB=10077
> 15s: 736MB/sec, MB=10803
> 16s: 798MB/sec, MB=11602
> 17s: 728MB/sec, MB=12330
> 18s: 749MB/sec, MB=13080
> 19s: 777MB/sec, MB=13857
> 20s: 688MB/sec, MB=14395
>
> 19:29:34 UID PID %usr %system %guest %wait %CPU CPU Command
> 19:29:35 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:29:36 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:29:37 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:29:38 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:29:39 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:29:40 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:29:41 0 94 0.00 2.00 0.00 0.00 2.00 0 kswapd0
> 19:29:42 0 94 0.00 59.00 0.00 0.00 59.00 7 kswapd0
> 19:29:43 0 94 0.00 45.00 0.00 0.00 45.00 7 kswapd0
> 19:29:44 0 94 0.00 36.00 0.00 0.00 36.00 0 kswapd0
> 19:29:45 0 94 0.00 27.00 0.00 1.00 27.00 0 kswapd0
> 19:29:46 0 94 0.00 26.00 0.00 0.00 26.00 2 kswapd0
> 19:29:47 0 94 0.00 57.00 0.00 0.00 57.00 7 kswapd0
> 19:29:48 0 94 0.00 41.00 0.00 0.00 41.00 7 kswapd0
> 19:29:49 0 94 0.00 38.00 0.00 0.00 38.00 7 kswapd0
> 19:29:50 0 94 0.00 47.00 0.00 0.00 47.00 7 kswapd0
> 19:29:51 0 94 0.00 43.00 0.00 1.00 43.00 7 kswapd0
> 19:29:52 0 94 0.00 36.00 0.00 0.00 36.00 7 kswapd0
> 19:29:53 0 94 0.00 39.00 0.00 0.00 39.00 2 kswapd0
> 19:29:54 0 94 0.00 46.00 0.00 0.00 46.00 7 kswapd0
> 19:29:55 0 94 0.00 43.00 0.00 0.00 43.00 7 kswapd0
> 19:29:56 0 94 0.00 39.00 0.00 0.00 39.00 7 kswapd0
> 19:29:57 0 94 0.00 29.00 0.00 1.00 29.00 1 kswapd0
> 19:29:58 0 94 0.00 17.00 0.00 0.00 17.00 4 kswapd0
>
> 19:29:33 kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
> 19:29:34 4464588 6742648 4420876 38.12 6156 2032600 179730872 743.27 1863412 1822544 4
> 19:29:35 4462572 6740784 4422752 38.13 6156 2032752 179739004 743.30 1863460 1823584 16
> 19:29:36 4381512 6740856 4422420 38.13 6156 2114144 179746508 743.33 1863476 1905384 81404
> 19:29:37 3619456 6741840 4421588 38.12 6156 2877032 179746652 743.33 1863536 2668896 592584
> 19:29:38 2848184 6740720 4422472 38.13 6164 3646188 179746652 743.33 1863600 3438520 815692
> 19:29:39 2436336 6739452 4423720 38.14 6164 4056772 179746652 743.33 1863604 3849164 357096
> 19:29:40 1712660 6737700 4425140 38.15 6164 4779020 179746604 743.33 1863612 4571124 343716
> 19:29:41 810664 6738020 4425004 38.15 6164 5681152 179746604 743.33 1863612 5473444 297268
> 19:29:42 673756 6779120 4373200 37.71 5656 5869928 179746604 743.33 1902852 5589452 269032
> 19:29:43 688480 6782024 4371012 37.69 5648 5856940 179750048 743.34 1926336 5542004 279344
> 19:29:44 688956 6789028 4364260 37.63 5584 5863272 179750048 743.34 1941608 5518808 300096
> 19:29:45 740768 6804560 4348772 37.49 5524 5827248 179750000 743.34 1954084 5452844 123120
> 19:29:46 697936 6810612 4342768 37.44 5524 5876048 179750048 743.34 1962020 5483944 274908
> 19:29:47 734504 6818716 4334156 37.37 5512 5849188 179750000 743.34 1978120 5426796 274504
> 19:29:48 771696 6828316 4324180 37.28 5504 5820948 179762260 743.39 2006732 5354152 305388
> 19:29:49 691944 6838812 4313108 37.19 5476 5912444 179749952 743.34 2021720 5418996 296852
> 19:29:50 679392 6844496 4306892 37.13 5452 5931356 179749952 743.34 1982772 5463040 233600
> 19:29:51 768528 6868080 4284224 36.94 5412 5865704 176317452 729.15 1990220 5359012 343160
> 19:29:52 717880 6893940 4259968 36.73 5400 5942368 176317404 729.15 1965624 5444140 304856
> 19:29:53 712408 6902660 4251268 36.65 5372 5956584 176318376 729.15 1969192 5442132 290224
> 19:29:54 707184 6917512 4236160 36.52 5344 5976944 176318568 729.15 1968716 5443620 336948
> 19:29:55 703172 6921608 4232332 36.49 5292 5984836 176318568 729.15 1979788 5429484 328716
> 19:29:56 733256 6933020 4220864 36.39 5212 5966340 176318568 729.15 1983636 5396256 300008
> 19:29:57 723308 6936340 4217280 36.36 5120 5979816 176318568 729.15 1987088 5394360 508792
> 19:29:58 732148 6942972 4210680 36.30 5108 5977656 176311064 729.12 1990400 5379884 214936
>
> Write test data after using uncached buffer I/O:
> Starting 1 threads
> pid: 17742
> writing bs 8192, uncached 1
> 1s: 433MB/sec, MB=433
> 2s: 195MB/sec, MB=628
> 3s: 209MB/sec, MB=836
> 4s: 54MB/sec, MB=883
> 5s: 277MB/sec, MB=1169
> 6s: 141MB/sec, MB=1311
> 7s: 185MB/sec, MB=1495
> 8s: 134MB/sec, MB=1631
> 9s: 201MB/sec, MB=1834
> 10s: 283MB/sec, MB=2114
> 11s: 223MB/sec, MB=2339
> 12s: 164MB/sec, MB=2506
> 13s: 155MB/sec, MB=2657
> 14s: 132MB/sec, MB=2792
> 15s: 186MB/sec, MB=2965
> 16s: 218MB/sec, MB=3198
> 17s: 220MB/sec, MB=3412
> 18s: 191MB/sec, MB=3606
> 19s: 214MB/sec, MB=3828
> 20s: 257MB/sec, MB=4085
>
> 19:31:31 UID PID %usr %system %guest %wait %CPU CPU Command
> 19:31:32 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:33 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:34 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:35 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:36 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:37 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:38 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:39 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:40 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:41 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:42 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:43 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:44 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:45 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:46 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:47 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:48 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
> 19:31:49 0 94 0.00 0.00 0.00 0.00 0.00 4 kswapd0
>
> 19:31:31 kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
> 19:31:32 4816812 6928788 4225812 36.43 5148 1879676 176322636 729.17 1920900 1336548 285748
> 19:31:33 4781880 6889428 4265592 36.78 5148 1874860 176322636 729.17 1920920 1332268 279028
> 19:31:34 4758972 6822588 4332376 37.35 5148 1830984 176322636 729.17 1920920 1288976 233040
> 19:31:35 4850248 6766480 4387840 37.83 5148 1684244 176322636 729.17 1920920 1142408 90508
> 19:31:36 4644176 6741676 4413256 38.05 5148 1864900 176322636 729.17 1920920 1323452 269380
> 19:31:37 4637900 6681480 4473436 38.57 5148 1810996 176322588 729.17 1920920 1269612 217632
> 19:31:38 4502108 6595508 4559500 39.31 5148 1860724 176322492 729.17 1920920 1319588 267760
> 19:31:39 4498844 6551068 4603928 39.69 5148 1819528 176322492 729.17 1920920 1278440 226496
> 19:31:40 4498812 6587396 4567340 39.38 5148 1856116 176322492 729.17 1920920 1314800 263292
> 19:31:41 4656784 6706252 4448372 38.35 5148 1817112 176322492 729.17 1920920 1275704 224600
> 19:31:42 4635032 6673328 4481436 38.64 5148 1805816 176322492 729.17 1920920 1264548 213436
> 19:31:43 4636852 6679736 4474884 38.58 5148 1810548 176322492 729.17 1920932 1269796 218276
> 19:31:44 4654740 6669104 4485544 38.67 5148 1782000 176322444 729.17 1920932 1241552 189880
> 19:31:45 4821604 6693156 4461848 38.47 5148 1638864 176322444 729.17 1920932 1098784 31076
> 19:31:46 4707548 6728796 4426400 38.16 5148 1788368 176322444 729.17 1920932 1248936 196596
> 19:31:47 4683996 6747632 4407348 38.00 5148 1830968 176322444 729.17 1920932 1291396 239636
> 19:31:48 4694648 6773808 4381320 37.78 5148 1846376 176322624 729.17 1920944 1307576 254800
> 19:31:49 4663784 6730212 4424776 38.15 5148 1833784 176322772 729.17 1920948 1295156 242200
>
> [1]
> https://lore.kernel.org/lkml/20250725075310.1614614-1-hanqi@vivo.com/
>
> Signed-off-by: Qi Han <hanqi@...o.com>
> ---
> fs/f2fs/data.c | 178 ++++++++++++++++++++++++++++++++++------------
> fs/f2fs/f2fs.h | 5 ++
> fs/f2fs/file.c | 2 +-
> fs/f2fs/iostat.c | 8 ++-
> fs/f2fs/iostat.h | 4 +-
> fs/f2fs/segment.c | 2 +-
> fs/f2fs/super.c | 16 ++++-
> 7 files changed, 161 insertions(+), 54 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 7961e0ddfca3..4eeb2b36473d 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -30,8 +30,10 @@
> #define NUM_PREALLOC_POST_READ_CTXS 128
>
> static struct kmem_cache *bio_post_read_ctx_cache;
> +static struct kmem_cache *bio_post_write_ctx_cache;
> static struct kmem_cache *bio_entry_slab;
> static mempool_t *bio_post_read_ctx_pool;
> +static mempool_t *bio_post_write_ctx_pool;
> static struct bio_set f2fs_bioset;
>
> #define F2FS_BIO_POOL_SIZE NR_CURSEG_TYPE
> @@ -120,6 +122,12 @@ struct bio_post_read_ctx {
> block_t fs_blkaddr;
> };
>
> +struct bio_post_write_ctx {
> + struct bio *bio;
> + struct f2fs_sb_info *sbi;
> + struct work_struct work;
> +};
> +
> /*
> * Update and unlock a bio's pages, and free the bio.
> *
> @@ -159,6 +167,56 @@ static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
> bio_put(bio);
> }
>
> +static void f2fs_finish_write_bio(struct f2fs_sb_info *sbi, struct bio *bio)
> +{
> + struct folio_iter fi;
> + struct bio_post_write_ctx *write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
> +
> + bio_for_each_folio_all(fi, bio) {
> + struct folio *folio = fi.folio;
> + enum count_type type;
> +
> + if (fscrypt_is_bounce_folio(folio)) {
> + struct folio *io_folio = folio;
> +
> + folio = fscrypt_pagecache_folio(io_folio);
> + fscrypt_free_bounce_page(&io_folio->page);
> + }
> +
> +#ifdef CONFIG_F2FS_FS_COMPRESSION
> + if (f2fs_is_compressed_page(folio)) {
> + f2fs_compress_write_end_io(bio, folio);
> + continue;
> + }
> +#endif
> +
> + type = WB_DATA_TYPE(folio, false);
> +
> + if (unlikely(bio->bi_status != BLK_STS_OK)) {
> + mapping_set_error(folio->mapping, -EIO);
> + if (type == F2FS_WB_CP_DATA)
> + f2fs_stop_checkpoint(sbi, true,
> + STOP_CP_REASON_WRITE_FAIL);
> + }
> +
> + f2fs_bug_on(sbi, is_node_folio(folio) &&
> + folio->index != nid_of_node(folio));
> +
> + dec_page_count(sbi, type);
> + if (f2fs_in_warm_node_list(sbi, folio))
> + f2fs_del_fsync_node_entry(sbi, folio);
> + folio_clear_f2fs_gcing(folio);
> + folio_end_writeback(folio);
> + }
> + if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
> + wq_has_sleeper(&sbi->cp_wait))
> + wake_up(&sbi->cp_wait);
> +
> + if (write_ctx)
> + mempool_free(write_ctx, bio_post_write_ctx_pool);
> + bio_put(bio);
> +}
> +
> static void f2fs_verify_bio(struct work_struct *work)
> {
> struct bio_post_read_ctx *ctx =
> @@ -314,58 +372,32 @@ static void f2fs_read_end_io(struct bio *bio)
> f2fs_verify_and_finish_bio(bio, intask);
> }
>
> +static void f2fs_finish_write_bio_async_work(struct work_struct *work)
> +{
> + struct bio_post_write_ctx *ctx =
> + container_of(work, struct bio_post_write_ctx, work);
> +
> + f2fs_finish_write_bio(ctx->sbi, ctx->bio);
> +}
> +
> static void f2fs_write_end_io(struct bio *bio)
> {
> - struct f2fs_sb_info *sbi;
> - struct folio_iter fi;
> + struct f2fs_sb_info *sbi = F2FS_F_SB(bio_first_folio_all(bio));
> + struct bio_post_write_ctx *write_ctx;
>
> iostat_update_and_unbind_ctx(bio);
> - sbi = bio->bi_private;
>
> if (time_to_inject(sbi, FAULT_WRITE_IO))
> bio->bi_status = BLK_STS_IOERR;
>
> - bio_for_each_folio_all(fi, bio) {
> - struct folio *folio = fi.folio;
> - enum count_type type;
> -
> - if (fscrypt_is_bounce_folio(folio)) {
> - struct folio *io_folio = folio;
> -
> - folio = fscrypt_pagecache_folio(io_folio);
> - fscrypt_free_bounce_page(&io_folio->page);
> - }
> -
> -#ifdef CONFIG_F2FS_FS_COMPRESSION
> - if (f2fs_is_compressed_page(folio)) {
> - f2fs_compress_write_end_io(bio, folio);
> - continue;
> - }
> -#endif
> -
> - type = WB_DATA_TYPE(folio, false);
> -
> - if (unlikely(bio->bi_status != BLK_STS_OK)) {
> - mapping_set_error(folio->mapping, -EIO);
> - if (type == F2FS_WB_CP_DATA)
> - f2fs_stop_checkpoint(sbi, true,
> - STOP_CP_REASON_WRITE_FAIL);
> - }
> -
> - f2fs_bug_on(sbi, is_node_folio(folio) &&
> - folio->index != nid_of_node(folio));
> -
> - dec_page_count(sbi, type);
> - if (f2fs_in_warm_node_list(sbi, folio))
> - f2fs_del_fsync_node_entry(sbi, folio);
> - folio_clear_f2fs_gcing(folio);
> - folio_end_writeback(folio);
> + write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
> + if (write_ctx) {
> + INIT_WORK(&write_ctx->work, f2fs_finish_write_bio_async_work);
> + queue_work(write_ctx->sbi->post_write_wq, &write_ctx->work);
> + return;
> }
> - if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
> - wq_has_sleeper(&sbi->cp_wait))
> - wake_up(&sbi->cp_wait);
>
> - bio_put(bio);
> + f2fs_finish_write_bio(sbi, bio);
> }
>
> #ifdef CONFIG_BLK_DEV_ZONED
> @@ -467,11 +499,10 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
> bio->bi_private = NULL;
> } else {
> bio->bi_end_io = f2fs_write_end_io;
> - bio->bi_private = sbi;
> + bio->bi_private = NULL;
> bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi,
> fio->type, fio->temp);
> }
> - iostat_alloc_and_bind_ctx(sbi, bio, NULL);
>
> if (fio->io_wbc)
> wbc_init_bio(fio->io_wbc, bio);
> @@ -701,6 +732,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
>
> /* Allocate a new bio */
> bio = __bio_alloc(fio, 1);
> + iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
>
> f2fs_set_bio_crypt_ctx(bio, fio_folio->mapping->host,
> fio_folio->index, fio, GFP_NOIO);
> @@ -899,6 +931,8 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
> alloc_new:
> if (!bio) {
> bio = __bio_alloc(fio, BIO_MAX_VECS);
> + iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
> +
> f2fs_set_bio_crypt_ctx(bio, folio->mapping->host,
> folio->index, fio, GFP_NOIO);
>
> @@ -948,6 +982,7 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
> struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp;
> struct folio *bio_folio;
> enum count_type type;
> + struct bio_post_write_ctx *write_ctx = NULL;
>
> f2fs_bug_on(sbi, is_read_io(fio->op));
>
> @@ -1001,6 +1036,13 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
> f2fs_set_bio_crypt_ctx(io->bio, fio_inode(fio),
> bio_folio->index, fio, GFP_NOIO);
> io->fio = *fio;
> +
> + if (folio_test_dropbehind(bio_folio)) {
> + write_ctx = mempool_alloc(bio_post_write_ctx_pool, GFP_NOFS);
> + write_ctx->bio = io->bio;
> + write_ctx->sbi = sbi;
> + }
> + iostat_alloc_and_bind_ctx(fio->sbi, io->bio, NULL, write_ctx);
> }
>
> if (!bio_add_folio(io->bio, bio_folio, folio_size(bio_folio), 0)) {
> @@ -1077,7 +1119,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
> ctx->decompression_attempted = false;
> bio->bi_private = ctx;
> }
> - iostat_alloc_and_bind_ctx(sbi, bio, ctx);
> + iostat_alloc_and_bind_ctx(sbi, bio, ctx, NULL);
>
> return bio;
> }
> @@ -3540,6 +3582,7 @@ static int f2fs_write_begin(const struct kiocb *iocb,
> bool use_cow = false;
> block_t blkaddr = NULL_ADDR;
> int err = 0;
> + fgf_t fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT;
>
> trace_f2fs_write_begin(inode, pos, len);
>
> @@ -3582,12 +3625,13 @@ static int f2fs_write_begin(const struct kiocb *iocb,
> #endif
>
> repeat:
> + if (iocb && iocb->ki_flags & IOCB_DONTCACHE)
> + fgp |= FGP_DONTCACHE;
> /*
> * Do not use FGP_STABLE to avoid deadlock.
> * Will wait that below with our IO control.
> */
> - folio = __filemap_get_folio(mapping, index,
> - FGP_LOCK | FGP_WRITE | FGP_CREAT, GFP_NOFS);
> + folio = __filemap_get_folio(mapping, index, fgp, GFP_NOFS);
> if (IS_ERR(folio)) {
> err = PTR_ERR(folio);
> goto fail;
> @@ -4127,12 +4171,38 @@ int __init f2fs_init_post_read_processing(void)
> return -ENOMEM;
> }
>
> +int __init f2fs_init_post_write_processing(void)
> +{
> + bio_post_write_ctx_cache =
> + kmem_cache_create("f2fs_bio_post_write_ctx",
> + sizeof(struct bio_post_write_ctx), 0, 0, NULL);
> + if (!bio_post_write_ctx_cache)
> + goto fail;
> + bio_post_write_ctx_pool =
> + mempool_create_slab_pool(NUM_PREALLOC_POST_READ_CTXS,
> + bio_post_write_ctx_cache);
> + if (!bio_post_write_ctx_pool)
> + goto fail_free_cache;
> + return 0;
> +
> +fail_free_cache:
> + kmem_cache_destroy(bio_post_write_ctx_cache);
> +fail:
> + return -ENOMEM;
> +}
> +
> void f2fs_destroy_post_read_processing(void)
> {
> mempool_destroy(bio_post_read_ctx_pool);
> kmem_cache_destroy(bio_post_read_ctx_cache);
> }
>
> +void f2fs_destroy_post_write_processing(void)
> +{
> + mempool_destroy(bio_post_write_ctx_pool);
> + kmem_cache_destroy(bio_post_write_ctx_cache);
> +}
> +
> int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
> {
> if (!f2fs_sb_has_encrypt(sbi) &&
> @@ -4146,12 +4216,26 @@ int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
> return sbi->post_read_wq ? 0 : -ENOMEM;
> }
>
> +int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi)
> +{
> + sbi->post_write_wq = alloc_workqueue("f2fs_post_write_wq",
> + WQ_UNBOUND | WQ_HIGHPRI,
> + num_online_cpus());
> + return sbi->post_write_wq ? 0 : -ENOMEM;
> +}
> +
> void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi)
> {
> if (sbi->post_read_wq)
> destroy_workqueue(sbi->post_read_wq);
> }
>
> +void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi)
> +{
> + if (sbi->post_write_wq)
> + destroy_workqueue(sbi->post_write_wq);
> +}
> +
> int __init f2fs_init_bio_entry_cache(void)
> {
> bio_entry_slab = f2fs_kmem_cache_create("f2fs_bio_entry_slab",
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 46be7560548c..fe3f81876b23 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1812,6 +1812,7 @@ struct f2fs_sb_info {
> /* Precomputed FS UUID checksum for seeding other checksums */
> __u32 s_chksum_seed;
>
> + struct workqueue_struct *post_write_wq;
> struct workqueue_struct *post_read_wq; /* post read workqueue */
>
> /*
> @@ -4023,9 +4024,13 @@ bool f2fs_release_folio(struct folio *folio, gfp_t wait);
> bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
> void f2fs_clear_page_cache_dirty_tag(struct folio *folio);
> int f2fs_init_post_read_processing(void);
> +int f2fs_init_post_write_processing(void);
> void f2fs_destroy_post_read_processing(void);
> +void f2fs_destroy_post_write_processing(void);
> int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
> +int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi);
> void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
> +void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi);
> extern const struct iomap_ops f2fs_iomap_ops;
>
> /*
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 42faaed6a02d..8aa6a4fd52e8 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -5443,5 +5443,5 @@ const struct file_operations f2fs_file_operations = {
> .splice_read = f2fs_file_splice_read,
> .splice_write = iter_file_splice_write,
> .fadvise = f2fs_file_fadvise,
> - .fop_flags = FOP_BUFFER_RASYNC,
> + .fop_flags = FOP_BUFFER_RASYNC | FOP_DONTCACHE,
> };
> diff --git a/fs/f2fs/iostat.c b/fs/f2fs/iostat.c
> index f8703038e1d8..b2e6ce80c68d 100644
> --- a/fs/f2fs/iostat.c
> +++ b/fs/f2fs/iostat.c
> @@ -245,7 +245,7 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
> if (op_is_write(bio_op(bio))) {
> lat_type = bio->bi_opf & REQ_SYNC ?
> WRITE_SYNC_IO : WRITE_ASYNC_IO;
> - bio->bi_private = iostat_ctx->sbi;
> + bio->bi_private = iostat_ctx->post_write_ctx;
> } else {
> lat_type = READ_IO;
> bio->bi_private = iostat_ctx->post_read_ctx;
> @@ -256,7 +256,8 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
> }
>
> void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> - struct bio *bio, struct bio_post_read_ctx *ctx)
> + struct bio *bio, struct bio_post_read_ctx *read_ctx,
> + struct bio_post_write_ctx *write_ctx)
> {
> struct bio_iostat_ctx *iostat_ctx;
> /* Due to the mempool, this never fails. */
> @@ -264,7 +265,8 @@ void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> iostat_ctx->sbi = sbi;
> iostat_ctx->submit_ts = 0;
> iostat_ctx->type = 0;
> - iostat_ctx->post_read_ctx = ctx;
> + iostat_ctx->post_read_ctx = read_ctx;
> + iostat_ctx->post_write_ctx = write_ctx;
> bio->bi_private = iostat_ctx;
> }
>
> diff --git a/fs/f2fs/iostat.h b/fs/f2fs/iostat.h
> index eb99d05cf272..a358909bb5e8 100644
> --- a/fs/f2fs/iostat.h
> +++ b/fs/f2fs/iostat.h
> @@ -40,6 +40,7 @@ struct bio_iostat_ctx {
> unsigned long submit_ts;
> enum page_type type;
> struct bio_post_read_ctx *post_read_ctx;
> + struct bio_post_write_ctx *post_write_ctx;
> };
>
> static inline void iostat_update_submit_ctx(struct bio *bio,
> @@ -60,7 +61,8 @@ static inline struct bio_post_read_ctx *get_post_read_ctx(struct bio *bio)
>
> extern void iostat_update_and_unbind_ctx(struct bio *bio);
> extern void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> - struct bio *bio, struct bio_post_read_ctx *ctx);
> + struct bio *bio, struct bio_post_read_ctx *read_ctx,
> + struct bio_post_write_ctx *write_ctx);
> extern int f2fs_init_iostat_processing(void);
> extern void f2fs_destroy_iostat_processing(void);
> extern int f2fs_init_iostat(struct f2fs_sb_info *sbi);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index cc82d42ef14c..8501008e42b2 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -3856,7 +3856,7 @@ int f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct folio *folio,
> f2fs_inode_chksum_set(sbi, folio);
> }
>
> - if (fio) {
> + if (fio && !folio_test_dropbehind(folio)) {
> struct f2fs_bio_info *io;
>
> INIT_LIST_HEAD(&fio->list);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index e16c4e2830c2..110dfe073aee 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1963,6 +1963,7 @@ static void f2fs_put_super(struct super_block *sb)
> flush_work(&sbi->s_error_work);
>
> f2fs_destroy_post_read_wq(sbi);
> + f2fs_destroy_post_write_wq(sbi);
>
> kvfree(sbi->ckpt);
>
> @@ -4959,6 +4960,12 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
> goto free_devices;
> }
>
> + err = f2fs_init_post_write_wq(sbi);
> + if (err) {
> + f2fs_err(sbi, "Failed to initialize post write workqueue");
> + goto free_devices;
> + }
> +
> sbi->total_valid_node_count =
> le32_to_cpu(sbi->ckpt->valid_node_count);
> percpu_counter_set(&sbi->total_valid_inode_count,
> @@ -5240,6 +5247,7 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
> /* flush s_error_work before sbi destroy */
> flush_work(&sbi->s_error_work);
> f2fs_destroy_post_read_wq(sbi);
> + f2fs_destroy_post_write_wq(sbi);
> free_devices:
> destroy_device_list(sbi);
> kvfree(sbi->ckpt);
> @@ -5435,9 +5443,12 @@ static int __init init_f2fs_fs(void)
> err = f2fs_init_post_read_processing();
> if (err)
> goto free_root_stats;
> - err = f2fs_init_iostat_processing();
> + err = f2fs_init_post_write_processing();
> if (err)
> goto free_post_read;
> + err = f2fs_init_iostat_processing();
> + if (err)
> + goto free_post_write;
> err = f2fs_init_bio_entry_cache();
> if (err)
> goto free_iostat;
> @@ -5469,6 +5480,8 @@ static int __init init_f2fs_fs(void)
> f2fs_destroy_bio_entry_cache();
> free_iostat:
> f2fs_destroy_iostat_processing();
> +free_post_write:
> + f2fs_destroy_post_write_processing();
> free_post_read:
> f2fs_destroy_post_read_processing();
> free_root_stats:
> @@ -5504,6 +5517,7 @@ static void __exit exit_f2fs_fs(void)
> f2fs_destroy_bio_entry_cache();
> f2fs_destroy_iostat_processing();
> f2fs_destroy_post_read_processing();
> + f2fs_destroy_post_write_processing();
> f2fs_destroy_root_stats();
> f2fs_exit_shrinker();
> f2fs_exit_sysfs();
> --
> 2.50.0
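
For context on the usage-model question, the userspace side of an uncached
buffered write can be sketched as below. This is a minimal sketch, not the
test program used above: write_uncached() is a hypothetical helper, the
fallback policy is an assumption, and the RWF_DONTCACHE value is defined
locally only for older userspace headers that lack it (0x80 is the value in
the upstream uapi header).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080	/* fallback for older uapi headers */
#endif

/*
 * Hypothetical helper: write len bytes at offset off, asking the kernel to
 * drop the written pages from the page cache once writeback completes.
 * If the kernel or filesystem rejects the flag, fall back to a normal
 * buffered pwrite() (an assumed policy, chosen for illustration).
 */
static ssize_t write_uncached(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	assert(buf != NULL);

	ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
	if (ret < 0 && (errno == EOPNOTSUPP || errno == ENOSYS ||
			errno == EINVAL))
		ret = pwrite(fd, buf, len, off);	/* plain buffered write */
	return ret;
}
```

With a filesystem that sets FOP_DONTCACHE, as this patch proposes for f2fs,
the first pwritev2() call is expected to succeed; elsewhere the helper
quietly degrades to an ordinary buffered write.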