lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aMNg-dNDsWo2BemN@google.com>
Date: Thu, 11 Sep 2025 23:53:29 +0000
From: Jaegeuk Kim <jaegeuk@...nel.org>
To: Qi Han <hanqi@...o.com>
Cc: chao@...nel.org, linux-f2fs-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org, axboe@...nel.dk
Subject: Re: [RFC PATCH] f2fs: f2fs support uncached buffer I/O read and write

Given the performance data and implementation overhead, I'm also questioning
whether we really need to support this for writes or not. Can we get some common
sense of usage models?

On 08/28, Qi Han wrote:
> In the link [1], we adapted uncached buffer I/O read support in f2fs.
> Now, let's move forward to enabling uncached buffer I/O write support
> in f2fs.
> 
> In f2fs_write_end_io, a separate asynchronous workqueue is created to
> perform the page drop operation for bios that contain pages of type
> FGP_DONTCACHE.
> 
> The following patch is developed and tested based on the v6.17-rc3 branch.
> My local testing results are as follows, along with some issues observed:
> 1) Write performance degradation. Uncached buffer I/O write is slower than
> normal buffered write because uncached I/O triggers a sync operation for
> each I/O after data is written to memory, in order to drop pages promptly
> at end_io. I assume this impact might be less visible on high-performance
> storage devices such as PCIe 6.0 SSDs.
> - f2fs_file_write_iter
>  - f2fs_buffered_write_iter
>  - generic_write_sync
>   - filemap_fdatawrite_range_kick
> 2) As expected, page cache usage does not significantly increase during writes.
> 3) The kswapd0 memory reclaim thread remains mostly idle, but additional
> asynchronous work overhead is introduced, e.g:
>   PID USER         PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ ARGS
> 19650 root          0 -20    0    0    0 I  7.0   0.0   0:00.21 [kworker/u33:3-f2fs_post_write_wq]
>    95 root          0 -20    0    0    0 I  6.6   0.0   0:02.08 [kworker/u33:0-f2fs_post_write_wq]
> 19653 root          0 -20    0    0    0 I  4.6   0.0   0:01.25 [kworker/u33:6-f2fs_post_write_wq]
> 19652 root          0 -20    0    0    0 I  4.3   0.0   0:00.92 [kworker/u33:5-f2fs_post_write_wq]
> 19613 root          0 -20    0    0    0 I  4.3   0.0   0:00.99 [kworker/u33:1-f2fs_post_write_wq]
> 19651 root          0 -20    0    0    0 I  3.6   0.0   0:00.98 [kworker/u33:4-f2fs_post_write_wq]
> 19654 root          0 -20    0    0    0 I  3.0   0.0   0:01.05 [kworker/u33:7-f2fs_post_write_wq]
> 19655 root          0 -20    0    0    0 I  2.3   0.0   0:01.18 [kworker/u33:8-f2fs_post_write_wq]
> 
> >From these results on my test device, introducing uncached buffer I/O write on
> f2fs seems to bring more drawbacks than benefits. Do we really need to support
> uncached buffer I/O write in f2fs?
> 
> Write test data without using uncached buffer I/O:
> Starting 1 threads
> pid: 17609
> writing bs 8192, uncached 0
>   1s: 753MB/sec, MB=753
>   2s: 792MB/sec, MB=1546
>   3s: 430MB/sec, MB=1978
>   4s: 661MB/sec, MB=2636
>   5s: 900MB/sec, MB=3542
>   6s: 769MB/sec, MB=4308
>   7s: 808MB/sec, MB=5113
>   8s: 766MB/sec, MB=5884
>   9s: 654MB/sec, MB=6539
>  10s: 456MB/sec, MB=6995
>  11s: 797MB/sec, MB=7793
>  12s: 770MB/sec, MB=8563
>  13s: 778MB/sec, MB=9341
>  14s: 726MB/sec, MB=10077
>  15s: 736MB/sec, MB=10803
>  16s: 798MB/sec, MB=11602
>  17s: 728MB/sec, MB=12330
>  18s: 749MB/sec, MB=13080
>  19s: 777MB/sec, MB=13857
>  20s: 688MB/sec, MB=14395
> 
> 19:29:34      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
> 19:29:35        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:29:36        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:29:37        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:29:38        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:29:39        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:29:40        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:29:41        0        94    0.00    2.00    0.00    0.00    2.00     0  kswapd0
> 19:29:42        0        94    0.00   59.00    0.00    0.00   59.00     7  kswapd0
> 19:29:43        0        94    0.00   45.00    0.00    0.00   45.00     7  kswapd0
> 19:29:44        0        94    0.00   36.00    0.00    0.00   36.00     0  kswapd0
> 19:29:45        0        94    0.00   27.00    0.00    1.00   27.00     0  kswapd0
> 19:29:46        0        94    0.00   26.00    0.00    0.00   26.00     2  kswapd0
> 19:29:47        0        94    0.00   57.00    0.00    0.00   57.00     7  kswapd0
> 19:29:48        0        94    0.00   41.00    0.00    0.00   41.00     7  kswapd0
> 19:29:49        0        94    0.00   38.00    0.00    0.00   38.00     7  kswapd0
> 19:29:50        0        94    0.00   47.00    0.00    0.00   47.00     7  kswapd0
> 19:29:51        0        94    0.00   43.00    0.00    1.00   43.00     7  kswapd0
> 19:29:52        0        94    0.00   36.00    0.00    0.00   36.00     7  kswapd0
> 19:29:53        0        94    0.00   39.00    0.00    0.00   39.00     2  kswapd0
> 19:29:54        0        94    0.00   46.00    0.00    0.00   46.00     7  kswapd0
> 19:29:55        0        94    0.00   43.00    0.00    0.00   43.00     7  kswapd0
> 19:29:56        0        94    0.00   39.00    0.00    0.00   39.00     7  kswapd0
> 19:29:57        0        94    0.00   29.00    0.00    1.00   29.00     1  kswapd0
> 19:29:58        0        94    0.00   17.00    0.00    0.00   17.00     4  kswapd0
> 
> 19:29:33    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
> 19:29:34      4464588   6742648   4420876     38.12      6156   2032600 179730872    743.27   1863412   1822544         4
> 19:29:35      4462572   6740784   4422752     38.13      6156   2032752 179739004    743.30   1863460   1823584        16
> 19:29:36      4381512   6740856   4422420     38.13      6156   2114144 179746508    743.33   1863476   1905384     81404
> 19:29:37      3619456   6741840   4421588     38.12      6156   2877032 179746652    743.33   1863536   2668896    592584
> 19:29:38      2848184   6740720   4422472     38.13      6164   3646188 179746652    743.33   1863600   3438520    815692
> 19:29:39      2436336   6739452   4423720     38.14      6164   4056772 179746652    743.33   1863604   3849164    357096
> 19:29:40      1712660   6737700   4425140     38.15      6164   4779020 179746604    743.33   1863612   4571124    343716
> 19:29:41       810664   6738020   4425004     38.15      6164   5681152 179746604    743.33   1863612   5473444    297268
> 19:29:42       673756   6779120   4373200     37.71      5656   5869928 179746604    743.33   1902852   5589452    269032
> 19:29:43       688480   6782024   4371012     37.69      5648   5856940 179750048    743.34   1926336   5542004    279344
> 19:29:44       688956   6789028   4364260     37.63      5584   5863272 179750048    743.34   1941608   5518808    300096
> 19:29:45       740768   6804560   4348772     37.49      5524   5827248 179750000    743.34   1954084   5452844    123120
> 19:29:46       697936   6810612   4342768     37.44      5524   5876048 179750048    743.34   1962020   5483944    274908
> 19:29:47       734504   6818716   4334156     37.37      5512   5849188 179750000    743.34   1978120   5426796    274504
> 19:29:48       771696   6828316   4324180     37.28      5504   5820948 179762260    743.39   2006732   5354152    305388
> 19:29:49       691944   6838812   4313108     37.19      5476   5912444 179749952    743.34   2021720   5418996    296852
> 19:29:50       679392   6844496   4306892     37.13      5452   5931356 179749952    743.34   1982772   5463040    233600
> 19:29:51       768528   6868080   4284224     36.94      5412   5865704 176317452    729.15   1990220   5359012    343160
> 19:29:52       717880   6893940   4259968     36.73      5400   5942368 176317404    729.15   1965624   5444140    304856
> 19:29:53       712408   6902660   4251268     36.65      5372   5956584 176318376    729.15   1969192   5442132    290224
> 19:29:54       707184   6917512   4236160     36.52      5344   5976944 176318568    729.15   1968716   5443620    336948
> 19:29:55       703172   6921608   4232332     36.49      5292   5984836 176318568    729.15   1979788   5429484    328716
> 19:29:56       733256   6933020   4220864     36.39      5212   5966340 176318568    729.15   1983636   5396256    300008
> 19:29:57       723308   6936340   4217280     36.36      5120   5979816 176318568    729.15   1987088   5394360    508792
> 19:29:58       732148   6942972   4210680     36.30      5108   5977656 176311064    729.12   1990400   5379884    214936
> 
> Write test data after using uncached buffer I/O:
> Starting 1 threads
> pid: 17742
> writing bs 8192, uncached 1
>   1s: 433MB/sec, MB=433
>   2s: 195MB/sec, MB=628
>   3s: 209MB/sec, MB=836
>   4s: 54MB/sec, MB=883
>   5s: 277MB/sec, MB=1169
>   6s: 141MB/sec, MB=1311
>   7s: 185MB/sec, MB=1495
>   8s: 134MB/sec, MB=1631
>   9s: 201MB/sec, MB=1834
>  10s: 283MB/sec, MB=2114
>  11s: 223MB/sec, MB=2339
>  12s: 164MB/sec, MB=2506
>  13s: 155MB/sec, MB=2657
>  14s: 132MB/sec, MB=2792
>  15s: 186MB/sec, MB=2965
>  16s: 218MB/sec, MB=3198
>  17s: 220MB/sec, MB=3412
>  18s: 191MB/sec, MB=3606
>  19s: 214MB/sec, MB=3828
>  20s: 257MB/sec, MB=4085
> 
> 19:31:31      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
> 19:31:32        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:33        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:34        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:35        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:36        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:37        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:38        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:39        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:40        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:41        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:42        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:43        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:44        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:45        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:46        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:47        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:48        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 19:31:49        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> 
> 19:31:31    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
> 19:31:32      4816812   6928788   4225812     36.43      5148   1879676 176322636    729.17   1920900   1336548    285748
> 19:31:33      4781880   6889428   4265592     36.78      5148   1874860 176322636    729.17   1920920   1332268    279028
> 19:31:34      4758972   6822588   4332376     37.35      5148   1830984 176322636    729.17   1920920   1288976    233040
> 19:31:35      4850248   6766480   4387840     37.83      5148   1684244 176322636    729.17   1920920   1142408     90508
> 19:31:36      4644176   6741676   4413256     38.05      5148   1864900 176322636    729.17   1920920   1323452    269380
> 19:31:37      4637900   6681480   4473436     38.57      5148   1810996 176322588    729.17   1920920   1269612    217632
> 19:31:38      4502108   6595508   4559500     39.31      5148   1860724 176322492    729.17   1920920   1319588    267760
> 19:31:39      4498844   6551068   4603928     39.69      5148   1819528 176322492    729.17   1920920   1278440    226496
> 19:31:40      4498812   6587396   4567340     39.38      5148   1856116 176322492    729.17   1920920   1314800    263292
> 19:31:41      4656784   6706252   4448372     38.35      5148   1817112 176322492    729.17   1920920   1275704    224600
> 19:31:42      4635032   6673328   4481436     38.64      5148   1805816 176322492    729.17   1920920   1264548    213436
> 19:31:43      4636852   6679736   4474884     38.58      5148   1810548 176322492    729.17   1920932   1269796    218276
> 19:31:44      4654740   6669104   4485544     38.67      5148   1782000 176322444    729.17   1920932   1241552    189880
> 19:31:45      4821604   6693156   4461848     38.47      5148   1638864 176322444    729.17   1920932   1098784     31076
> 19:31:46      4707548   6728796   4426400     38.16      5148   1788368 176322444    729.17   1920932   1248936    196596
> 19:31:47      4683996   6747632   4407348     38.00      5148   1830968 176322444    729.17   1920932   1291396    239636
> 19:31:48      4694648   6773808   4381320     37.78      5148   1846376 176322624    729.17   1920944   1307576    254800
> 19:31:49      4663784   6730212   4424776     38.15      5148   1833784 176322772    729.17   1920948   1295156    242200
> 
> [1]
> https://lore.kernel.org/lkml/20250725075310.1614614-1-hanqi@vivo.com/
> 
> Signed-off-by: Qi Han <hanqi@...o.com>
> ---
>  fs/f2fs/data.c    | 178 ++++++++++++++++++++++++++++++++++------------
>  fs/f2fs/f2fs.h    |   5 ++
>  fs/f2fs/file.c    |   2 +-
>  fs/f2fs/iostat.c  |   8 ++-
>  fs/f2fs/iostat.h  |   4 +-
>  fs/f2fs/segment.c |   2 +-
>  fs/f2fs/super.c   |  16 ++++-
>  7 files changed, 161 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 7961e0ddfca3..4eeb2b36473d 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -30,8 +30,10 @@
>  #define NUM_PREALLOC_POST_READ_CTXS	128
>  
>  static struct kmem_cache *bio_post_read_ctx_cache;
> +static struct kmem_cache *bio_post_write_ctx_cache;
>  static struct kmem_cache *bio_entry_slab;
>  static mempool_t *bio_post_read_ctx_pool;
> +static mempool_t *bio_post_write_ctx_pool;
>  static struct bio_set f2fs_bioset;
>  
>  #define	F2FS_BIO_POOL_SIZE	NR_CURSEG_TYPE
> @@ -120,6 +122,12 @@ struct bio_post_read_ctx {
>  	block_t fs_blkaddr;
>  };
>  
> +struct bio_post_write_ctx {
> +	struct bio *bio;
> +	struct f2fs_sb_info *sbi;
> +	struct work_struct work;
> +};
> +
>  /*
>   * Update and unlock a bio's pages, and free the bio.
>   *
> @@ -159,6 +167,56 @@ static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
>  	bio_put(bio);
>  }
>  
> +static void f2fs_finish_write_bio(struct f2fs_sb_info *sbi, struct bio *bio)
> +{
> +	struct folio_iter fi;
> +	struct bio_post_write_ctx *write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
> +
> +	bio_for_each_folio_all(fi, bio) {
> +		struct folio *folio = fi.folio;
> +		enum count_type type;
> +
> +		if (fscrypt_is_bounce_folio(folio)) {
> +			struct folio *io_folio = folio;
> +
> +			folio = fscrypt_pagecache_folio(io_folio);
> +			fscrypt_free_bounce_page(&io_folio->page);
> +		}
> +
> +#ifdef CONFIG_F2FS_FS_COMPRESSION
> +		if (f2fs_is_compressed_page(folio)) {
> +			f2fs_compress_write_end_io(bio, folio);
> +			continue;
> +		}
> +#endif
> +
> +		type = WB_DATA_TYPE(folio, false);
> +
> +		if (unlikely(bio->bi_status != BLK_STS_OK)) {
> +			mapping_set_error(folio->mapping, -EIO);
> +			if (type == F2FS_WB_CP_DATA)
> +				f2fs_stop_checkpoint(sbi, true,
> +						STOP_CP_REASON_WRITE_FAIL);
> +		}
> +
> +		f2fs_bug_on(sbi, is_node_folio(folio) &&
> +				folio->index != nid_of_node(folio));
> +
> +		dec_page_count(sbi, type);
> +		if (f2fs_in_warm_node_list(sbi, folio))
> +			f2fs_del_fsync_node_entry(sbi, folio);
> +		folio_clear_f2fs_gcing(folio);
> +		folio_end_writeback(folio);
> +	}
> +	if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
> +				wq_has_sleeper(&sbi->cp_wait))
> +		wake_up(&sbi->cp_wait);
> +
> +	if (write_ctx)
> +		mempool_free(write_ctx, bio_post_write_ctx_pool);
> +	bio_put(bio);
> +}
> +
>  static void f2fs_verify_bio(struct work_struct *work)
>  {
>  	struct bio_post_read_ctx *ctx =
> @@ -314,58 +372,32 @@ static void f2fs_read_end_io(struct bio *bio)
>  	f2fs_verify_and_finish_bio(bio, intask);
>  }
>  
> +static void f2fs_finish_write_bio_async_work(struct work_struct *work)
> +{
> +	struct bio_post_write_ctx *ctx =
> +		container_of(work, struct bio_post_write_ctx, work);
> +
> +	f2fs_finish_write_bio(ctx->sbi, ctx->bio);
> +}
> +
>  static void f2fs_write_end_io(struct bio *bio)
>  {
> -	struct f2fs_sb_info *sbi;
> -	struct folio_iter fi;
> +	struct f2fs_sb_info *sbi = F2FS_F_SB(bio_first_folio_all(bio));
> +	struct bio_post_write_ctx *write_ctx;
>  
>  	iostat_update_and_unbind_ctx(bio);
> -	sbi = bio->bi_private;
>  
>  	if (time_to_inject(sbi, FAULT_WRITE_IO))
>  		bio->bi_status = BLK_STS_IOERR;
>  
> -	bio_for_each_folio_all(fi, bio) {
> -		struct folio *folio = fi.folio;
> -		enum count_type type;
> -
> -		if (fscrypt_is_bounce_folio(folio)) {
> -			struct folio *io_folio = folio;
> -
> -			folio = fscrypt_pagecache_folio(io_folio);
> -			fscrypt_free_bounce_page(&io_folio->page);
> -		}
> -
> -#ifdef CONFIG_F2FS_FS_COMPRESSION
> -		if (f2fs_is_compressed_page(folio)) {
> -			f2fs_compress_write_end_io(bio, folio);
> -			continue;
> -		}
> -#endif
> -
> -		type = WB_DATA_TYPE(folio, false);
> -
> -		if (unlikely(bio->bi_status != BLK_STS_OK)) {
> -			mapping_set_error(folio->mapping, -EIO);
> -			if (type == F2FS_WB_CP_DATA)
> -				f2fs_stop_checkpoint(sbi, true,
> -						STOP_CP_REASON_WRITE_FAIL);
> -		}
> -
> -		f2fs_bug_on(sbi, is_node_folio(folio) &&
> -				folio->index != nid_of_node(folio));
> -
> -		dec_page_count(sbi, type);
> -		if (f2fs_in_warm_node_list(sbi, folio))
> -			f2fs_del_fsync_node_entry(sbi, folio);
> -		folio_clear_f2fs_gcing(folio);
> -		folio_end_writeback(folio);
> +	write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
> +	if (write_ctx) {
> +		INIT_WORK(&write_ctx->work, f2fs_finish_write_bio_async_work);
> +		queue_work(write_ctx->sbi->post_write_wq, &write_ctx->work);
> +		return;
>  	}
> -	if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
> -				wq_has_sleeper(&sbi->cp_wait))
> -		wake_up(&sbi->cp_wait);
>  
> -	bio_put(bio);
> +	f2fs_finish_write_bio(sbi, bio);
>  }
>  
>  #ifdef CONFIG_BLK_DEV_ZONED
> @@ -467,11 +499,10 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
>  		bio->bi_private = NULL;
>  	} else {
>  		bio->bi_end_io = f2fs_write_end_io;
> -		bio->bi_private = sbi;
> +		bio->bi_private = NULL;
>  		bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi,
>  						fio->type, fio->temp);
>  	}
> -	iostat_alloc_and_bind_ctx(sbi, bio, NULL);
>  
>  	if (fio->io_wbc)
>  		wbc_init_bio(fio->io_wbc, bio);
> @@ -701,6 +732,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
>  
>  	/* Allocate a new bio */
>  	bio = __bio_alloc(fio, 1);
> +	iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
>  
>  	f2fs_set_bio_crypt_ctx(bio, fio_folio->mapping->host,
>  			fio_folio->index, fio, GFP_NOIO);
> @@ -899,6 +931,8 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
>  alloc_new:
>  	if (!bio) {
>  		bio = __bio_alloc(fio, BIO_MAX_VECS);
> +		iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
> +
>  		f2fs_set_bio_crypt_ctx(bio, folio->mapping->host,
>  				folio->index, fio, GFP_NOIO);
>  
> @@ -948,6 +982,7 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
>  	struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp;
>  	struct folio *bio_folio;
>  	enum count_type type;
> +	struct bio_post_write_ctx *write_ctx = NULL;
>  
>  	f2fs_bug_on(sbi, is_read_io(fio->op));
>  
> @@ -1001,6 +1036,13 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
>  		f2fs_set_bio_crypt_ctx(io->bio, fio_inode(fio),
>  				bio_folio->index, fio, GFP_NOIO);
>  		io->fio = *fio;
> +
> +		if (folio_test_dropbehind(bio_folio)) {
> +			write_ctx = mempool_alloc(bio_post_write_ctx_pool, GFP_NOFS);
> +			write_ctx->bio = io->bio;
> +			write_ctx->sbi = sbi;
> +		}
> +		iostat_alloc_and_bind_ctx(fio->sbi, io->bio, NULL, write_ctx);
>  	}
>  
>  	if (!bio_add_folio(io->bio, bio_folio, folio_size(bio_folio), 0)) {
> @@ -1077,7 +1119,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
>  		ctx->decompression_attempted = false;
>  		bio->bi_private = ctx;
>  	}
> -	iostat_alloc_and_bind_ctx(sbi, bio, ctx);
> +	iostat_alloc_and_bind_ctx(sbi, bio, ctx, NULL);
>  
>  	return bio;
>  }
> @@ -3540,6 +3582,7 @@ static int f2fs_write_begin(const struct kiocb *iocb,
>  	bool use_cow = false;
>  	block_t blkaddr = NULL_ADDR;
>  	int err = 0;
> +	fgf_t fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT;
>  
>  	trace_f2fs_write_begin(inode, pos, len);
>  
> @@ -3582,12 +3625,13 @@ static int f2fs_write_begin(const struct kiocb *iocb,
>  #endif
>  
>  repeat:
> +	if (iocb && iocb->ki_flags & IOCB_DONTCACHE)
> +		fgp |= FGP_DONTCACHE;
>  	/*
>  	 * Do not use FGP_STABLE to avoid deadlock.
>  	 * Will wait that below with our IO control.
>  	 */
> -	folio = __filemap_get_folio(mapping, index,
> -				FGP_LOCK | FGP_WRITE | FGP_CREAT, GFP_NOFS);
> +	folio = __filemap_get_folio(mapping, index, fgp, GFP_NOFS);
>  	if (IS_ERR(folio)) {
>  		err = PTR_ERR(folio);
>  		goto fail;
> @@ -4127,12 +4171,38 @@ int __init f2fs_init_post_read_processing(void)
>  	return -ENOMEM;
>  }
>  
> +int __init f2fs_init_post_write_processing(void)
> +{
> +	bio_post_write_ctx_cache =
> +		kmem_cache_create("f2fs_bio_post_write_ctx",
> +				sizeof(struct bio_post_write_ctx), 0, 0, NULL);
> +	if (!bio_post_write_ctx_cache)
> +		goto fail;
> +	bio_post_write_ctx_pool =
> +		mempool_create_slab_pool(NUM_PREALLOC_POST_READ_CTXS,
> +				bio_post_write_ctx_cache);
> +	if (!bio_post_write_ctx_pool)
> +		goto fail_free_cache;
> +	return 0;
> +
> +fail_free_cache:
> +	kmem_cache_destroy(bio_post_write_ctx_cache);
> +fail:
> +	return -ENOMEM;
> +}
> +
>  void f2fs_destroy_post_read_processing(void)
>  {
>  	mempool_destroy(bio_post_read_ctx_pool);
>  	kmem_cache_destroy(bio_post_read_ctx_cache);
>  }
>  
> +void f2fs_destroy_post_write_processing(void)
> +{
> +	mempool_destroy(bio_post_write_ctx_pool);
> +	kmem_cache_destroy(bio_post_write_ctx_cache);
> +}
> +
>  int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
>  {
>  	if (!f2fs_sb_has_encrypt(sbi) &&
> @@ -4146,12 +4216,26 @@ int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
>  	return sbi->post_read_wq ? 0 : -ENOMEM;
>  }
>  
> +int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi)
> +{
> +	sbi->post_write_wq = alloc_workqueue("f2fs_post_write_wq",
> +						 WQ_UNBOUND | WQ_HIGHPRI,
> +						 num_online_cpus());
> +	return sbi->post_write_wq ? 0 : -ENOMEM;
> +}
> +
>  void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi)
>  {
>  	if (sbi->post_read_wq)
>  		destroy_workqueue(sbi->post_read_wq);
>  }
>  
> +void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi)
> +{
> +	if (sbi->post_write_wq)
> +		destroy_workqueue(sbi->post_write_wq);
> +}
> +
>  int __init f2fs_init_bio_entry_cache(void)
>  {
>  	bio_entry_slab = f2fs_kmem_cache_create("f2fs_bio_entry_slab",
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 46be7560548c..fe3f81876b23 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1812,6 +1812,7 @@ struct f2fs_sb_info {
>  	/* Precomputed FS UUID checksum for seeding other checksums */
>  	__u32 s_chksum_seed;
>  
> +	struct workqueue_struct *post_write_wq;
>  	struct workqueue_struct *post_read_wq;	/* post read workqueue */
>  
>  	/*
> @@ -4023,9 +4024,13 @@ bool f2fs_release_folio(struct folio *folio, gfp_t wait);
>  bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
>  void f2fs_clear_page_cache_dirty_tag(struct folio *folio);
>  int f2fs_init_post_read_processing(void);
> +int f2fs_init_post_write_processing(void);
>  void f2fs_destroy_post_read_processing(void);
> +void f2fs_destroy_post_write_processing(void);
>  int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
> +int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi);
>  void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
> +void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi);
>  extern const struct iomap_ops f2fs_iomap_ops;
>  
>  /*
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 42faaed6a02d..8aa6a4fd52e8 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -5443,5 +5443,5 @@ const struct file_operations f2fs_file_operations = {
>  	.splice_read	= f2fs_file_splice_read,
>  	.splice_write	= iter_file_splice_write,
>  	.fadvise	= f2fs_file_fadvise,
> -	.fop_flags	= FOP_BUFFER_RASYNC,
> +	.fop_flags	= FOP_BUFFER_RASYNC | FOP_DONTCACHE,
>  };
> diff --git a/fs/f2fs/iostat.c b/fs/f2fs/iostat.c
> index f8703038e1d8..b2e6ce80c68d 100644
> --- a/fs/f2fs/iostat.c
> +++ b/fs/f2fs/iostat.c
> @@ -245,7 +245,7 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
>  	if (op_is_write(bio_op(bio))) {
>  		lat_type = bio->bi_opf & REQ_SYNC ?
>  				WRITE_SYNC_IO : WRITE_ASYNC_IO;
> -		bio->bi_private = iostat_ctx->sbi;
> +		bio->bi_private = iostat_ctx->post_write_ctx;
>  	} else {
>  		lat_type = READ_IO;
>  		bio->bi_private = iostat_ctx->post_read_ctx;
> @@ -256,7 +256,8 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
>  }
>  
>  void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> -		struct bio *bio, struct bio_post_read_ctx *ctx)
> +		struct bio *bio, struct bio_post_read_ctx *read_ctx,
> +		struct bio_post_write_ctx *write_ctx)
>  {
>  	struct bio_iostat_ctx *iostat_ctx;
>  	/* Due to the mempool, this never fails. */
> @@ -264,7 +265,8 @@ void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
>  	iostat_ctx->sbi = sbi;
>  	iostat_ctx->submit_ts = 0;
>  	iostat_ctx->type = 0;
> -	iostat_ctx->post_read_ctx = ctx;
> +	iostat_ctx->post_read_ctx = read_ctx;
> +	iostat_ctx->post_write_ctx = write_ctx;
>  	bio->bi_private = iostat_ctx;
>  }
>  
> diff --git a/fs/f2fs/iostat.h b/fs/f2fs/iostat.h
> index eb99d05cf272..a358909bb5e8 100644
> --- a/fs/f2fs/iostat.h
> +++ b/fs/f2fs/iostat.h
> @@ -40,6 +40,7 @@ struct bio_iostat_ctx {
>  	unsigned long submit_ts;
>  	enum page_type type;
>  	struct bio_post_read_ctx *post_read_ctx;
> +	struct bio_post_write_ctx *post_write_ctx;
>  };
>  
>  static inline void iostat_update_submit_ctx(struct bio *bio,
> @@ -60,7 +61,8 @@ static inline struct bio_post_read_ctx *get_post_read_ctx(struct bio *bio)
>  
>  extern void iostat_update_and_unbind_ctx(struct bio *bio);
>  extern void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> -		struct bio *bio, struct bio_post_read_ctx *ctx);
> +		struct bio *bio, struct bio_post_read_ctx *read_ctx,
> +		struct bio_post_write_ctx *write_ctx);
>  extern int f2fs_init_iostat_processing(void);
>  extern void f2fs_destroy_iostat_processing(void);
>  extern int f2fs_init_iostat(struct f2fs_sb_info *sbi);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index cc82d42ef14c..8501008e42b2 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -3856,7 +3856,7 @@ int f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct folio *folio,
>  		f2fs_inode_chksum_set(sbi, folio);
>  	}
>  
> -	if (fio) {
> +	if (fio && !folio_test_dropbehind(folio)) {
>  		struct f2fs_bio_info *io;
>  
>  		INIT_LIST_HEAD(&fio->list);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index e16c4e2830c2..110dfe073aee 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1963,6 +1963,7 @@ static void f2fs_put_super(struct super_block *sb)
>  	flush_work(&sbi->s_error_work);
>  
>  	f2fs_destroy_post_read_wq(sbi);
> +	f2fs_destroy_post_write_wq(sbi);
>  
>  	kvfree(sbi->ckpt);
>  
> @@ -4959,6 +4960,12 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
>  		goto free_devices;
>  	}
>  
> +	err = f2fs_init_post_write_wq(sbi);
> +	if (err) {
> +		f2fs_err(sbi, "Failed to initialize post write workqueue");
> +		goto free_devices;
> +	}
> +
>  	sbi->total_valid_node_count =
>  				le32_to_cpu(sbi->ckpt->valid_node_count);
>  	percpu_counter_set(&sbi->total_valid_inode_count,
> @@ -5240,6 +5247,7 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
>  	/* flush s_error_work before sbi destroy */
>  	flush_work(&sbi->s_error_work);
>  	f2fs_destroy_post_read_wq(sbi);
> +	f2fs_destroy_post_write_wq(sbi);
>  free_devices:
>  	destroy_device_list(sbi);
>  	kvfree(sbi->ckpt);
> @@ -5435,9 +5443,12 @@ static int __init init_f2fs_fs(void)
>  	err = f2fs_init_post_read_processing();
>  	if (err)
>  		goto free_root_stats;
> -	err = f2fs_init_iostat_processing();
> +	err = f2fs_init_post_write_processing();
>  	if (err)
>  		goto free_post_read;
> +	err = f2fs_init_iostat_processing();
> +	if (err)
> +		goto free_post_write;
>  	err = f2fs_init_bio_entry_cache();
>  	if (err)
>  		goto free_iostat;
> @@ -5469,6 +5480,8 @@ static int __init init_f2fs_fs(void)
>  	f2fs_destroy_bio_entry_cache();
>  free_iostat:
>  	f2fs_destroy_iostat_processing();
> +free_post_write:
> +	f2fs_destroy_post_write_processing();
>  free_post_read:
>  	f2fs_destroy_post_read_processing();
>  free_root_stats:
> @@ -5504,6 +5517,7 @@ static void __exit exit_f2fs_fs(void)
>  	f2fs_destroy_bio_entry_cache();
>  	f2fs_destroy_iostat_processing();
>  	f2fs_destroy_post_read_processing();
> +	f2fs_destroy_post_write_processing();
>  	f2fs_destroy_root_stats();
>  	f2fs_exit_shrinker();
>  	f2fs_exit_sysfs();
> -- 
> 2.50.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ