Message-Id: <20250828121131.3694154-1-hanqi@vivo.com>
Date: Thu, 28 Aug 2025 06:11:30 -0600
From: Qi Han <hanqi@...o.com>
To: jaegeuk@...nel.org,
	chao@...nel.org
Cc: linux-f2fs-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org,
	axboe@...nel.dk,
	Qi Han <hanqi@...o.com>
Subject: [RFC PATCH] f2fs: f2fs support uncached buffer I/O read and write

The patch in [1] added uncached buffer I/O read support to f2fs. This
patch goes one step further and adds uncached buffer I/O write support.

f2fs_write_end_io() now hands bios that contain folios allocated with
FGP_DONTCACHE to a separate workqueue (post_write_wq), so that the page
drop at the end of writeback runs asynchronously in a worker instead of
in the bio completion path.
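
As a rough illustration of this deferral, here is a minimal userspace
sketch, not the kernel code. The names work_item, post_write_ctx,
run_work and write_end_io are hypothetical stand-ins. run_work() calls
the handler synchronously, whereas the kernel's queue_work() would run
it later on a worker thread. What the sketch does share with the patch
is the shape: the work item is embedded in a per-bio context, and the
handler recovers that context with container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct work_struct (assumption). */
struct work_item {
	void (*fn)(struct work_item *work);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical analogue of bio_post_write_ctx: the work item is embedded. */
struct post_write_ctx {
	int bio_id;		/* stands in for struct bio * */
	int completed;		/* set by the deferred handler */
	struct work_item work;
};

/* Deferred handler: recover the enclosing context from the work pointer. */
static void finish_write_work(struct work_item *work)
{
	struct post_write_ctx *ctx =
		container_of(work, struct post_write_ctx, work);

	ctx->completed = 1;
}

/* Stand-in for queue_work(): runs the handler right away, in the caller. */
static void run_work(struct work_item *work)
{
	assert(work->fn != NULL);
	work->fn(work);
}

/*
 * Stand-in for the end_io decision: defer only when a context is attached.
 * Returns 1 if the completion was handed to the "worker", 0 if it would be
 * finished inline.
 */
static int write_end_io(struct post_write_ctx *ctx)
{
	if (!ctx)
		return 0;
	ctx->work.fn = finish_write_work;
	run_work(&ctx->work);
	return 1;
}
```

Embedding the work item in the context avoids a second allocation in
the completion path; the context itself would be allocated at submit
time.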

The patch is developed and tested on top of v6.17-rc3. My local test
results, together with some issues I observed, are as follows:
1) Write performance degrades. Uncached buffer I/O write is slower than
normal buffered write because every uncached write triggers a sync
operation right after the data is copied to memory (see the call chain
below), so that the pages can be dropped promptly at end_io. I assume the
impact is less visible on high-performance storage devices such as
PCIe 6.0 SSDs.
- f2fs_file_write_iter
 - f2fs_buffered_write_iter
 - generic_write_sync
  - filemap_fdatawrite_range_kick
2) As expected, page cache usage does not significantly increase during writes.
3) The kswapd0 memory reclaim thread remains mostly idle, but additional
asynchronous work overhead is introduced, e.g.:
  PID USER         PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ ARGS
19650 root          0 -20    0    0    0 I  7.0   0.0   0:00.21 [kworker/u33:3-f2fs_post_write_wq]
   95 root          0 -20    0    0    0 I  6.6   0.0   0:02.08 [kworker/u33:0-f2fs_post_write_wq]
19653 root          0 -20    0    0    0 I  4.6   0.0   0:01.25 [kworker/u33:6-f2fs_post_write_wq]
19652 root          0 -20    0    0    0 I  4.3   0.0   0:00.92 [kworker/u33:5-f2fs_post_write_wq]
19613 root          0 -20    0    0    0 I  4.3   0.0   0:00.99 [kworker/u33:1-f2fs_post_write_wq]
19651 root          0 -20    0    0    0 I  3.6   0.0   0:00.98 [kworker/u33:4-f2fs_post_write_wq]
19654 root          0 -20    0    0    0 I  3.0   0.0   0:01.05 [kworker/u33:7-f2fs_post_write_wq]
19655 root          0 -20    0    0    0 I  2.3   0.0   0:01.18 [kworker/u33:8-f2fs_post_write_wq]
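
For readers who want to exercise the write path from item 1, here is a
minimal userspace sketch. It is not the test program that produced the
numbers in this mail. It assumes a kernel and libc that provide
pwritev2() and the RWF_DONTCACHE flag (added to the uapi headers in
Linux 6.14). The helper names write_uncached and demo_write_tmpfile are
hypothetical. When the flag is rejected, the sketch falls back to a
normal buffered write.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for pwritev2() in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Older uapi headers lack the flag; the value matches linux/fs.h. */
#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080
#endif

/*
 * Write len bytes at offset off and ask the kernel to drop the pages from
 * the page cache once writeback completes. Falls back to a plain buffered
 * write if the flag is not supported. Returns bytes written or -1;
 * *used_dontcache reports which path was taken.
 */
ssize_t write_uncached(int fd, const void *buf, size_t len, off_t off,
		       int *used_dontcache)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	assert(used_dontcache != NULL);

	ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
	if (ret >= 0) {
		*used_dontcache = 1;
		return ret;
	}
	/*
	 * EOPNOTSUPP is what the kernel returns for an unsupported flag;
	 * EINVAL is also treated as "unsupported" as a conservative
	 * assumption.
	 */
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return -1;

	*used_dontcache = 0;
	return pwritev2(fd, &iov, 1, off, 0);
}

/* Demo: write one 8 KiB block to a temporary file; returns bytes written. */
ssize_t demo_write_tmpfile(void)
{
	char path[] = "/tmp/uncached-demo-XXXXXX";
	static char buf[8192];
	int used = 0;
	ssize_t ret;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	memset(buf, 0x5a, sizeof(buf));
	ret = write_uncached(fd, buf, sizeof(buf), 0, &used);
	close(fd);
	unlink(path);
	return ret;
}
```

The 8192-byte block size mirrors the "bs 8192" in the test output in
this mail. To measure anything meaningful, the file would have to live
on an f2fs mount with this patch applied rather than under /tmp.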

From these results on my test device, introducing uncached buffer I/O write on
f2fs seems to bring more drawbacks than benefits. Do we really need to support
uncached buffer I/O write in f2fs?

Write test data without using uncached buffer I/O:
Starting 1 threads
pid: 17609
writing bs 8192, uncached 0
  1s: 753MB/sec, MB=753
  2s: 792MB/sec, MB=1546
  3s: 430MB/sec, MB=1978
  4s: 661MB/sec, MB=2636
  5s: 900MB/sec, MB=3542
  6s: 769MB/sec, MB=4308
  7s: 808MB/sec, MB=5113
  8s: 766MB/sec, MB=5884
  9s: 654MB/sec, MB=6539
 10s: 456MB/sec, MB=6995
 11s: 797MB/sec, MB=7793
 12s: 770MB/sec, MB=8563
 13s: 778MB/sec, MB=9341
 14s: 726MB/sec, MB=10077
 15s: 736MB/sec, MB=10803
 16s: 798MB/sec, MB=11602
 17s: 728MB/sec, MB=12330
 18s: 749MB/sec, MB=13080
 19s: 777MB/sec, MB=13857
 20s: 688MB/sec, MB=14395

19:29:34      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
19:29:35        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:29:36        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:29:37        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:29:38        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:29:39        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:29:40        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:29:41        0        94    0.00    2.00    0.00    0.00    2.00     0  kswapd0
19:29:42        0        94    0.00   59.00    0.00    0.00   59.00     7  kswapd0
19:29:43        0        94    0.00   45.00    0.00    0.00   45.00     7  kswapd0
19:29:44        0        94    0.00   36.00    0.00    0.00   36.00     0  kswapd0
19:29:45        0        94    0.00   27.00    0.00    1.00   27.00     0  kswapd0
19:29:46        0        94    0.00   26.00    0.00    0.00   26.00     2  kswapd0
19:29:47        0        94    0.00   57.00    0.00    0.00   57.00     7  kswapd0
19:29:48        0        94    0.00   41.00    0.00    0.00   41.00     7  kswapd0
19:29:49        0        94    0.00   38.00    0.00    0.00   38.00     7  kswapd0
19:29:50        0        94    0.00   47.00    0.00    0.00   47.00     7  kswapd0
19:29:51        0        94    0.00   43.00    0.00    1.00   43.00     7  kswapd0
19:29:52        0        94    0.00   36.00    0.00    0.00   36.00     7  kswapd0
19:29:53        0        94    0.00   39.00    0.00    0.00   39.00     2  kswapd0
19:29:54        0        94    0.00   46.00    0.00    0.00   46.00     7  kswapd0
19:29:55        0        94    0.00   43.00    0.00    0.00   43.00     7  kswapd0
19:29:56        0        94    0.00   39.00    0.00    0.00   39.00     7  kswapd0
19:29:57        0        94    0.00   29.00    0.00    1.00   29.00     1  kswapd0
19:29:58        0        94    0.00   17.00    0.00    0.00   17.00     4  kswapd0

19:29:33    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
19:29:34      4464588   6742648   4420876     38.12      6156   2032600 179730872    743.27   1863412   1822544         4
19:29:35      4462572   6740784   4422752     38.13      6156   2032752 179739004    743.30   1863460   1823584        16
19:29:36      4381512   6740856   4422420     38.13      6156   2114144 179746508    743.33   1863476   1905384     81404
19:29:37      3619456   6741840   4421588     38.12      6156   2877032 179746652    743.33   1863536   2668896    592584
19:29:38      2848184   6740720   4422472     38.13      6164   3646188 179746652    743.33   1863600   3438520    815692
19:29:39      2436336   6739452   4423720     38.14      6164   4056772 179746652    743.33   1863604   3849164    357096
19:29:40      1712660   6737700   4425140     38.15      6164   4779020 179746604    743.33   1863612   4571124    343716
19:29:41       810664   6738020   4425004     38.15      6164   5681152 179746604    743.33   1863612   5473444    297268
19:29:42       673756   6779120   4373200     37.71      5656   5869928 179746604    743.33   1902852   5589452    269032
19:29:43       688480   6782024   4371012     37.69      5648   5856940 179750048    743.34   1926336   5542004    279344
19:29:44       688956   6789028   4364260     37.63      5584   5863272 179750048    743.34   1941608   5518808    300096
19:29:45       740768   6804560   4348772     37.49      5524   5827248 179750000    743.34   1954084   5452844    123120
19:29:46       697936   6810612   4342768     37.44      5524   5876048 179750048    743.34   1962020   5483944    274908
19:29:47       734504   6818716   4334156     37.37      5512   5849188 179750000    743.34   1978120   5426796    274504
19:29:48       771696   6828316   4324180     37.28      5504   5820948 179762260    743.39   2006732   5354152    305388
19:29:49       691944   6838812   4313108     37.19      5476   5912444 179749952    743.34   2021720   5418996    296852
19:29:50       679392   6844496   4306892     37.13      5452   5931356 179749952    743.34   1982772   5463040    233600
19:29:51       768528   6868080   4284224     36.94      5412   5865704 176317452    729.15   1990220   5359012    343160
19:29:52       717880   6893940   4259968     36.73      5400   5942368 176317404    729.15   1965624   5444140    304856
19:29:53       712408   6902660   4251268     36.65      5372   5956584 176318376    729.15   1969192   5442132    290224
19:29:54       707184   6917512   4236160     36.52      5344   5976944 176318568    729.15   1968716   5443620    336948
19:29:55       703172   6921608   4232332     36.49      5292   5984836 176318568    729.15   1979788   5429484    328716
19:29:56       733256   6933020   4220864     36.39      5212   5966340 176318568    729.15   1983636   5396256    300008
19:29:57       723308   6936340   4217280     36.36      5120   5979816 176318568    729.15   1987088   5394360    508792
19:29:58       732148   6942972   4210680     36.30      5108   5977656 176311064    729.12   1990400   5379884    214936

Write test data with uncached buffer I/O:
Starting 1 threads
pid: 17742
writing bs 8192, uncached 1
  1s: 433MB/sec, MB=433
  2s: 195MB/sec, MB=628
  3s: 209MB/sec, MB=836
  4s: 54MB/sec, MB=883
  5s: 277MB/sec, MB=1169
  6s: 141MB/sec, MB=1311
  7s: 185MB/sec, MB=1495
  8s: 134MB/sec, MB=1631
  9s: 201MB/sec, MB=1834
 10s: 283MB/sec, MB=2114
 11s: 223MB/sec, MB=2339
 12s: 164MB/sec, MB=2506
 13s: 155MB/sec, MB=2657
 14s: 132MB/sec, MB=2792
 15s: 186MB/sec, MB=2965
 16s: 218MB/sec, MB=3198
 17s: 220MB/sec, MB=3412
 18s: 191MB/sec, MB=3606
 19s: 214MB/sec, MB=3828
 20s: 257MB/sec, MB=4085

19:31:31      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
19:31:32        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:33        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:34        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:35        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:36        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:37        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:38        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:39        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:40        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:41        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:42        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:43        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:44        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:45        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:46        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:47        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:48        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
19:31:49        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0

19:31:31    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
19:31:32      4816812   6928788   4225812     36.43      5148   1879676 176322636    729.17   1920900   1336548    285748
19:31:33      4781880   6889428   4265592     36.78      5148   1874860 176322636    729.17   1920920   1332268    279028
19:31:34      4758972   6822588   4332376     37.35      5148   1830984 176322636    729.17   1920920   1288976    233040
19:31:35      4850248   6766480   4387840     37.83      5148   1684244 176322636    729.17   1920920   1142408     90508
19:31:36      4644176   6741676   4413256     38.05      5148   1864900 176322636    729.17   1920920   1323452    269380
19:31:37      4637900   6681480   4473436     38.57      5148   1810996 176322588    729.17   1920920   1269612    217632
19:31:38      4502108   6595508   4559500     39.31      5148   1860724 176322492    729.17   1920920   1319588    267760
19:31:39      4498844   6551068   4603928     39.69      5148   1819528 176322492    729.17   1920920   1278440    226496
19:31:40      4498812   6587396   4567340     39.38      5148   1856116 176322492    729.17   1920920   1314800    263292
19:31:41      4656784   6706252   4448372     38.35      5148   1817112 176322492    729.17   1920920   1275704    224600
19:31:42      4635032   6673328   4481436     38.64      5148   1805816 176322492    729.17   1920920   1264548    213436
19:31:43      4636852   6679736   4474884     38.58      5148   1810548 176322492    729.17   1920932   1269796    218276
19:31:44      4654740   6669104   4485544     38.67      5148   1782000 176322444    729.17   1920932   1241552    189880
19:31:45      4821604   6693156   4461848     38.47      5148   1638864 176322444    729.17   1920932   1098784     31076
19:31:46      4707548   6728796   4426400     38.16      5148   1788368 176322444    729.17   1920932   1248936    196596
19:31:47      4683996   6747632   4407348     38.00      5148   1830968 176322444    729.17   1920932   1291396    239636
19:31:48      4694648   6773808   4381320     37.78      5148   1846376 176322624    729.17   1920944   1307576    254800
19:31:49      4663784   6730212   4424776     38.15      5148   1833784 176322772    729.17   1920948   1295156    242200
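
To make the two runs easier to compare, the sketch below condenses the
figures above into averages. The helper names are hypothetical, and the
inputs are copied from this mail: the cumulative MB written at 20s, and
the first and last kbcached samples of each sar capture. The sar
captures are not exactly aligned with the start and end of the write
test, so the cache deltas are approximate.

```c
#include <assert.h>
#include <stdio.h>

/* Average throughput in MB/s for a run that wrote total_mb in seconds. */
static double avg_mb_per_sec(unsigned long total_mb, unsigned int seconds)
{
	assert(seconds > 0);
	return (double)total_mb / seconds;
}

/* Change of a sar kB counter between two samples, in MiB (truncated). */
static long delta_mib(long start_kb, long end_kb)
{
	return (end_kb - start_kb) / 1024;
}

/* Print a short summary of both runs using the numbers from this mail. */
void print_summary(void)
{
	double buffered = avg_mb_per_sec(14395, 20);	/* uncached 0 */
	double uncached = avg_mb_per_sec(4085, 20);	/* uncached 1 */

	printf("buffered: %.2f MB/s, page cache %+ld MiB\n",
	       buffered, delta_mib(2032600, 5977656));
	printf("uncached: %.2f MB/s, page cache %+ld MiB\n",
	       uncached, delta_mib(1879676, 1833784));
	printf("slowdown: %.2fx\n", buffered / uncached);
}
```

In round numbers this is about 720 MB/s against about 204 MB/s, a
slowdown of roughly 3.5x, while kbcached grows by about 3.8 GiB in the
normal buffered run and stays essentially flat in the uncached run.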

[1] https://lore.kernel.org/lkml/20250725075310.1614614-1-hanqi@vivo.com/

Signed-off-by: Qi Han <hanqi@...o.com>
---
 fs/f2fs/data.c    | 178 ++++++++++++++++++++++++++++++++++------------
 fs/f2fs/f2fs.h    |   5 ++
 fs/f2fs/file.c    |   2 +-
 fs/f2fs/iostat.c  |   8 ++-
 fs/f2fs/iostat.h  |   4 +-
 fs/f2fs/segment.c |   2 +-
 fs/f2fs/super.c   |  16 ++++-
 7 files changed, 161 insertions(+), 54 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 7961e0ddfca3..4eeb2b36473d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -30,8 +30,10 @@
 #define NUM_PREALLOC_POST_READ_CTXS	128
 
 static struct kmem_cache *bio_post_read_ctx_cache;
+static struct kmem_cache *bio_post_write_ctx_cache;
 static struct kmem_cache *bio_entry_slab;
 static mempool_t *bio_post_read_ctx_pool;
+static mempool_t *bio_post_write_ctx_pool;
 static struct bio_set f2fs_bioset;
 
 #define	F2FS_BIO_POOL_SIZE	NR_CURSEG_TYPE
@@ -120,6 +122,12 @@ struct bio_post_read_ctx {
 	block_t fs_blkaddr;
 };
 
+struct bio_post_write_ctx {
+	struct bio *bio;
+	struct f2fs_sb_info *sbi;
+	struct work_struct work;
+};
+
 /*
  * Update and unlock a bio's pages, and free the bio.
  *
@@ -159,6 +167,56 @@ static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
 	bio_put(bio);
 }
 
+static void f2fs_finish_write_bio(struct f2fs_sb_info *sbi, struct bio *bio)
+{
+	struct folio_iter fi;
+	struct bio_post_write_ctx *write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
+
+	bio_for_each_folio_all(fi, bio) {
+		struct folio *folio = fi.folio;
+		enum count_type type;
+
+		if (fscrypt_is_bounce_folio(folio)) {
+			struct folio *io_folio = folio;
+
+			folio = fscrypt_pagecache_folio(io_folio);
+			fscrypt_free_bounce_page(&io_folio->page);
+		}
+
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+		if (f2fs_is_compressed_page(folio)) {
+			f2fs_compress_write_end_io(bio, folio);
+			continue;
+		}
+#endif
+
+		type = WB_DATA_TYPE(folio, false);
+
+		if (unlikely(bio->bi_status != BLK_STS_OK)) {
+			mapping_set_error(folio->mapping, -EIO);
+			if (type == F2FS_WB_CP_DATA)
+				f2fs_stop_checkpoint(sbi, true,
+						STOP_CP_REASON_WRITE_FAIL);
+		}
+
+		f2fs_bug_on(sbi, is_node_folio(folio) &&
+				folio->index != nid_of_node(folio));
+
+		dec_page_count(sbi, type);
+		if (f2fs_in_warm_node_list(sbi, folio))
+			f2fs_del_fsync_node_entry(sbi, folio);
+		folio_clear_f2fs_gcing(folio);
+		folio_end_writeback(folio);
+	}
+	if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
+				wq_has_sleeper(&sbi->cp_wait))
+		wake_up(&sbi->cp_wait);
+
+	if (write_ctx)
+		mempool_free(write_ctx, bio_post_write_ctx_pool);
+	bio_put(bio);
+}
+
 static void f2fs_verify_bio(struct work_struct *work)
 {
 	struct bio_post_read_ctx *ctx =
@@ -314,58 +372,32 @@ static void f2fs_read_end_io(struct bio *bio)
 	f2fs_verify_and_finish_bio(bio, intask);
 }
 
+static void f2fs_finish_write_bio_async_work(struct work_struct *work)
+{
+	struct bio_post_write_ctx *ctx =
+		container_of(work, struct bio_post_write_ctx, work);
+
+	f2fs_finish_write_bio(ctx->sbi, ctx->bio);
+}
+
 static void f2fs_write_end_io(struct bio *bio)
 {
-	struct f2fs_sb_info *sbi;
-	struct folio_iter fi;
+	struct f2fs_sb_info *sbi = F2FS_F_SB(bio_first_folio_all(bio));
+	struct bio_post_write_ctx *write_ctx;
 
 	iostat_update_and_unbind_ctx(bio);
-	sbi = bio->bi_private;
 
 	if (time_to_inject(sbi, FAULT_WRITE_IO))
 		bio->bi_status = BLK_STS_IOERR;
 
-	bio_for_each_folio_all(fi, bio) {
-		struct folio *folio = fi.folio;
-		enum count_type type;
-
-		if (fscrypt_is_bounce_folio(folio)) {
-			struct folio *io_folio = folio;
-
-			folio = fscrypt_pagecache_folio(io_folio);
-			fscrypt_free_bounce_page(&io_folio->page);
-		}
-
-#ifdef CONFIG_F2FS_FS_COMPRESSION
-		if (f2fs_is_compressed_page(folio)) {
-			f2fs_compress_write_end_io(bio, folio);
-			continue;
-		}
-#endif
-
-		type = WB_DATA_TYPE(folio, false);
-
-		if (unlikely(bio->bi_status != BLK_STS_OK)) {
-			mapping_set_error(folio->mapping, -EIO);
-			if (type == F2FS_WB_CP_DATA)
-				f2fs_stop_checkpoint(sbi, true,
-						STOP_CP_REASON_WRITE_FAIL);
-		}
-
-		f2fs_bug_on(sbi, is_node_folio(folio) &&
-				folio->index != nid_of_node(folio));
-
-		dec_page_count(sbi, type);
-		if (f2fs_in_warm_node_list(sbi, folio))
-			f2fs_del_fsync_node_entry(sbi, folio);
-		folio_clear_f2fs_gcing(folio);
-		folio_end_writeback(folio);
+	write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
+	if (write_ctx) {
+		INIT_WORK(&write_ctx->work, f2fs_finish_write_bio_async_work);
+		queue_work(write_ctx->sbi->post_write_wq, &write_ctx->work);
+		return;
 	}
-	if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
-				wq_has_sleeper(&sbi->cp_wait))
-		wake_up(&sbi->cp_wait);
 
-	bio_put(bio);
+	f2fs_finish_write_bio(sbi, bio);
 }
 
 #ifdef CONFIG_BLK_DEV_ZONED
@@ -467,11 +499,10 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
 		bio->bi_private = NULL;
 	} else {
 		bio->bi_end_io = f2fs_write_end_io;
-		bio->bi_private = sbi;
+		bio->bi_private = NULL;
 		bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi,
 						fio->type, fio->temp);
 	}
-	iostat_alloc_and_bind_ctx(sbi, bio, NULL);
 
 	if (fio->io_wbc)
 		wbc_init_bio(fio->io_wbc, bio);
@@ -701,6 +732,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
 
 	/* Allocate a new bio */
 	bio = __bio_alloc(fio, 1);
+	iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
 
 	f2fs_set_bio_crypt_ctx(bio, fio_folio->mapping->host,
 			fio_folio->index, fio, GFP_NOIO);
@@ -899,6 +931,8 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
 alloc_new:
 	if (!bio) {
 		bio = __bio_alloc(fio, BIO_MAX_VECS);
+		iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
+
 		f2fs_set_bio_crypt_ctx(bio, folio->mapping->host,
 				folio->index, fio, GFP_NOIO);
 
@@ -948,6 +982,7 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
 	struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp;
 	struct folio *bio_folio;
 	enum count_type type;
+	struct bio_post_write_ctx *write_ctx = NULL;
 
 	f2fs_bug_on(sbi, is_read_io(fio->op));
 
@@ -1001,6 +1036,13 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
 		f2fs_set_bio_crypt_ctx(io->bio, fio_inode(fio),
 				bio_folio->index, fio, GFP_NOIO);
 		io->fio = *fio;
+
+		if (folio_test_dropbehind(bio_folio)) {
+			write_ctx = mempool_alloc(bio_post_write_ctx_pool, GFP_NOFS);
+			write_ctx->bio = io->bio;
+			write_ctx->sbi = sbi;
+		}
+		iostat_alloc_and_bind_ctx(fio->sbi, io->bio, NULL, write_ctx);
 	}
 
 	if (!bio_add_folio(io->bio, bio_folio, folio_size(bio_folio), 0)) {
@@ -1077,7 +1119,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
 		ctx->decompression_attempted = false;
 		bio->bi_private = ctx;
 	}
-	iostat_alloc_and_bind_ctx(sbi, bio, ctx);
+	iostat_alloc_and_bind_ctx(sbi, bio, ctx, NULL);
 
 	return bio;
 }
@@ -3540,6 +3582,7 @@ static int f2fs_write_begin(const struct kiocb *iocb,
 	bool use_cow = false;
 	block_t blkaddr = NULL_ADDR;
 	int err = 0;
+	fgf_t fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT;
 
 	trace_f2fs_write_begin(inode, pos, len);
 
@@ -3582,12 +3625,13 @@ static int f2fs_write_begin(const struct kiocb *iocb,
 #endif
 
 repeat:
+	if (iocb && iocb->ki_flags & IOCB_DONTCACHE)
+		fgp |= FGP_DONTCACHE;
 	/*
 	 * Do not use FGP_STABLE to avoid deadlock.
 	 * Will wait that below with our IO control.
 	 */
-	folio = __filemap_get_folio(mapping, index,
-				FGP_LOCK | FGP_WRITE | FGP_CREAT, GFP_NOFS);
+	folio = __filemap_get_folio(mapping, index, fgp, GFP_NOFS);
 	if (IS_ERR(folio)) {
 		err = PTR_ERR(folio);
 		goto fail;
@@ -4127,12 +4171,38 @@ int __init f2fs_init_post_read_processing(void)
 	return -ENOMEM;
 }
 
+int __init f2fs_init_post_write_processing(void)
+{
+	bio_post_write_ctx_cache =
+		kmem_cache_create("f2fs_bio_post_write_ctx",
+				sizeof(struct bio_post_write_ctx), 0, 0, NULL);
+	if (!bio_post_write_ctx_cache)
+		goto fail;
+	bio_post_write_ctx_pool =
+		mempool_create_slab_pool(NUM_PREALLOC_POST_READ_CTXS,
+				bio_post_write_ctx_cache);
+	if (!bio_post_write_ctx_pool)
+		goto fail_free_cache;
+	return 0;
+
+fail_free_cache:
+	kmem_cache_destroy(bio_post_write_ctx_cache);
+fail:
+	return -ENOMEM;
+}
+
 void f2fs_destroy_post_read_processing(void)
 {
 	mempool_destroy(bio_post_read_ctx_pool);
 	kmem_cache_destroy(bio_post_read_ctx_cache);
 }
 
+void f2fs_destroy_post_write_processing(void)
+{
+	mempool_destroy(bio_post_write_ctx_pool);
+	kmem_cache_destroy(bio_post_write_ctx_cache);
+}
+
 int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
 {
 	if (!f2fs_sb_has_encrypt(sbi) &&
@@ -4146,12 +4216,26 @@ int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
 	return sbi->post_read_wq ? 0 : -ENOMEM;
 }
 
+int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi)
+{
+	sbi->post_write_wq = alloc_workqueue("f2fs_post_write_wq",
+						 WQ_UNBOUND | WQ_HIGHPRI,
+						 num_online_cpus());
+	return sbi->post_write_wq ? 0 : -ENOMEM;
+}
+
 void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi)
 {
 	if (sbi->post_read_wq)
 		destroy_workqueue(sbi->post_read_wq);
 }
 
+void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi)
+{
+	if (sbi->post_write_wq)
+		destroy_workqueue(sbi->post_write_wq);
+}
+
 int __init f2fs_init_bio_entry_cache(void)
 {
 	bio_entry_slab = f2fs_kmem_cache_create("f2fs_bio_entry_slab",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 46be7560548c..fe3f81876b23 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1812,6 +1812,7 @@ struct f2fs_sb_info {
 	/* Precomputed FS UUID checksum for seeding other checksums */
 	__u32 s_chksum_seed;
 
+	struct workqueue_struct *post_write_wq;
 	struct workqueue_struct *post_read_wq;	/* post read workqueue */
 
 	/*
@@ -4023,9 +4024,13 @@ bool f2fs_release_folio(struct folio *folio, gfp_t wait);
 bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
 void f2fs_clear_page_cache_dirty_tag(struct folio *folio);
 int f2fs_init_post_read_processing(void);
+int f2fs_init_post_write_processing(void);
 void f2fs_destroy_post_read_processing(void);
+void f2fs_destroy_post_write_processing(void);
 int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
+int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi);
 void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
+void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi);
 extern const struct iomap_ops f2fs_iomap_ops;
 
 /*
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 42faaed6a02d..8aa6a4fd52e8 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -5443,5 +5443,5 @@ const struct file_operations f2fs_file_operations = {
 	.splice_read	= f2fs_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.fadvise	= f2fs_file_fadvise,
-	.fop_flags	= FOP_BUFFER_RASYNC,
+	.fop_flags	= FOP_BUFFER_RASYNC | FOP_DONTCACHE,
 };
diff --git a/fs/f2fs/iostat.c b/fs/f2fs/iostat.c
index f8703038e1d8..b2e6ce80c68d 100644
--- a/fs/f2fs/iostat.c
+++ b/fs/f2fs/iostat.c
@@ -245,7 +245,7 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
 	if (op_is_write(bio_op(bio))) {
 		lat_type = bio->bi_opf & REQ_SYNC ?
 				WRITE_SYNC_IO : WRITE_ASYNC_IO;
-		bio->bi_private = iostat_ctx->sbi;
+		bio->bi_private = iostat_ctx->post_write_ctx;
 	} else {
 		lat_type = READ_IO;
 		bio->bi_private = iostat_ctx->post_read_ctx;
@@ -256,7 +256,8 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
 }
 
 void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
-		struct bio *bio, struct bio_post_read_ctx *ctx)
+		struct bio *bio, struct bio_post_read_ctx *read_ctx,
+		struct bio_post_write_ctx *write_ctx)
 {
 	struct bio_iostat_ctx *iostat_ctx;
 	/* Due to the mempool, this never fails. */
@@ -264,7 +265,8 @@ void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
 	iostat_ctx->sbi = sbi;
 	iostat_ctx->submit_ts = 0;
 	iostat_ctx->type = 0;
-	iostat_ctx->post_read_ctx = ctx;
+	iostat_ctx->post_read_ctx = read_ctx;
+	iostat_ctx->post_write_ctx = write_ctx;
 	bio->bi_private = iostat_ctx;
 }
 
diff --git a/fs/f2fs/iostat.h b/fs/f2fs/iostat.h
index eb99d05cf272..a358909bb5e8 100644
--- a/fs/f2fs/iostat.h
+++ b/fs/f2fs/iostat.h
@@ -40,6 +40,7 @@ struct bio_iostat_ctx {
 	unsigned long submit_ts;
 	enum page_type type;
 	struct bio_post_read_ctx *post_read_ctx;
+	struct bio_post_write_ctx *post_write_ctx;
 };
 
 static inline void iostat_update_submit_ctx(struct bio *bio,
@@ -60,7 +61,8 @@ static inline struct bio_post_read_ctx *get_post_read_ctx(struct bio *bio)
 
 extern void iostat_update_and_unbind_ctx(struct bio *bio);
 extern void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
-		struct bio *bio, struct bio_post_read_ctx *ctx);
+		struct bio *bio, struct bio_post_read_ctx *read_ctx,
+		struct bio_post_write_ctx *write_ctx);
 extern int f2fs_init_iostat_processing(void);
 extern void f2fs_destroy_iostat_processing(void);
 extern int f2fs_init_iostat(struct f2fs_sb_info *sbi);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index cc82d42ef14c..8501008e42b2 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3856,7 +3856,7 @@ int f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct folio *folio,
 		f2fs_inode_chksum_set(sbi, folio);
 	}
 
-	if (fio) {
+	if (fio && !folio_test_dropbehind(folio)) {
 		struct f2fs_bio_info *io;
 
 		INIT_LIST_HEAD(&fio->list);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index e16c4e2830c2..110dfe073aee 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1963,6 +1963,7 @@ static void f2fs_put_super(struct super_block *sb)
 	flush_work(&sbi->s_error_work);
 
 	f2fs_destroy_post_read_wq(sbi);
+	f2fs_destroy_post_write_wq(sbi);
 
 	kvfree(sbi->ckpt);
 
@@ -4959,6 +4960,12 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
 		goto free_devices;
 	}
 
+	err = f2fs_init_post_write_wq(sbi);
+	if (err) {
+		f2fs_err(sbi, "Failed to initialize post write workqueue");
+		goto free_devices;
+	}
+
 	sbi->total_valid_node_count =
 				le32_to_cpu(sbi->ckpt->valid_node_count);
 	percpu_counter_set(&sbi->total_valid_inode_count,
@@ -5240,6 +5247,7 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
 	/* flush s_error_work before sbi destroy */
 	flush_work(&sbi->s_error_work);
 	f2fs_destroy_post_read_wq(sbi);
+	f2fs_destroy_post_write_wq(sbi);
 free_devices:
 	destroy_device_list(sbi);
 	kvfree(sbi->ckpt);
@@ -5435,9 +5443,12 @@ static int __init init_f2fs_fs(void)
 	err = f2fs_init_post_read_processing();
 	if (err)
 		goto free_root_stats;
-	err = f2fs_init_iostat_processing();
+	err = f2fs_init_post_write_processing();
 	if (err)
 		goto free_post_read;
+	err = f2fs_init_iostat_processing();
+	if (err)
+		goto free_post_write;
 	err = f2fs_init_bio_entry_cache();
 	if (err)
 		goto free_iostat;
@@ -5469,6 +5480,8 @@ static int __init init_f2fs_fs(void)
 	f2fs_destroy_bio_entry_cache();
 free_iostat:
 	f2fs_destroy_iostat_processing();
+free_post_write:
+	f2fs_destroy_post_write_processing();
 free_post_read:
 	f2fs_destroy_post_read_processing();
 free_root_stats:
@@ -5504,6 +5517,7 @@ static void __exit exit_f2fs_fs(void)
 	f2fs_destroy_bio_entry_cache();
 	f2fs_destroy_iostat_processing();
 	f2fs_destroy_post_read_processing();
+	f2fs_destroy_post_write_processing();
 	f2fs_destroy_root_stats();
 	f2fs_exit_shrinker();
 	f2fs_exit_sysfs();
-- 
2.50.0
