Message-ID: <aMmdCd_0acZHtqUN@google.com>
Date: Tue, 16 Sep 2025 17:23:21 +0000
From: Jaegeuk Kim <jaegeuk@...nel.org>
To: hanqi <hanqi@...o.com>
Cc: Chao Yu <chao@...nel.org>, axboe@...nel.dk,
	linux-f2fs-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] f2fs: support uncached buffer I/O read and write

On 09/16, hanqi wrote:
> Yes, for most storage devices, since disk performance is much lower
> than memory performance, F2FS uncached buffer I/O writes do not bring
> significant performance benefits. Their advantages might only become
> apparent in scenarios where disk performance exceeds that of memory.
> 
> Therefore, I also agree that F2FS should first support uncached buffer
> I/O for reads, as Chao mentioned.

Thanks. Could you please post a patch and get some comments back?

> 
> Thanks,
> 
> On 2025/9/16 11:13, Chao Yu wrote:
> > On 9/12/25 07:53, Jaegeuk Kim wrote:
> > > Given the performance data and implementation overhead, I'm also questioning
> > > whether we really need to support this for writes. Can we get a common
> > > understanding of the usage models?
> > It seems the uncached write implementation hurts performance a lot, so I don't
> > see a good reason to merge it for now.
> > 
> > I think we can enable the uncached read functionality first and return
> > -EOPNOTSUPP for uncached writes; meanwhile, let's see whether any good use
> > case for uncached writes shows up. See the sketch below.
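> > 
> > An untested, minimal sketch of what that gate could look like, e.g. at the
> > top of f2fs_file_write_iter() (just to illustrate the idea, not a final
> > patch):
> > 
> > 	/* Reject uncached buffered writes until a real use case shows up. */
> > 	if (iocb->ki_flags & IOCB_DONTCACHE)
> > 		return -EOPNOTSUPP;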
> > 
> > Thanks,
> > 
> > > On 08/28, Qi Han wrote:
> > > > In [1], we added uncached buffer I/O read support to f2fs.
> > > > Now, let's move forward to enabling uncached buffer I/O write support
> > > > in f2fs.
> > > > 
> > > > In f2fs_write_end_io, a separate asynchronous workqueue performs the
> > > > page drop operation for bios that contain pages allocated with
> > > > FGP_DONTCACHE.
> > > > 
> > > > The following patch was developed and tested on the v6.17-rc3 branch.
> > > > My local test results are as follows, along with some issues observed:
> > > > 1) Write performance degrades. Uncached buffer I/O writes are slower than
> > > > normal buffered writes because uncached I/O triggers a sync operation for
> > > > each I/O after the data is written to memory, in order to drop pages promptly
> > > > at end_io. I assume this impact might be less visible on high-performance
> > > > storage devices such as PCIe 6.0 SSDs.
> > > > - f2fs_file_write_iter
> > > >   - f2fs_buffered_write_iter
> > > >   - generic_write_sync
> > > >     - filemap_fdatawrite_range_kick
> > > > 2) As expected, page cache usage does not significantly increase during writes.
> > > > 3) The kswapd0 memory reclaim thread remains mostly idle, but additional
> > > > asynchronous work overhead is introduced, e.g.:
> > > >    PID USER         PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ ARGS
> > > > 19650 root          0 -20    0    0    0 I  7.0   0.0   0:00.21 [kworker/u33:3-f2fs_post_write_wq]
> > > >     95 root          0 -20    0    0    0 I  6.6   0.0   0:02.08 [kworker/u33:0-f2fs_post_write_wq]
> > > > 19653 root          0 -20    0    0    0 I  4.6   0.0   0:01.25 [kworker/u33:6-f2fs_post_write_wq]
> > > > 19652 root          0 -20    0    0    0 I  4.3   0.0   0:00.92 [kworker/u33:5-f2fs_post_write_wq]
> > > > 19613 root          0 -20    0    0    0 I  4.3   0.0   0:00.99 [kworker/u33:1-f2fs_post_write_wq]
> > > > 19651 root          0 -20    0    0    0 I  3.6   0.0   0:00.98 [kworker/u33:4-f2fs_post_write_wq]
> > > > 19654 root          0 -20    0    0    0 I  3.0   0.0   0:01.05 [kworker/u33:7-f2fs_post_write_wq]
> > > > 19655 root          0 -20    0    0    0 I  2.3   0.0   0:01.18 [kworker/u33:8-f2fs_post_write_wq]
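> > > > 
> > > > For reference, an uncached buffered write is issued from userspace via
> > > > pwritev2() with RWF_DONTCACHE (available since Linux 6.14). Below is a
> > > > minimal standalone sketch, not the exact tool used for the numbers in
> > > > this mail:
> > > > 
> > > > 	#define _GNU_SOURCE
> > > > 	#include <fcntl.h>
> > > > 	#include <sys/uio.h>
> > > > 	#include <unistd.h>
> > > > 
> > > > 	#ifndef RWF_DONTCACHE
> > > > 	#define RWF_DONTCACHE	0x00000080 /* drop pages after writeback */
> > > > 	#endif
> > > > 
> > > > 	int main(void)
> > > > 	{
> > > > 		static char buf[8192];
> > > > 		struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
> > > > 		int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
> > > > 
> > > > 		if (fd < 0)
> > > > 			return 1;
> > > > 		/* Data still goes through the page cache, but the pages are
> > > > 		 * dropped once writeback completes instead of waiting for
> > > > 		 * reclaim. */
> > > > 		if (pwritev2(fd, &iov, 1, 0, RWF_DONTCACHE) != sizeof(buf))
> > > > 			return 1;
> > > > 		return close(fd) ? 1 : 0;
> > > > 	}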
> > > > 
> > > > From these results on my test device, introducing uncached buffer I/O writes
> > > > in f2fs seems to bring more drawbacks than benefits. Do we really need to
> > > > support uncached buffer I/O writes in f2fs?
> > > > 
> > > > Write test data without using uncached buffer I/O:
> > > > Starting 1 threads
> > > > pid: 17609
> > > > writing bs 8192, uncached 0
> > > >    1s: 753MB/sec, MB=753
> > > >    2s: 792MB/sec, MB=1546
> > > >    3s: 430MB/sec, MB=1978
> > > >    4s: 661MB/sec, MB=2636
> > > >    5s: 900MB/sec, MB=3542
> > > >    6s: 769MB/sec, MB=4308
> > > >    7s: 808MB/sec, MB=5113
> > > >    8s: 766MB/sec, MB=5884
> > > >    9s: 654MB/sec, MB=6539
> > > >   10s: 456MB/sec, MB=6995
> > > >   11s: 797MB/sec, MB=7793
> > > >   12s: 770MB/sec, MB=8563
> > > >   13s: 778MB/sec, MB=9341
> > > >   14s: 726MB/sec, MB=10077
> > > >   15s: 736MB/sec, MB=10803
> > > >   16s: 798MB/sec, MB=11602
> > > >   17s: 728MB/sec, MB=12330
> > > >   18s: 749MB/sec, MB=13080
> > > >   19s: 777MB/sec, MB=13857
> > > >   20s: 688MB/sec, MB=14395
> > > > 
> > > > 19:29:34      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
> > > > 19:29:35        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:29:36        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:29:37        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:29:38        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:29:39        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:29:40        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:29:41        0        94    0.00    2.00    0.00    0.00    2.00     0  kswapd0
> > > > 19:29:42        0        94    0.00   59.00    0.00    0.00   59.00     7  kswapd0
> > > > 19:29:43        0        94    0.00   45.00    0.00    0.00   45.00     7  kswapd0
> > > > 19:29:44        0        94    0.00   36.00    0.00    0.00   36.00     0  kswapd0
> > > > 19:29:45        0        94    0.00   27.00    0.00    1.00   27.00     0  kswapd0
> > > > 19:29:46        0        94    0.00   26.00    0.00    0.00   26.00     2  kswapd0
> > > > 19:29:47        0        94    0.00   57.00    0.00    0.00   57.00     7  kswapd0
> > > > 19:29:48        0        94    0.00   41.00    0.00    0.00   41.00     7  kswapd0
> > > > 19:29:49        0        94    0.00   38.00    0.00    0.00   38.00     7  kswapd0
> > > > 19:29:50        0        94    0.00   47.00    0.00    0.00   47.00     7  kswapd0
> > > > 19:29:51        0        94    0.00   43.00    0.00    1.00   43.00     7  kswapd0
> > > > 19:29:52        0        94    0.00   36.00    0.00    0.00   36.00     7  kswapd0
> > > > 19:29:53        0        94    0.00   39.00    0.00    0.00   39.00     2  kswapd0
> > > > 19:29:54        0        94    0.00   46.00    0.00    0.00   46.00     7  kswapd0
> > > > 19:29:55        0        94    0.00   43.00    0.00    0.00   43.00     7  kswapd0
> > > > 19:29:56        0        94    0.00   39.00    0.00    0.00   39.00     7  kswapd0
> > > > 19:29:57        0        94    0.00   29.00    0.00    1.00   29.00     1  kswapd0
> > > > 19:29:58        0        94    0.00   17.00    0.00    0.00   17.00     4  kswapd0
> > > > 
> > > > 19:29:33    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
> > > > 19:29:34      4464588   6742648   4420876     38.12      6156   2032600 179730872    743.27   1863412   1822544         4
> > > > 19:29:35      4462572   6740784   4422752     38.13      6156   2032752 179739004    743.30   1863460   1823584        16
> > > > 19:29:36      4381512   6740856   4422420     38.13      6156   2114144 179746508    743.33   1863476   1905384     81404
> > > > 19:29:37      3619456   6741840   4421588     38.12      6156   2877032 179746652    743.33   1863536   2668896    592584
> > > > 19:29:38      2848184   6740720   4422472     38.13      6164   3646188 179746652    743.33   1863600   3438520    815692
> > > > 19:29:39      2436336   6739452   4423720     38.14      6164   4056772 179746652    743.33   1863604   3849164    357096
> > > > 19:29:40      1712660   6737700   4425140     38.15      6164   4779020 179746604    743.33   1863612   4571124    343716
> > > > 19:29:41       810664   6738020   4425004     38.15      6164   5681152 179746604    743.33   1863612   5473444    297268
> > > > 19:29:42       673756   6779120   4373200     37.71      5656   5869928 179746604    743.33   1902852   5589452    269032
> > > > 19:29:43       688480   6782024   4371012     37.69      5648   5856940 179750048    743.34   1926336   5542004    279344
> > > > 19:29:44       688956   6789028   4364260     37.63      5584   5863272 179750048    743.34   1941608   5518808    300096
> > > > 19:29:45       740768   6804560   4348772     37.49      5524   5827248 179750000    743.34   1954084   5452844    123120
> > > > 19:29:46       697936   6810612   4342768     37.44      5524   5876048 179750048    743.34   1962020   5483944    274908
> > > > 19:29:47       734504   6818716   4334156     37.37      5512   5849188 179750000    743.34   1978120   5426796    274504
> > > > 19:29:48       771696   6828316   4324180     37.28      5504   5820948 179762260    743.39   2006732   5354152    305388
> > > > 19:29:49       691944   6838812   4313108     37.19      5476   5912444 179749952    743.34   2021720   5418996    296852
> > > > 19:29:50       679392   6844496   4306892     37.13      5452   5931356 179749952    743.34   1982772   5463040    233600
> > > > 19:29:51       768528   6868080   4284224     36.94      5412   5865704 176317452    729.15   1990220   5359012    343160
> > > > 19:29:52       717880   6893940   4259968     36.73      5400   5942368 176317404    729.15   1965624   5444140    304856
> > > > 19:29:53       712408   6902660   4251268     36.65      5372   5956584 176318376    729.15   1969192   5442132    290224
> > > > 19:29:54       707184   6917512   4236160     36.52      5344   5976944 176318568    729.15   1968716   5443620    336948
> > > > 19:29:55       703172   6921608   4232332     36.49      5292   5984836 176318568    729.15   1979788   5429484    328716
> > > > 19:29:56       733256   6933020   4220864     36.39      5212   5966340 176318568    729.15   1983636   5396256    300008
> > > > 19:29:57       723308   6936340   4217280     36.36      5120   5979816 176318568    729.15   1987088   5394360    508792
> > > > 19:29:58       732148   6942972   4210680     36.30      5108   5977656 176311064    729.12   1990400   5379884    214936
> > > > 
> > > > Write test data after using uncached buffer I/O:
> > > > Starting 1 threads
> > > > pid: 17742
> > > > writing bs 8192, uncached 1
> > > >    1s: 433MB/sec, MB=433
> > > >    2s: 195MB/sec, MB=628
> > > >    3s: 209MB/sec, MB=836
> > > >    4s: 54MB/sec, MB=883
> > > >    5s: 277MB/sec, MB=1169
> > > >    6s: 141MB/sec, MB=1311
> > > >    7s: 185MB/sec, MB=1495
> > > >    8s: 134MB/sec, MB=1631
> > > >    9s: 201MB/sec, MB=1834
> > > >   10s: 283MB/sec, MB=2114
> > > >   11s: 223MB/sec, MB=2339
> > > >   12s: 164MB/sec, MB=2506
> > > >   13s: 155MB/sec, MB=2657
> > > >   14s: 132MB/sec, MB=2792
> > > >   15s: 186MB/sec, MB=2965
> > > >   16s: 218MB/sec, MB=3198
> > > >   17s: 220MB/sec, MB=3412
> > > >   18s: 191MB/sec, MB=3606
> > > >   19s: 214MB/sec, MB=3828
> > > >   20s: 257MB/sec, MB=4085
> > > > 
> > > > 19:31:31      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
> > > > 19:31:32        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:33        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:34        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:35        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:36        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:37        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:38        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:39        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:40        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:41        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:42        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:43        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:44        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:45        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:46        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:47        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:48        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 19:31:49        0        94    0.00    0.00    0.00    0.00    0.00     4  kswapd0
> > > > 
> > > > 19:31:31    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
> > > > 19:31:32      4816812   6928788   4225812     36.43      5148   1879676 176322636    729.17   1920900   1336548    285748
> > > > 19:31:33      4781880   6889428   4265592     36.78      5148   1874860 176322636    729.17   1920920   1332268    279028
> > > > 19:31:34      4758972   6822588   4332376     37.35      5148   1830984 176322636    729.17   1920920   1288976    233040
> > > > 19:31:35      4850248   6766480   4387840     37.83      5148   1684244 176322636    729.17   1920920   1142408     90508
> > > > 19:31:36      4644176   6741676   4413256     38.05      5148   1864900 176322636    729.17   1920920   1323452    269380
> > > > 19:31:37      4637900   6681480   4473436     38.57      5148   1810996 176322588    729.17   1920920   1269612    217632
> > > > 19:31:38      4502108   6595508   4559500     39.31      5148   1860724 176322492    729.17   1920920   1319588    267760
> > > > 19:31:39      4498844   6551068   4603928     39.69      5148   1819528 176322492    729.17   1920920   1278440    226496
> > > > 19:31:40      4498812   6587396   4567340     39.38      5148   1856116 176322492    729.17   1920920   1314800    263292
> > > > 19:31:41      4656784   6706252   4448372     38.35      5148   1817112 176322492    729.17   1920920   1275704    224600
> > > > 19:31:42      4635032   6673328   4481436     38.64      5148   1805816 176322492    729.17   1920920   1264548    213436
> > > > 19:31:43      4636852   6679736   4474884     38.58      5148   1810548 176322492    729.17   1920932   1269796    218276
> > > > 19:31:44      4654740   6669104   4485544     38.67      5148   1782000 176322444    729.17   1920932   1241552    189880
> > > > 19:31:45      4821604   6693156   4461848     38.47      5148   1638864 176322444    729.17   1920932   1098784     31076
> > > > 19:31:46      4707548   6728796   4426400     38.16      5148   1788368 176322444    729.17   1920932   1248936    196596
> > > > 19:31:47      4683996   6747632   4407348     38.00      5148   1830968 176322444    729.17   1920932   1291396    239636
> > > > 19:31:48      4694648   6773808   4381320     37.78      5148   1846376 176322624    729.17   1920944   1307576    254800
> > > > 19:31:49      4663784   6730212   4424776     38.15      5148   1833784 176322772    729.17   1920948   1295156    242200
> > > > 
> > > > [1]
> > > > https://lore.kernel.org/lkml/20250725075310.1614614-1-hanqi@vivo.com/
> > > > 
> > > > Signed-off-by: Qi Han <hanqi@...o.com>
> > > > ---
> > > >   fs/f2fs/data.c    | 178 ++++++++++++++++++++++++++++++++++------------
> > > >   fs/f2fs/f2fs.h    |   5 ++
> > > >   fs/f2fs/file.c    |   2 +-
> > > >   fs/f2fs/iostat.c  |   8 ++-
> > > >   fs/f2fs/iostat.h  |   4 +-
> > > >   fs/f2fs/segment.c |   2 +-
> > > >   fs/f2fs/super.c   |  16 ++++-
> > > >   7 files changed, 161 insertions(+), 54 deletions(-)
> > > > 
> > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > index 7961e0ddfca3..4eeb2b36473d 100644
> > > > --- a/fs/f2fs/data.c
> > > > +++ b/fs/f2fs/data.c
> > > > @@ -30,8 +30,10 @@
> > > >   #define NUM_PREALLOC_POST_READ_CTXS	128
> > > >   static struct kmem_cache *bio_post_read_ctx_cache;
> > > > +static struct kmem_cache *bio_post_write_ctx_cache;
> > > >   static struct kmem_cache *bio_entry_slab;
> > > >   static mempool_t *bio_post_read_ctx_pool;
> > > > +static mempool_t *bio_post_write_ctx_pool;
> > > >   static struct bio_set f2fs_bioset;
> > > >   #define	F2FS_BIO_POOL_SIZE	NR_CURSEG_TYPE
> > > > @@ -120,6 +122,12 @@ struct bio_post_read_ctx {
> > > >   	block_t fs_blkaddr;
> > > >   };
> > > > +struct bio_post_write_ctx {
> > > > +	struct bio *bio;
> > > > +	struct f2fs_sb_info *sbi;
> > > > +	struct work_struct work;
> > > > +};
> > > > +
> > > >   /*
> > > >    * Update and unlock a bio's pages, and free the bio.
> > > >    *
> > > > @@ -159,6 +167,56 @@ static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
> > > >   	bio_put(bio);
> > > >   }
> > > > +static void f2fs_finish_write_bio(struct f2fs_sb_info *sbi, struct bio *bio)
> > > > +{
> > > > +	struct folio_iter fi;
> > > > +	struct bio_post_write_ctx *write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
> > > > +
> > > > +	bio_for_each_folio_all(fi, bio) {
> > > > +		struct folio *folio = fi.folio;
> > > > +		enum count_type type;
> > > > +
> > > > +		if (fscrypt_is_bounce_folio(folio)) {
> > > > +			struct folio *io_folio = folio;
> > > > +
> > > > +			folio = fscrypt_pagecache_folio(io_folio);
> > > > +			fscrypt_free_bounce_page(&io_folio->page);
> > > > +		}
> > > > +
> > > > +#ifdef CONFIG_F2FS_FS_COMPRESSION
> > > > +		if (f2fs_is_compressed_page(folio)) {
> > > > +			f2fs_compress_write_end_io(bio, folio);
> > > > +			continue;
> > > > +		}
> > > > +#endif
> > > > +
> > > > +		type = WB_DATA_TYPE(folio, false);
> > > > +
> > > > +		if (unlikely(bio->bi_status != BLK_STS_OK)) {
> > > > +			mapping_set_error(folio->mapping, -EIO);
> > > > +			if (type == F2FS_WB_CP_DATA)
> > > > +				f2fs_stop_checkpoint(sbi, true,
> > > > +						STOP_CP_REASON_WRITE_FAIL);
> > > > +		}
> > > > +
> > > > +		f2fs_bug_on(sbi, is_node_folio(folio) &&
> > > > +				folio->index != nid_of_node(folio));
> > > > +
> > > > +		dec_page_count(sbi, type);
> > > > +		if (f2fs_in_warm_node_list(sbi, folio))
> > > > +			f2fs_del_fsync_node_entry(sbi, folio);
> > > > +		folio_clear_f2fs_gcing(folio);
> > > > +		folio_end_writeback(folio);
> > > > +	}
> > > > +	if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
> > > > +				wq_has_sleeper(&sbi->cp_wait))
> > > > +		wake_up(&sbi->cp_wait);
> > > > +
> > > > +	if (write_ctx)
> > > > +		mempool_free(write_ctx, bio_post_write_ctx_pool);
> > > > +	bio_put(bio);
> > > > +}
> > > > +
> > > >   static void f2fs_verify_bio(struct work_struct *work)
> > > >   {
> > > >   	struct bio_post_read_ctx *ctx =
> > > > @@ -314,58 +372,32 @@ static void f2fs_read_end_io(struct bio *bio)
> > > >   	f2fs_verify_and_finish_bio(bio, intask);
> > > >   }
> > > > +static void f2fs_finish_write_bio_async_work(struct work_struct *work)
> > > > +{
> > > > +	struct bio_post_write_ctx *ctx =
> > > > +		container_of(work, struct bio_post_write_ctx, work);
> > > > +
> > > > +	f2fs_finish_write_bio(ctx->sbi, ctx->bio);
> > > > +}
> > > > +
> > > >   static void f2fs_write_end_io(struct bio *bio)
> > > >   {
> > > > -	struct f2fs_sb_info *sbi;
> > > > -	struct folio_iter fi;
> > > > +	struct f2fs_sb_info *sbi = F2FS_F_SB(bio_first_folio_all(bio));
> > > > +	struct bio_post_write_ctx *write_ctx;
> > > >   	iostat_update_and_unbind_ctx(bio);
> > > > -	sbi = bio->bi_private;
> > > >   	if (time_to_inject(sbi, FAULT_WRITE_IO))
> > > >   		bio->bi_status = BLK_STS_IOERR;
> > > > -	bio_for_each_folio_all(fi, bio) {
> > > > -		struct folio *folio = fi.folio;
> > > > -		enum count_type type;
> > > > -
> > > > -		if (fscrypt_is_bounce_folio(folio)) {
> > > > -			struct folio *io_folio = folio;
> > > > -
> > > > -			folio = fscrypt_pagecache_folio(io_folio);
> > > > -			fscrypt_free_bounce_page(&io_folio->page);
> > > > -		}
> > > > -
> > > > -#ifdef CONFIG_F2FS_FS_COMPRESSION
> > > > -		if (f2fs_is_compressed_page(folio)) {
> > > > -			f2fs_compress_write_end_io(bio, folio);
> > > > -			continue;
> > > > -		}
> > > > -#endif
> > > > -
> > > > -		type = WB_DATA_TYPE(folio, false);
> > > > -
> > > > -		if (unlikely(bio->bi_status != BLK_STS_OK)) {
> > > > -			mapping_set_error(folio->mapping, -EIO);
> > > > -			if (type == F2FS_WB_CP_DATA)
> > > > -				f2fs_stop_checkpoint(sbi, true,
> > > > -						STOP_CP_REASON_WRITE_FAIL);
> > > > -		}
> > > > -
> > > > -		f2fs_bug_on(sbi, is_node_folio(folio) &&
> > > > -				folio->index != nid_of_node(folio));
> > > > -
> > > > -		dec_page_count(sbi, type);
> > > > -		if (f2fs_in_warm_node_list(sbi, folio))
> > > > -			f2fs_del_fsync_node_entry(sbi, folio);
> > > > -		folio_clear_f2fs_gcing(folio);
> > > > -		folio_end_writeback(folio);
> > > > +	write_ctx = (struct bio_post_write_ctx *)bio->bi_private;
> > > > +	if (write_ctx) {
> > > > +		INIT_WORK(&write_ctx->work, f2fs_finish_write_bio_async_work);
> > > > +		queue_work(write_ctx->sbi->post_write_wq, &write_ctx->work);
> > > > +		return;
> > > >   	}
> > > > -	if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
> > > > -				wq_has_sleeper(&sbi->cp_wait))
> > > > -		wake_up(&sbi->cp_wait);
> > > > -	bio_put(bio);
> > > > +	f2fs_finish_write_bio(sbi, bio);
> > > >   }
> > > >   #ifdef CONFIG_BLK_DEV_ZONED
> > > > @@ -467,11 +499,10 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
> > > >   		bio->bi_private = NULL;
> > > >   	} else {
> > > >   		bio->bi_end_io = f2fs_write_end_io;
> > > > -		bio->bi_private = sbi;
> > > > +		bio->bi_private = NULL;
> > > >   		bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi,
> > > >   						fio->type, fio->temp);
> > > >   	}
> > > > -	iostat_alloc_and_bind_ctx(sbi, bio, NULL);
> > > >   	if (fio->io_wbc)
> > > >   		wbc_init_bio(fio->io_wbc, bio);
> > > > @@ -701,6 +732,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
> > > >   	/* Allocate a new bio */
> > > >   	bio = __bio_alloc(fio, 1);
> > > > +	iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
> > > >   	f2fs_set_bio_crypt_ctx(bio, fio_folio->mapping->host,
> > > >   			fio_folio->index, fio, GFP_NOIO);
> > > > @@ -899,6 +931,8 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
> > > >   alloc_new:
> > > >   	if (!bio) {
> > > >   		bio = __bio_alloc(fio, BIO_MAX_VECS);
> > > > +		iostat_alloc_and_bind_ctx(fio->sbi, bio, NULL, NULL);
> > > > +
> > > >   		f2fs_set_bio_crypt_ctx(bio, folio->mapping->host,
> > > >   				folio->index, fio, GFP_NOIO);
> > > > @@ -948,6 +982,7 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
> > > >   	struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp;
> > > >   	struct folio *bio_folio;
> > > >   	enum count_type type;
> > > > +	struct bio_post_write_ctx *write_ctx = NULL;
> > > >   	f2fs_bug_on(sbi, is_read_io(fio->op));
> > > > @@ -1001,6 +1036,13 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
> > > >   		f2fs_set_bio_crypt_ctx(io->bio, fio_inode(fio),
> > > >   				bio_folio->index, fio, GFP_NOIO);
> > > >   		io->fio = *fio;
> > > > +
> > > > +		if (folio_test_dropbehind(bio_folio)) {
> > > > +			write_ctx = mempool_alloc(bio_post_write_ctx_pool, GFP_NOFS);
> > > > +			write_ctx->bio = io->bio;
> > > > +			write_ctx->sbi = sbi;
> > > > +		}
> > > > +		iostat_alloc_and_bind_ctx(fio->sbi, io->bio, NULL, write_ctx);
> > > >   	}
> > > >   	if (!bio_add_folio(io->bio, bio_folio, folio_size(bio_folio), 0)) {
> > > > @@ -1077,7 +1119,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
> > > >   		ctx->decompression_attempted = false;
> > > >   		bio->bi_private = ctx;
> > > >   	}
> > > > -	iostat_alloc_and_bind_ctx(sbi, bio, ctx);
> > > > +	iostat_alloc_and_bind_ctx(sbi, bio, ctx, NULL);
> > > >   	return bio;
> > > >   }
> > > > @@ -3540,6 +3582,7 @@ static int f2fs_write_begin(const struct kiocb *iocb,
> > > >   	bool use_cow = false;
> > > >   	block_t blkaddr = NULL_ADDR;
> > > >   	int err = 0;
> > > > +	fgf_t fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT;
> > > >   	trace_f2fs_write_begin(inode, pos, len);
> > > > @@ -3582,12 +3625,13 @@ static int f2fs_write_begin(const struct kiocb *iocb,
> > > >   #endif
> > > >   repeat:
> > > > +	if (iocb && iocb->ki_flags & IOCB_DONTCACHE)
> > > > +		fgp |= FGP_DONTCACHE;
> > > >   	/*
> > > >   	 * Do not use FGP_STABLE to avoid deadlock.
> > > >   	 * Will wait that below with our IO control.
> > > >   	 */
> > > > -	folio = __filemap_get_folio(mapping, index,
> > > > -				FGP_LOCK | FGP_WRITE | FGP_CREAT, GFP_NOFS);
> > > > +	folio = __filemap_get_folio(mapping, index, fgp, GFP_NOFS);
> > > >   	if (IS_ERR(folio)) {
> > > >   		err = PTR_ERR(folio);
> > > >   		goto fail;
> > > > @@ -4127,12 +4171,38 @@ int __init f2fs_init_post_read_processing(void)
> > > >   	return -ENOMEM;
> > > >   }
> > > > +int __init f2fs_init_post_write_processing(void)
> > > > +{
> > > > +	bio_post_write_ctx_cache =
> > > > +		kmem_cache_create("f2fs_bio_post_write_ctx",
> > > > +				sizeof(struct bio_post_write_ctx), 0, 0, NULL);
> > > > +	if (!bio_post_write_ctx_cache)
> > > > +		goto fail;
> > > > +	bio_post_write_ctx_pool =
> > > > +		mempool_create_slab_pool(NUM_PREALLOC_POST_READ_CTXS,
> > > > +				bio_post_write_ctx_cache);
> > > > +	if (!bio_post_write_ctx_pool)
> > > > +		goto fail_free_cache;
> > > > +	return 0;
> > > > +
> > > > +fail_free_cache:
> > > > +	kmem_cache_destroy(bio_post_write_ctx_cache);
> > > > +fail:
> > > > +	return -ENOMEM;
> > > > +}
> > > > +
> > > >   void f2fs_destroy_post_read_processing(void)
> > > >   {
> > > >   	mempool_destroy(bio_post_read_ctx_pool);
> > > >   	kmem_cache_destroy(bio_post_read_ctx_cache);
> > > >   }
> > > > +void f2fs_destroy_post_write_processing(void)
> > > > +{
> > > > +	mempool_destroy(bio_post_write_ctx_pool);
> > > > +	kmem_cache_destroy(bio_post_write_ctx_cache);
> > > > +}
> > > > +
> > > >   int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
> > > >   {
> > > >   	if (!f2fs_sb_has_encrypt(sbi) &&
> > > > @@ -4146,12 +4216,26 @@ int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
> > > >   	return sbi->post_read_wq ? 0 : -ENOMEM;
> > > >   }
> > > > +int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi)
> > > > +{
> > > > +	sbi->post_write_wq = alloc_workqueue("f2fs_post_write_wq",
> > > > +						 WQ_UNBOUND | WQ_HIGHPRI,
> > > > +						 num_online_cpus());
> > > > +	return sbi->post_write_wq ? 0 : -ENOMEM;
> > > > +}
> > > > +
> > > >   void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi)
> > > >   {
> > > >   	if (sbi->post_read_wq)
> > > >   		destroy_workqueue(sbi->post_read_wq);
> > > >   }
> > > > +void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi)
> > > > +{
> > > > +	if (sbi->post_write_wq)
> > > > +		destroy_workqueue(sbi->post_write_wq);
> > > > +}
> > > > +
> > > >   int __init f2fs_init_bio_entry_cache(void)
> > > >   {
> > > >   	bio_entry_slab = f2fs_kmem_cache_create("f2fs_bio_entry_slab",
> > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > > index 46be7560548c..fe3f81876b23 100644
> > > > --- a/fs/f2fs/f2fs.h
> > > > +++ b/fs/f2fs/f2fs.h
> > > > @@ -1812,6 +1812,7 @@ struct f2fs_sb_info {
> > > >   	/* Precomputed FS UUID checksum for seeding other checksums */
> > > >   	__u32 s_chksum_seed;
> > > > +	struct workqueue_struct *post_write_wq;
> > > >   	struct workqueue_struct *post_read_wq;	/* post read workqueue */
> > > >   	/*
> > > > @@ -4023,9 +4024,13 @@ bool f2fs_release_folio(struct folio *folio, gfp_t wait);
> > > >   bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
> > > >   void f2fs_clear_page_cache_dirty_tag(struct folio *folio);
> > > >   int f2fs_init_post_read_processing(void);
> > > > +int f2fs_init_post_write_processing(void);
> > > >   void f2fs_destroy_post_read_processing(void);
> > > > +void f2fs_destroy_post_write_processing(void);
> > > >   int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
> > > > +int f2fs_init_post_write_wq(struct f2fs_sb_info *sbi);
> > > >   void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
> > > > +void f2fs_destroy_post_write_wq(struct f2fs_sb_info *sbi);
> > > >   extern const struct iomap_ops f2fs_iomap_ops;
> > > >   /*
> > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > index 42faaed6a02d..8aa6a4fd52e8 100644
> > > > --- a/fs/f2fs/file.c
> > > > +++ b/fs/f2fs/file.c
> > > > @@ -5443,5 +5443,5 @@ const struct file_operations f2fs_file_operations = {
> > > >   	.splice_read	= f2fs_file_splice_read,
> > > >   	.splice_write	= iter_file_splice_write,
> > > >   	.fadvise	= f2fs_file_fadvise,
> > > > -	.fop_flags	= FOP_BUFFER_RASYNC,
> > > > +	.fop_flags	= FOP_BUFFER_RASYNC | FOP_DONTCACHE,
> > > >   };
> > > > diff --git a/fs/f2fs/iostat.c b/fs/f2fs/iostat.c
> > > > index f8703038e1d8..b2e6ce80c68d 100644
> > > > --- a/fs/f2fs/iostat.c
> > > > +++ b/fs/f2fs/iostat.c
> > > > @@ -245,7 +245,7 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
> > > >   	if (op_is_write(bio_op(bio))) {
> > > >   		lat_type = bio->bi_opf & REQ_SYNC ?
> > > >   				WRITE_SYNC_IO : WRITE_ASYNC_IO;
> > > > -		bio->bi_private = iostat_ctx->sbi;
> > > > +		bio->bi_private = iostat_ctx->post_write_ctx;
> > > >   	} else {
> > > >   		lat_type = READ_IO;
> > > >   		bio->bi_private = iostat_ctx->post_read_ctx;
> > > > @@ -256,7 +256,8 @@ void iostat_update_and_unbind_ctx(struct bio *bio)
> > > >   }
> > > >   void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> > > > -		struct bio *bio, struct bio_post_read_ctx *ctx)
> > > > +		struct bio *bio, struct bio_post_read_ctx *read_ctx,
> > > > +		struct bio_post_write_ctx *write_ctx)
> > > >   {
> > > >   	struct bio_iostat_ctx *iostat_ctx;
> > > >   	/* Due to the mempool, this never fails. */
> > > > @@ -264,7 +265,8 @@ void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> > > >   	iostat_ctx->sbi = sbi;
> > > >   	iostat_ctx->submit_ts = 0;
> > > >   	iostat_ctx->type = 0;
> > > > -	iostat_ctx->post_read_ctx = ctx;
> > > > +	iostat_ctx->post_read_ctx = read_ctx;
> > > > +	iostat_ctx->post_write_ctx = write_ctx;
> > > >   	bio->bi_private = iostat_ctx;
> > > >   }
> > > > diff --git a/fs/f2fs/iostat.h b/fs/f2fs/iostat.h
> > > > index eb99d05cf272..a358909bb5e8 100644
> > > > --- a/fs/f2fs/iostat.h
> > > > +++ b/fs/f2fs/iostat.h
> > > > @@ -40,6 +40,7 @@ struct bio_iostat_ctx {
> > > >   	unsigned long submit_ts;
> > > >   	enum page_type type;
> > > >   	struct bio_post_read_ctx *post_read_ctx;
> > > > +	struct bio_post_write_ctx *post_write_ctx;
> > > >   };
> > > >   static inline void iostat_update_submit_ctx(struct bio *bio,
> > > > @@ -60,7 +61,8 @@ static inline struct bio_post_read_ctx *get_post_read_ctx(struct bio *bio)
> > > >   extern void iostat_update_and_unbind_ctx(struct bio *bio);
> > > >   extern void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
> > > > -		struct bio *bio, struct bio_post_read_ctx *ctx);
> > > > +		struct bio *bio, struct bio_post_read_ctx *read_ctx,
> > > > +		struct bio_post_write_ctx *write_ctx);
> > > >   extern int f2fs_init_iostat_processing(void);
> > > >   extern void f2fs_destroy_iostat_processing(void);
> > > >   extern int f2fs_init_iostat(struct f2fs_sb_info *sbi);
> > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > index cc82d42ef14c..8501008e42b2 100644
> > > > --- a/fs/f2fs/segment.c
> > > > +++ b/fs/f2fs/segment.c
> > > > @@ -3856,7 +3856,7 @@ int f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct folio *folio,
> > > >   		f2fs_inode_chksum_set(sbi, folio);
> > > >   	}
> > > > -	if (fio) {
> > > > +	if (fio && !folio_test_dropbehind(folio)) {
> > > >   		struct f2fs_bio_info *io;
> > > >   		INIT_LIST_HEAD(&fio->list);
> > > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > > > index e16c4e2830c2..110dfe073aee 100644
> > > > --- a/fs/f2fs/super.c
> > > > +++ b/fs/f2fs/super.c
> > > > @@ -1963,6 +1963,7 @@ static void f2fs_put_super(struct super_block *sb)
> > > >   	flush_work(&sbi->s_error_work);
> > > >   	f2fs_destroy_post_read_wq(sbi);
> > > > +	f2fs_destroy_post_write_wq(sbi);
> > > >   	kvfree(sbi->ckpt);
> > > > @@ -4959,6 +4960,12 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
> > > >   		goto free_devices;
> > > >   	}
> > > > +	err = f2fs_init_post_write_wq(sbi);
> > > > +	if (err) {
> > > > +		f2fs_err(sbi, "Failed to initialize post write workqueue");
> > > > +		goto free_devices;
> > > > +	}
> > > > +
> > > >   	sbi->total_valid_node_count =
> > > >   				le32_to_cpu(sbi->ckpt->valid_node_count);
> > > >   	percpu_counter_set(&sbi->total_valid_inode_count,
> > > > @@ -5240,6 +5247,7 @@ static int f2fs_fill_super(struct super_block *sb, struct fs_context *fc)
> > > >   	/* flush s_error_work before sbi destroy */
> > > >   	flush_work(&sbi->s_error_work);
> > > >   	f2fs_destroy_post_read_wq(sbi);
> > > > +	f2fs_destroy_post_write_wq(sbi);
> > > >   free_devices:
> > > >   	destroy_device_list(sbi);
> > > >   	kvfree(sbi->ckpt);
> > > > @@ -5435,9 +5443,12 @@ static int __init init_f2fs_fs(void)
> > > >   	err = f2fs_init_post_read_processing();
> > > >   	if (err)
> > > >   		goto free_root_stats;
> > > > -	err = f2fs_init_iostat_processing();
> > > > +	err = f2fs_init_post_write_processing();
> > > >   	if (err)
> > > >   		goto free_post_read;
> > > > +	err = f2fs_init_iostat_processing();
> > > > +	if (err)
> > > > +		goto free_post_write;
> > > >   	err = f2fs_init_bio_entry_cache();
> > > >   	if (err)
> > > >   		goto free_iostat;
> > > > @@ -5469,6 +5480,8 @@ static int __init init_f2fs_fs(void)
> > > >   	f2fs_destroy_bio_entry_cache();
> > > >   free_iostat:
> > > >   	f2fs_destroy_iostat_processing();
> > > > +free_post_write:
> > > > +	f2fs_destroy_post_write_processing();
> > > >   free_post_read:
> > > >   	f2fs_destroy_post_read_processing();
> > > >   free_root_stats:
> > > > @@ -5504,6 +5517,7 @@ static void __exit exit_f2fs_fs(void)
> > > >   	f2fs_destroy_bio_entry_cache();
> > > >   	f2fs_destroy_iostat_processing();
> > > >   	f2fs_destroy_post_read_processing();
> > > > +	f2fs_destroy_post_write_processing();
> > > >   	f2fs_destroy_root_stats();
> > > >   	f2fs_exit_shrinker();
> > > >   	f2fs_exit_sysfs();
> > > > -- 
> > > > 2.50.0
