Message-ID: <aG2JwM28-IOge4zF@kernel.org>
Date: Tue, 8 Jul 2025 17:12:32 -0400
From: Mike Snitzer <snitzer@...nel.org>
To: Jeff Layton <jlayton@...nel.org>
Cc: Chuck Lever <chuck.lever@...cle.com>,
Trond Myklebust <trondmy@...nel.org>,
Anna Schumaker <anna@...nel.org>, NeilBrown <neil@...wn.name>,
Olga Kornievskaia <okorniev@...hat.com>,
Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 2/2] nfsd: call generic_fadvise after v3 READ, stable
WRITE or COMMIT
On Tue, Jul 08, 2025 at 10:34:15AM -0400, Jeff Layton wrote:
> On Thu, 2025-07-03 at 16:07 -0400, Chuck Lever wrote:
> > On 7/3/25 3:53 PM, Jeff Layton wrote:
> > > Recent testing has shown that keeping pagecache pages around for too
> > > long can be detrimental to performance with nfsd. Clients only rarely
> > > revisit the same data, so the pages tend to just hang around.
> > >
> > > This patch changes the pc_release callbacks for NFSv3 READ, WRITE and
> > > COMMIT to call generic_fadvise(..., POSIX_FADV_DONTNEED) on the accessed
> > > range.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > > ---
> > > fs/nfsd/debugfs.c | 2 ++
> > > fs/nfsd/nfs3proc.c | 59 +++++++++++++++++++++++++++++++++++++++++++++---------
> > > fs/nfsd/nfsd.h | 1 +
> > > fs/nfsd/nfsproc.c | 4 ++--
> > > fs/nfsd/vfs.c | 21 ++++++++++++++-----
> > > fs/nfsd/vfs.h | 5 +++--
> > > fs/nfsd/xdr3.h | 3 +++
> > > 7 files changed, 77 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
> > > index 84b0c8b559dc90bd5c2d9d5e15c8e0682c0d610c..b007718dd959bc081166ec84e06f577a8fc2b46b 100644
> > > --- a/fs/nfsd/debugfs.c
> > > +++ b/fs/nfsd/debugfs.c
> > > @@ -44,4 +44,6 @@ void nfsd_debugfs_init(void)
> > >
> > > debugfs_create_file("disable-splice-read", S_IWUSR | S_IRUGO,
> > > nfsd_top_dir, NULL, &nfsd_dsr_fops);
> > > + debugfs_create_bool("enable-fadvise-dontneed", 0644,
> > > + nfsd_top_dir, &nfsd_enable_fadvise_dontneed);
> >
> > I prefer that this setting is folded into the new io_cache_read /
> > io_cache_write tune-ables that Mike's patch adds, rather than adding
> > a new boolean.
> >
> > That might make a hybrid "DONTCACHE for READ and fadvise for WRITE"
> > pretty easy.
> >
>
> I ended up rebasing Mike's dontcache branch on top of v6.16-rc5 with
> all of Chuck's trees in. I then added the attached patch and did some
> testing with a couple of machines I checked out internally at Meta.
> These are the throughput results from the fio-seq-RW test with the file
> size set to 100G and the duration set to 5 minutes.
>
> Note that:
>
> reads and writes buffered:
> READ: bw=3024MiB/s (3171MB/s), 186MiB/s-191MiB/s (195MB/s-201MB/s), io=889GiB (954GB), run=300012-300966msec
> WRITE: bw=2015MiB/s (2113MB/s), 124MiB/s-128MiB/s (131MB/s-134MB/s), io=592GiB (636GB), run=300012-300966msec
>
> READ: bw=2902MiB/s (3043MB/s), 177MiB/s-183MiB/s (186MB/s-192MB/s), io=851GiB (913GB), run=300027-300118msec
> WRITE: bw=1934MiB/s (2027MB/s), 119MiB/s-122MiB/s (124MB/s-128MB/s), io=567GiB (608GB), run=300027-300118msec
>
> READ: bw=2897MiB/s (3037MB/s), 178MiB/s-183MiB/s (186MB/s-192MB/s), io=849GiB (911GB), run=300006-300078msec
> WRITE: bw=1930MiB/s (2023MB/s), 119MiB/s-122MiB/s (125MB/s-128MB/s), io=565GiB (607GB), run=300006-300078msec
>
> reads and writes RWF_DONTCACHE:
> READ: bw=3090MiB/s (3240MB/s), 190MiB/s-195MiB/s (199MB/s-205MB/s), io=906GiB (972GB), run=300015-300113msec
> WRITE: bw=2060MiB/s (2160MB/s), 126MiB/s-130MiB/s (132MB/s-137MB/s), io=604GiB (648GB), run=300015-300113msec
>
> READ: bw=3057MiB/s (3205MB/s), 188MiB/s-193MiB/s (198MB/s-203MB/s), io=897GiB (963GB), run=300329-300450msec
> WRITE: bw=2037MiB/s (2136MB/s), 126MiB/s-129MiB/s (132MB/s-135MB/s), io=598GiB (642GB), run=300329-300450msec
>
> READ: bw=3166MiB/s (3320MB/s), 196MiB/s-200MiB/s (205MB/s-210MB/s), io=928GiB (996GB), run=300021-300090msec
> WRITE: bw=2111MiB/s (2213MB/s), 131MiB/s-133MiB/s (137MB/s-140MB/s), io=619GiB (664GB), run=300021-300090msec
>
> reads and writes with O_DIRECT:
> READ: bw=3115MiB/s (3267MB/s), 192MiB/s-198MiB/s (201MB/s-208MB/s), io=913GiB (980GB), run=300025-300078msec
> WRITE: bw=2077MiB/s (2178MB/s), 128MiB/s-131MiB/s (134MB/s-138MB/s), io=609GiB (653GB), run=300025-300078msec
>
> READ: bw=3189MiB/s (3343MB/s), 197MiB/s-202MiB/s (207MB/s-211MB/s), io=934GiB (1003GB), run=300023-300096msec
> WRITE: bw=2125MiB/s (2228MB/s), 132MiB/s-134MiB/s (138MB/s-140MB/s), io=623GiB (669GB), run=300023-300096msec
>
> READ: bw=3113MiB/s (3264MB/s), 191MiB/s-197MiB/s (200MB/s-207MB/s), io=912GiB (979GB), run=300020-300098msec
> WRITE: bw=2075MiB/s (2175MB/s), 127MiB/s-131MiB/s (134MB/s-138MB/s), io=608GiB (653GB), run=300020-300098msec
>
> RWF_DONTCACHE on reads and stable writes + fadvise DONTNEED after COMMIT:
> READ: bw=2888MiB/s (3029MB/s), 178MiB/s-182MiB/s (187MB/s-191MB/s), io=846GiB (909GB), run=300012-300109msec
> WRITE: bw=1924MiB/s (2017MB/s), 118MiB/s-121MiB/s (124MB/s-127MB/s), io=564GiB (605GB), run=300012-300109msec
>
> READ: bw=2899MiB/s (3040MB/s), 180MiB/s-183MiB/s (188MB/s-192MB/s), io=852GiB (915GB), run=300022-300940msec
> WRITE: bw=1931MiB/s (2025MB/s), 119MiB/s-122MiB/s (125MB/s-128MB/s), io=567GiB (609GB), run=300022-300940msec
>
> READ: bw=2902MiB/s (3043MB/s), 179MiB/s-184MiB/s (188MB/s-193MB/s), io=853GiB (916GB), run=300913-301146msec
> WRITE: bw=1933MiB/s (2027MB/s), 119MiB/s-122MiB/s (125MB/s-128MB/s), io=568GiB (610GB), run=300913-301146msec
>
>
> The fadvise case is clearly slower than the others. Interestingly it
> also slowed down read performance, which leads me to believe that maybe
> the fadvise calls were interfering with concurrent reads. Given the
> disappointing numbers, I'll probably drop the last patch.
>
> There is probably a case to be made for patch #1, on the general
> principle of expediting sending the reply as much as possible. Chuck,
> let me know if you want me to submit that individually.
> --
> Jeff Layton <jlayton@...nel.org>
> From 14958516bf45f92a8609cb6ad504e92550b416d7 Mon Sep 17 00:00:00 2001
> From: Jeff Layton <jlayton@...nel.org>
> Date: Mon, 7 Jul 2025 11:00:34 -0400
> Subject: [PATCH] nfsd: add a NFSD_IO_FADVISE setting to io_cache_write
>
> Signed-off-by: Jeff Layton <jlayton@...nel.org>
Ah, you raced ahead and did what I suggested in my previous reply just
now. Too bad the performance is lacking... but I applaud your
trying out a new/interesting idea!

BTW, CPU and memory use are also important to capture to get the
full picture.

Mike