[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3210864.1739229537@warthog.procyon.org.uk>
Date: Mon, 10 Feb 2025 23:18:57 +0000
From: David Howells <dhowells@...hat.com>
To: "Ihor Solodrai" <ihor.solodrai@...ux.dev>
Cc: dhowells@...hat.com, "Marc Dionne" <marc.dionne@...istor.com>,
"Steve
French" <stfrench@...rosoft.com>,
"Eric Van Hensbergen" <ericvh@...nel.org>,
"Latchesar Ionkov" <lucho@...kov.net>,
"Dominique Martinet" <asmadeus@...ewreck.org>,
"Christian Schoenebeck" <linux_oss@...debyte.com>,
"Paulo Alcantara" <pc@...guebit.com>,
"Jeff Layton" <jlayton@...nel.org>,
"Christian Brauner" <brauner@...nel.org>, v9fs@...ts.linux.dev,
linux-cifs@...r.kernel.org, netfs@...ts.linux.dev,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
ast@...nel.org, bpf@...r.kernel.org
Subject: Re: [PATCH] netfs: Add retry stat counters
Ihor Solodrai <ihor.solodrai@...ux.dev> wrote:
> Done. I pushed the logs to the previously mentioned github branch:
> https://github.com/kernel-patches/bpf/commit/699a3bb95e2291d877737438fb641628702fd18f
Perfect, thanks.
Looking at the last record of /proc/fs/netfs/requests, I see:
REQUEST OR REF FL ERR OPS COVERAGE
======== == === == ==== === =========
00000a98 RA 1 2001 0 0 @0000 2000/2000
So the request of interest is R=00000a98 in the trace. Grepping for that, I
see (with a few columns cut out):
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 NEW r=1
test_progs-no_a-97: netfs_read: R=00000a98 READAHEAD c=00000000 ni=2ec02f16 s=0 l=2000 sz=17a8
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET SUBREQ r=2
test_progs-no_a-97: netfs_sreq: R=00000a98[1] DOWN TERM f=192 s=0 17a8/17a8 s=1 e=0
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET WORK r=3
test_progs-no_a-97: netfs_sreq_ref: R=00000a98[1] PUT TERM r=1
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET SUBREQ r=4
test_progs-no_a-97: netfs_sreq: R=00000a98[2] ZERO SUBMT f=00 s=17a8 0/858 s=0 e=0
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 SEE WORK r=4
kworker/u8:2-36: netfs_rreq: R=00000a98 RA COLLECT f=2021
kworker/u8:2-36: netfs_sreq: R=00000a98[1] DOWN DSCRD f=92 s=0 17a8/17a8 s=1 e=0
kworker/u8:2-36: netfs_sreq_ref: R=00000a98[1] PUT DONE r=0
kworker/u8:2-36: netfs_sreq: R=00000a98[1] DOWN FREE f=92 s=0 17a8/17a8 s=1 e=0
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT SUBREQ r=3
kworker/u8:2-36: netfs_rreq: R=00000a98 RA COMPLET f=2021
kworker/u8:2-36: netfs_rreq: R=00000a98 RA WAKE-IP f=2021
kworker/u8:2-36: netfs_rreq: R=00000a98 RA DONE f=2001
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT WORK r=2
test_progs-no_a-97: netfs_sreq: R=00000a98[2] ZERO TERM f=102 s=17a8 858/858 s=1 e=0
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET WORK r=3
test_progs-no_a-97: netfs_sreq_ref: R=00000a98[2] PUT TERM r=1
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 PUT RETURN r=2
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 SEE WORK r=2
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT WORK r=1
You can see subrequest 1 completes fine, the subrequest is freed and the ref
it had on the request is put:
netfs_sreq: R=00000a98[1] DOWN FREE f=92 s=0 17a8/17a8 s=1 e=0
netfs_rreq_ref: R=00000a98 PUT SUBREQ r=3
Subrequest 2, however isn't collected:
netfs_sreq: R=00000a98[2] ZERO SUBMT f=00 s=17a8 0/858 s=0 e=0
netfs_sreq: R=00000a98[2] ZERO TERM f=102 s=17a8 858/858 s=1 e=0
netfs_sreq_ref: R=00000a98[2] PUT TERM r=1
and the work happens again in kworker/u8:2-36 right at the end:
netfs_rreq_ref: R=00000a98 SEE WORK r=2
netfs_rreq_ref: R=00000a98 PUT WORK r=1
but this doesn't do anything.
The excess buffer clearance happened in the app thread (test_progs-no_a-97):
netfs_sreq: R=00000a98[2] ZERO TERM f=102 s=17a8 858/858 s=1 e=0
> Let me know if I can help with anything else.
Can you add some more tracepoints?
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect_sreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect_state/enable
However, I think I may have spotted the issue: I'm mixing
clear_and_wake_up_bit() for NETFS_RREQ_IN_PROGRESS (which will use a common
system waitqueue) with waiting on an rreq-specific waitqueue in such as
netfs_wait_for_read().
I'll work up a fix patch for that tomorrow.
Thanks,
David
Powered by blists - more mailing lists