lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <84a8e6737fca05dd3ec234760f1c77901d915ef9@linux.dev>
Date: Tue, 11 Feb 2025 00:54:33 +0000
From: "Ihor Solodrai" <ihor.solodrai@...ux.dev>
To: "David Howells" <dhowells@...hat.com>
Cc: dhowells@...hat.com, "Marc Dionne" <marc.dionne@...istor.com>, "Steve 
 French" <stfrench@...rosoft.com>, "Eric Van Hensbergen"
 <ericvh@...nel.org>, "Latchesar  Ionkov" <lucho@...kov.net>, "Dominique
 Martinet" <asmadeus@...ewreck.org>, "Christian Schoenebeck"
 <linux_oss@...debyte.com>, "Paulo Alcantara" <pc@...guebit.com>, "Jeff
 Layton" <jlayton@...nel.org>, "Christian Brauner" <brauner@...nel.org>,
 v9fs@...ts.linux.dev, linux-cifs@...r.kernel.org, netfs@...ts.linux.dev,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 ast@...nel.org, bpf@...r.kernel.org
Subject: Re: [PATCH] netfs: Add retry stat counters

On 2/10/25 3:18 PM, David Howells wrote:
> Ihor Solodrai <ihor.solodrai@...ux.dev> wrote:
>
>> Done. I pushed the logs to the previously mentioned github branch:
>> https://github.com/kernel-patches/bpf/commit/699a3bb95e2291d877737438fb641628702fd18f
>
> Perfect, thanks.
>
> Looking at the last record of /proc/fs/netfs/requests, I see:
>
> 	REQUEST  OR REF FL ERR  OPS COVERAGE
> 	======== == === == ==== === =========
> 	00000a98 RA   1 2001    0   0 @0000 2000/2000
>
> So the request of interest is R=00000a98 in the trace.  Grepping for that, I
> see (with a few columns cut out):
>
>  test_progs-no_a-97: netfs_rreq_ref: R=00000a98 NEW         r=1
>  test_progs-no_a-97: netfs_read: R=00000a98 READAHEAD c=00000000 ni=2ec02f16 s=0 l=2000 sz=17a8
>  test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET SUBREQ  r=2
>  test_progs-no_a-97: netfs_sreq: R=00000a98[1] DOWN TERM  f=192 s=0 17a8/17a8 s=1 e=0
>  test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET WORK    r=3
>  test_progs-no_a-97: netfs_sreq_ref: R=00000a98[1] PUT TERM    r=1
>  test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET SUBREQ  r=4
>  test_progs-no_a-97: netfs_sreq: R=00000a98[2] ZERO SUBMT f=00 s=17a8 0/858 s=0 e=0
>     kworker/u8:2-36: netfs_rreq_ref: R=00000a98 SEE WORK    r=4
>     kworker/u8:2-36: netfs_rreq: R=00000a98 RA COLLECT f=2021
>     kworker/u8:2-36: netfs_sreq: R=00000a98[1] DOWN DSCRD f=92 s=0 17a8/17a8 s=1 e=0
>     kworker/u8:2-36: netfs_sreq_ref: R=00000a98[1] PUT DONE    r=0
>     kworker/u8:2-36: netfs_sreq: R=00000a98[1] DOWN FREE  f=92 s=0 17a8/17a8 s=1 e=0
>     kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT SUBREQ  r=3
>     kworker/u8:2-36: netfs_rreq: R=00000a98 RA COMPLET f=2021
>     kworker/u8:2-36: netfs_rreq: R=00000a98 RA WAKE-IP f=2021
>     kworker/u8:2-36: netfs_rreq: R=00000a98 RA DONE    f=2001
>     kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT WORK    r=2
>  test_progs-no_a-97: netfs_sreq: R=00000a98[2] ZERO TERM  f=102 s=17a8 858/858 s=1 e=0
>  test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET WORK    r=3
>  test_progs-no_a-97: netfs_sreq_ref: R=00000a98[2] PUT TERM    r=1
>  test_progs-no_a-97: netfs_rreq_ref: R=00000a98 PUT RETURN  r=2
>     kworker/u8:2-36: netfs_rreq_ref: R=00000a98 SEE WORK    r=2
>     kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT WORK    r=1
>
> You can see subrequest 1 completes fine, the subrequest is freed and the ref
> it had on the request is put:
>
> 	netfs_sreq: R=00000a98[1] DOWN FREE  f=92 s=0 17a8/17a8 s=1 e=0
> 	netfs_rreq_ref: R=00000a98 PUT SUBREQ  r=3
>
> Subrequest 2, however isn't collected:
>
> 	netfs_sreq: R=00000a98[2] ZERO SUBMT f=00 s=17a8 0/858 s=0 e=0
> 	netfs_sreq: R=00000a98[2] ZERO TERM  f=102 s=17a8 858/858 s=1 e=0
> 	netfs_sreq_ref: R=00000a98[2] PUT TERM    r=1
>
> and the work happens again in kworker/u8:2-36 right at the end:
>
> 	netfs_rreq_ref: R=00000a98 SEE WORK    r=2
> 	netfs_rreq_ref: R=00000a98 PUT WORK    r=1
>
> but this doesn't do anything.
>
> The excess buffer clearance happened in the app thread (test_progs-no_a-97):
>
> 	netfs_sreq: R=00000a98[2] ZERO TERM  f=102 s=17a8 858/858 s=1 e=0
>
>> Let me know if I can help with anything else.
>
> Can you add some more tracepoints?
>
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect_sreq/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect_state/enable

See here: https://github.com/kernel-patches/bpf/commit/517f51d1f6c09ebab9df3e3d17bb669601ab14ef

Beware, uncompressed trace-cmd.log is 37Mb

>
> However, I think I may have spotted the issue: I'm mixing
> clear_and_wake_up_bit() for NETFS_RREQ_IN_PROGRESS (which will use a common
> system waitqueue) with waiting on an rreq-specific waitqueue in such as
> netfs_wait_for_read().
>
> I'll work up a fix patch for that tomorrow.

Great! Thank you.

>
> Thanks,
> David
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ