Message-ID: <8d8a5d5b00688ea553b106db690e8a01f15b1410@linux.dev>
Date: Mon, 10 Feb 2025 21:54:19 +0000
From: "Ihor Solodrai" <ihor.solodrai@...ux.dev>
To: "David Howells" <dhowells@...hat.com>
Cc: dhowells@...hat.com, "Marc Dionne" <marc.dionne@...istor.com>, "Steve
French" <stfrench@...rosoft.com>, "Eric Van Hensbergen"
<ericvh@...nel.org>, "Latchesar Ionkov" <lucho@...kov.net>, "Dominique
Martinet" <asmadeus@...ewreck.org>, "Christian Schoenebeck"
<linux_oss@...debyte.com>, "Paulo Alcantara" <pc@...guebit.com>, "Jeff
Layton" <jlayton@...nel.org>, "Christian Brauner" <brauner@...nel.org>,
v9fs@...ts.linux.dev, linux-cifs@...r.kernel.org, netfs@...ts.linux.dev,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
ast@...nel.org, bpf@...r.kernel.org
Subject: Re: [PATCH] netfs: Add retry stat counters
On 2/10/25 2:57 AM, David Howells wrote:
> Ihor Solodrai <ihor.solodrai@...ux.dev> wrote:
>
>> I recommend trying to reproduce with steps I shared in my initial report:
>> https://lore.kernel.org/bpf/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/
>>
>> I know it may not be very convenient due to all the CI stuff,
>
> That's an understatement. :-)
>
>> but you should be able to use it to iterate on the kernel source locally and
>> narrow down the problem.
>
> Can you share just the reproducer without all the docker stuff?
I wrote a couple of shell scripts that capture the gist of what happens
on CI: build the kernel, build the selftests, and run them. You can try them.
Pull this branch from my GitHub:
https://github.com/theihor/bpf/tree/netfs-debug
It contains the kernel source in the broken state, plus the scripts.
Inlining the scripts here:
## ./reproducer.sh
#!/bin/bash
set -euo pipefail
export KBUILD_OUTPUT=$(realpath kbuild-output)
mkdir -p $KBUILD_OUTPUT
cp -f repro.config $KBUILD_OUTPUT/.config
make olddefconfig
make -j$(nproc) all
make -j$(nproc) headers
# apt install lsb-release wget software-properties-common gnupg
# bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
export LLVM_VERSION=18
make -C tools/testing/selftests/bpf \
    CLANG=clang-${LLVM_VERSION} \
    LLC=llc-${LLVM_VERSION} \
    LLVM_STRIP=llvm-strip-${LLVM_VERSION} \
    -j$(nproc) test_progs-no_alu32
# wget https://github.com/danobi/vmtest/releases/download/v0.15.0/vmtest-x86_64
# chmod +x vmtest-x86_64
./vmtest-x86_64 -k $KBUILD_OUTPUT/$(make -s image_name) ./run-bpf-selftests.sh | tee test.log
## end of ./reproducer.sh
## ./run-bpf-selftests.sh
#!/bin/bash
/bin/mount bpffs /sys/fs/bpf -t bpf
ip link set lo up
echo 10 > /proc/sys/kernel/hung_task_timeout_secs
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_read/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write_iter/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq_ref/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq_ref/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_failure/enable
function tail_proc {
    src=$1
    dst=$2
    echo -n > $dst
    while true; do
        echo >> $dst
        cat $src >> $dst
        sleep 1
    done
}
export -f tail_proc
nohup bash -c 'tail_proc /proc/fs/netfs/stats netfs-stats.log' & disown
nohup bash -c 'tail_proc /proc/fs/netfs/requests netfs-requests.log' & disown
nohup bash -c 'trace-cmd show -p > trace-cmd.log' & disown
cd tools/testing/selftests/bpf
./test_progs-no_alu32
## end of ./run-bpf-selftests.sh
One of the reasons for suggesting docker is that all the dependencies
are pre-packaged in the image, and so the environment is pretty close
to the actual CI environment. With only shell scripts you will have to
detect and install missing dependencies on your system and hope
package versions are more or less the same and don't affect the issue.
Notable dependencies: LLVM 18, pahole, qemu, qemu-guest-agent, and the vmtest tool.
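If you go the shell-only route, a quick preflight check can tell you which tools are missing before the build starts. This is only a sketch: `missing_tools` is a hypothetical helper, not part of the branch, and `qemu-system-x86_64` is my assumption for the qemu binary name on an x86_64 host.

```shell
#!/bin/bash
# Hypothetical helper: print every command from the argument list that is
# not found in PATH, one per line. Returns non-zero if anything is missing.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Example call. Tool names follow reproducer.sh; qemu-system-x86_64 is an
# assumed binary name and may differ on your distribution.
# LLVM_VERSION=18
# missing_tools clang-${LLVM_VERSION} llc-${LLVM_VERSION} \
#     llvm-strip-${LLVM_VERSION} pahole qemu-system-x86_64 trace-cmd
```

This only checks that the commands exist. It does not compare package versions, which may still differ from the CI image.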
> Is this one
> of those tests that requires 9p over virtio? I have a different environment
> for that.
We run the tests via the vmtest tool: https://github.com/danobi/vmtest
It is essentially a qemu wrapper.
I am not familiar with its internals, but it definitely uses 9p.
On 2/10/25 3:12 AM, David Howells wrote:
> Ihor Solodrai <ihor.solodrai@...ux.dev> wrote:
>
>> Bash piece starting a process collecting /proc/fs/netfs/stats:
>>
>> function tail_netfs {
>>     echo -n > /mnt/vmtest/netfs-stats.log
>>     while true; do
>>         echo >> /mnt/vmtest/netfs-stats.log
>>         cat /proc/fs/netfs/stats >> /mnt/vmtest/netfs-stats.log
>>         sleep 1
>>     done
>> }
>> export -f tail_netfs
>> nohup bash -c 'tail_netfs' & disown
>
> I'm afraid, intermediate snapshots of this file aren't particularly useful -
> just the last snapshot:
I wrote it this way because the test runner hangs, so I have to kill
qemu to stop it. That leaves no opportunity to run post-processing
inside the qemu instance (at least, I don't know of a way to do it).
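Since only the last snapshot matters, it can be cut out of the log on the host after qemu is killed. A rough sketch, assuming the log was written by the loop above (a blank line before each snapshot) and that the proc file itself contains no blank lines; `last_snapshot` is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: print the last blank-line-separated snapshot from a
# log file. awk's paragraph mode (RS set to the empty string) treats each
# run of non-empty lines as one record.
last_snapshot() {
    awk -v RS= '{ last = $0 } END { if (NR) print last }' "$1"
}

# Example call (file name as used in run-bpf-selftests.sh):
# last_snapshot netfs-stats.log
```

If qemu is killed in the middle of a write, the last snapshot may be truncated, in which case the one before it is the safer choice.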
>
> [...]
>
> Could you collect some tracing:
>
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_read/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write_iter/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq_ref/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq_ref/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_failure/enable
>
> and then collect the tracelog:
>
> trace-cmd show | bzip2 >some_file_somewhere.bz2
>
> And if you could collect /proc/fs/netfs/requests as well, that will show the
> debug IDs of the hanging requests. These can be used to grep the trace by
> prepending "R=". For example, if you see:
>
> 	REQUEST  OR REF FL ERR  OPS COVERAGE
> 	======== == === == ==== === =========
> 	00000043 WB   1 2120    0   0 @34000000 0/0
>
> then:
>
> trace-cmd show | grep R=00000043
Done. I pushed the logs to the previously mentioned github branch:
https://github.com/kernel-patches/bpf/commit/699a3bb95e2291d877737438fb641628702fd18f
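For going through the logs, the grep suggested above can be repeated for every request ID found in the requests log. This is a sketch, not something that ran on CI: `request_ids` and `trace_for_requests` are hypothetical helpers, and I am assuming the ID is always the first column, printed as 8 lowercase hex digits as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: read /proc/fs/netfs/requests output on stdin and
# print the unique request IDs (first column, 8 hex digits). The header
# and the "========" separator do not match and are skipped.
request_ids() {
    awk 'length($1) == 8 && $1 ~ /^[0-9a-f]+$/ { print $1 }' | sort -u
}

# Hypothetical helper: for each request ID in the requests log ($1),
# print the lines of the trace log ($2) that contain "R=<id>".
trace_for_requests() {
    local requests=$1 trace=$2 id
    request_ids < "$requests" | while read -r id; do
        echo "== R=$id"
        grep "R=$id" "$trace" || true   # no match is not an error
    done
}

# Example call (file names as used in run-bpf-selftests.sh):
# trace_for_requests netfs-requests.log trace-cmd.log
```

Run over the whole requests log, this lists every request that was ever seen. To look only at the hanging ones, feed it just the last snapshot.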
Let me know if I can help with anything else.
>
> Thanks,
> David
>