[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABXGCsMev+96zixic3=5cLgzMfZKcJE-zOLPrc3-vTrrDZQDig@mail.gmail.com>
Date: Wed, 28 Dec 2022 19:12:48 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: Qu Wenruo <quwenruo.btrfs@....com>
Cc: Qu Wenruo <wqu@...e.com>, dsterba@...e.com,
Btrfs BTRFS <linux-btrfs@...r.kernel.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: Re: [6.2][regression] after commit 947a629988f191807d2d22ba63ae18259bb645c5
btrfs volume periodical forced switch to readonly after a lot of disk writes
On Wed, Dec 28, 2022 at 6:08 AM Qu Wenruo <quwenruo.btrfs@....com> wrote:
>
> From the very first dmesg with calltrack, it already shows the
> submit_one_bio() is called from submit_extent_page(), which means cases
> cross stripe boundary, and has no parent_check populated at all.
>
> And since you're using RAID0 on two NVMEs, it matches the symptom, while
> most tests done here are using single device (DUP and SINGLE), thus no
> stripe boundary cases at all.
> (In fact it should still be possible to trigger on SINGLE, but way too
> hard to trigger)
>
> With proper root cause found, this version should mostly handle the
> regression correctly.
>
> This version should mostly be the formal one I'd later send to the
> mailing list.
>
> I can not thank you more for all the testing you have provided, it not
> only pinned down the bug, but also proves I'm a total idiot...
I have already written over 1.6Tb of data to disk and there are no
hints of errors.
For me, this is a sign that the problem has been fixed.
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
❯ dmesg | grep -i btrfs
[ 0.000000] Linux version
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
(mikhail@...hail-laptop) (gcc (GCC) 12.2.1 20221121 (Red Hat
12.2.1-4), GNU ld version 2.39-6.fc38) #7 SMP PREEMPT_DYNAMIC Wed Dec
28 10:00:39 +05 2022
[ 0.000000] Command line:
BOOT_IMAGE=(hd0,gpt1)/@...t/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
rootflags=subvol=@...t
resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
sysrq_always_enabled=1 nmi_watchdog=1
amdgpu.lockup_timeout=-1,-1,-1,-1
[ 0.154567] Kernel command line:
BOOT_IMAGE=(hd0,gpt1)/@...t/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
rootflags=subvol=@...t
resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
sysrq_always_enabled=1 nmi_watchdog=1
amdgpu.lockup_timeout=-1,-1,-1,-1
[ 0.154654] Unknown kernel command line parameters
"BOOT_IMAGE=(hd0,gpt1)/@...t/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+",
will be passed to user space.
[ 4.496766] usb usb2: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.498963] usb usb1: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.500665] usb usb3: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.501851] usb usb4: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.735212] Btrfs loaded, crc32c=crc32c-generic, assert=on,
zoned=yes, fsverity=yes
[ 5.223368]
BOOT_IMAGE=(hd0,gpt1)/@...t/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
[ 6.923453] BTRFS: device label fedora_localhost-live devid 2
transid 652981 /dev/nvme1n1p1 scanned by systemd-udevd (448)
[ 6.974412] BTRFS: device label fedora_localhost-live devid 1
transid 652981 /dev/nvme0n1p3 scanned by systemd-udevd (484)
[ 11.113437] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
#7
[ 11.221359] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
W L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
#7
[ 13.731015] BTRFS info (device nvme0n1p3): using crc32c
(crc32c-intel) checksum algorithm
[ 13.731147] BTRFS info (device nvme0n1p3): using free space tree
[ 14.328439] BTRFS info (device nvme0n1p3): enabling ssd optimizations
[ 14.328469] BTRFS info (device nvme0n1p3): auto enabling async discard
[ 16.592713] BTRFS info (device nvme0n1p3: state M): use zstd
compression, level 1
[11691.071176] CPU: 11 PID: 2068 Comm: gnome-shell Tainted: G W
L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
#7
<OFFTOPIC>
As I mentioned at the first message I also have a computer where the
btrfs partition is located on a slow HDD.
When I update the container (podman pull), the system becomes
unresposible for half an hour, which is how long it takes to update
the container.
I do not expect any super-speed from the HDD, I just would like to do
something else with this computer. Yes, at least watching videos on
youtube. Is there anything that can be done here or is there nothing
that we can do?
[46944.301588] INFO: task btrfs-transacti:1184 blocked for more than
122 seconds.
[46944.301825] Tainted: G W L ------- ---
6.2.0-0.rc1.14.fc38.x86_64+debug #1
[46944.301829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[46944.301832] task:btrfs-transacti state:D stack:12000 pid:1184
ppid:2 flags:0x00004000
[46944.301840] Call Trace:
[46944.301843] <TASK>
[46944.301851] __schedule+0x50c/0x1780
[46944.301863] ? _raw_spin_unlock_irqrestore+0x30/0x60
[46944.301876] schedule+0x5d/0xe0
[46944.301881] wait_current_trans+0x110/0x170
[46944.301888] ? __pfx_autoremove_wake_function+0x10/0x10
[46944.301895] start_transaction+0x36c/0x680
[46944.301904] transaction_kthread+0xb6/0x1b0
[46944.301912] ? __pfx_transaction_kthread+0x10/0x10
[46944.301916] kthread+0xf5/0x120
[46944.301920] ? __pfx_kthread+0x10/0x10
[46944.301926] ret_from_fork+0x2c/0x50
[46944.301941] </TASK>
I attached a full kernel log from this machine.
I can start a separate thread if it makes sense.
Sorry for oftop.
</OFFTOPIC>
--
Best Regards,
Mike Gavrilov.
View attachment "dmesg-btrfs-on-hdd.txt" of type "text/plain" (303399 bytes)
Powered by blists - more mailing lists