Message-ID: <f68699c3-ec5e-d8e8-f101-6e9a7020ac81@gmx.com>
Date: Mon, 26 Dec 2022 08:14:55 +0800
From: Qu Wenruo <quwenruo.btrfs@....com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>, wqu@...e.com,
dsterba@...e.com, Btrfs BTRFS <linux-btrfs@...r.kernel.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: Re: [6.2][regression] after commit
947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced
switch to readonly after a lot of disk writes
On 2022/12/26 05:32, Mikhail Gavrilov wrote:
> Hi,
> Curiously, it happens only on a machine that has a BTRFS volume
> combined from two high-speed NVMe (PCIe 4) SSDs in RAID 0. On machines
> with a BTRFS volume on a single HDD the bug does not appear.
>
> To bisect the problematic commit, I had to sweat a lot. At each step,
> I downloaded the 150 GB game "Assassin's Creed Valhalla" 4 times and
> deleted it. To make sure that the commit preceding
> 947a629988f191807d2d22ba63ae18259bb645c5 is definitely not affected by
> the bug, I downloaded this game 10 times, which should have written
> more than 1.5 TB of data to the btrfs volume.
>
> Here is the result of my bisection:
> 947a629988f191807d2d22ba63ae18259bb645c5 is the first bad commit
> commit 947a629988f191807d2d22ba63ae18259bb645c5
> Author: Qu Wenruo <wqu@...e.com>
> Date: Wed Sep 14 13:32:51 2022 +0800
>
> btrfs: move tree block parentness check into validate_extent_buffer()
>
[...]
> Signed-off-by: Qu Wenruo <wqu@...e.com>
> Signed-off-by: David Sterba <dsterba@...e.com>
>
> fs/btrfs/disk-io.c | 73 ++++++++++++++++++++++++++++++++++++++--------------
> fs/btrfs/extent_io.c | 18 ++++++++++---
> fs/btrfs/extent_io.h | 5 ++--
> fs/btrfs/volumes.h | 25 +++++++++++++++---
> 4 files changed, 93 insertions(+), 28 deletions(-)
>
> Before the switch to readonly, the preceding line in the kernel log shows this message:
> [ 1908.029663] BTRFS: error (device nvme0n1p3: state A) in
> btrfs_run_delayed_refs:2147: errno=-5 IO failure
>
> I also attached a full kernel log.
>
Thanks a lot for the full kernel log.

It indeed shows that something is wrong in run_one_delayed_ref().

But surprisingly, if something were wrong there, I'd expect more output
from btrfs: normally, when a tree block fails any of the checks, it
should at least cause an error message.

Since you can reproduce the bug (although I don't think it is easy to
reproduce), would you mind applying the extra debug patch and then
trying to reproduce it?
(Part of the patch will go upstream soon)

Another thing: would you mind running "btrfs check --readonly" on the fs?
I don't expect it to report a problem, but just in case.

Thanks,
Qu
View attachment "0001-btrfs-add-extra-debug-for-run_one_delayed_ref.patch" of type "text/x-patch" (3096 bytes)