Date:   Mon, 26 Dec 2022 08:14:55 +0800
From:   Qu Wenruo <quwenruo.btrfs@....com>
To:     Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>, wqu@...e.com,
        dsterba@...e.com, Btrfs BTRFS <linux-btrfs@...r.kernel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: Re: [6.2][regression] after commit
 947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced
 switch to readonly after a lot of disk writes



On 2022/12/26 05:32, Mikhail Gavrilov wrote:
> Hi,
> Curiously, it happens only on the machine that has a BTRFS volume
> built from two high-speed NVMe (PCIe 4.0) SSDs in RAID 0. On machines
> with a BTRFS volume on a single HDD, the bug does not appear.
> 
> Bisecting this took a lot of effort. At each step, I downloaded the
> 150 GB game "Assassin's Creed Valhalla" 4 times and deleted it. To make
> sure that the commit preceding 947a629988f191807d2d22ba63ae18259bb645c5
> is definitely not affected by the bug, I downloaded the game 10 times,
> which should have written more than 1.5 TB of data to the btrfs volume.
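Bisection is a binary search over the commit range, and the procedure above shows why it is expensive for an intermittent bug: a "bad" verdict is certain as soon as the fs goes read-only, but a "good" verdict is only trustworthy after several clean runs. A minimal sketch of that logic, assuming a hypothetical `reproduces(commit)` callable that performs one full write/delete workload and returns True if the bug fired; `good_runs` is a hypothetical parameter modelling the repeated downloads. This illustrates the search, not how `git bisect` is implemented.

```python
from typing import Callable, Sequence


def first_bad_commit(
    commits: Sequence[str],
    reproduces: Callable[[str], bool],
    good_runs: int = 4,
) -> str:
    """Binary-search for the first commit on which `reproduces` fires.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once the bug is introduced it stays present in every later commit.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Bad as soon as any run reproduces (any() stops early);
        # good only if all `good_runs` runs were clean.
        if any(reproduces(commits[mid]) for _ in range(good_runs)):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

For example, a range of 1,000 commits needs about 10 steps, and every "good" step costs `good_runs` complete workloads, so a slow, intermittent reproducer multiplies the total effort.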
> 
> Here is the result of my bisection:
> 947a629988f191807d2d22ba63ae18259bb645c5 is the first bad commit
> commit 947a629988f191807d2d22ba63ae18259bb645c5
> Author: Qu Wenruo <wqu@...e.com>
> Date:   Wed Sep 14 13:32:51 2022 +0800
> 
>      btrfs: move tree block parentness check into validate_extent_buffer()
> 
[...]
>      Signed-off-by: Qu Wenruo <wqu@...e.com>
>      Signed-off-by: David Sterba <dsterba@...e.com>
> 
>   fs/btrfs/disk-io.c   | 73 ++++++++++++++++++++++++++++++++++++++--------------
>   fs/btrfs/extent_io.c | 18 ++++++++++---
>   fs/btrfs/extent_io.h |  5 ++--
>   fs/btrfs/volumes.h   | 25 +++++++++++++++---
>   4 files changed, 93 insertions(+), 28 deletions(-)
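The "parentness" check named in the commit title is the idea that, when a child tree block is read through a pointer in its parent node, the block's own header must agree with what the parent expects, such as the generation (transid), the tree level and the first key. The following is a simplified, hypothetical Python model of that idea for illustration only; the names `TreeBlock`, `ParentCheck` and `validate_against_parent` are assumptions and do not mirror the kernel's structures or the exact checks performed in validate_extent_buffer().

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical simplified key: (objectid, type, offset).
Key = Tuple[int, int, int]


@dataclass
class TreeBlock:
    """What a child tree block claims about itself (simplified model)."""
    bytenr: int
    generation: int
    level: int
    first_key: Optional[Key]  # None for an empty block


@dataclass
class ParentCheck:
    """What the parent's pointer expects to find (simplified model)."""
    transid: int
    level: int
    first_key: Optional[Key] = None  # None: no key available to compare


def validate_against_parent(block: TreeBlock, check: ParentCheck) -> Optional[str]:
    """Return an error message if `block` contradicts its parent, else None."""
    if block.generation != check.transid:
        return (f"block {block.bytenr}: transid mismatch, "
                f"wanted {check.transid} found {block.generation}")
    if block.level != check.level:
        return (f"block {block.bytenr}: level mismatch, "
                f"wanted {check.level} found {block.level}")
    if check.first_key is not None and block.first_key != check.first_key:
        return (f"block {block.bytenr}: first key mismatch, "
                f"wanted {check.first_key} found {block.first_key}")
    return None
```

In this model every rejected block yields a message, which mirrors the expectation in the reply that a failed tree-block check should log at least one error.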
> 
> Right before the volume goes read-only, the kernel log shows this message:
> [ 1908.029663] BTRFS: error (device nvme0n1p3: state A) in btrfs_run_delayed_refs:2147: errno=-5 IO failure
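When searching a long kernel log for this failure, the error line quoted above can be split into its fields: device, filesystem state, function, source line and errno (-5 is EIO). A small sketch, assuming the message layout of this single sample; the regular expression is derived only from that line and may not match other btrfs error variants.

```python
import errno
import re
from typing import NamedTuple, Optional


class BtrfsError(NamedTuple):
    timestamp: float
    device: str
    state: str
    function: str
    line: int
    errno_name: str


# Pattern written against the one sample line in this report (an assumption).
_BTRFS_ERR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BTRFS: error \(device (?P<dev>\S+): "
    r"state (?P<state>\w+)\) in (?P<func>\w+):(?P<line>\d+): errno=(?P<err>-?\d+)"
)


def parse_btrfs_error(log_line: str) -> Optional[BtrfsError]:
    """Extract the fields of a btrfs error line, or return None."""
    m = _BTRFS_ERR.search(log_line)
    if m is None:
        return None
    code = abs(int(m["err"]))  # the kernel reports negative errno values
    return BtrfsError(
        timestamp=float(m["ts"]),
        device=m["dev"],
        state=m["state"],
        function=m["func"],
        line=int(m["line"]),
        errno_name=errno.errorcode.get(code, f"errno {code}"),
    )
```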
> 
> I have also attached the full kernel log.
> 
Thanks a lot for the full kernel log.

It does show that something goes wrong in run_one_delayed_ref().
Surprisingly, though, if something were wrong, I'd expect more output
from btrfs: normally, if a tree block fails any of the checks, it
should at least produce an error message.

Since you can reproduce the bug (although I don't think it is easy to
reproduce), would you mind applying the attached debug patch and trying
to reproduce it again?

(Part of the patch will be upstreamed soon.)

Also, would you mind running "btrfs check --readonly" on the fs?
I don't believe that's the cause, but just in case.

Thanks,
Qu
Attachment: "0001-btrfs-add-extra-debug-for-run_one_delayed_ref.patch" (text/x-patch, 3096 bytes)
