Date:   Wed, 28 Dec 2022 09:08:14 +0800
From:   Qu Wenruo <quwenruo.btrfs@....com>
To:     Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>,
        Qu Wenruo <wqu@...e.com>
Cc:     dsterba@...e.com, Btrfs BTRFS <linux-btrfs@...r.kernel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: Re: [6.2][regression] after commit
 947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced
 switch to readonly after a lot of disk writes



On 2022/12/27 21:11, Mikhail Gavrilov wrote:
> On Tue, Dec 27, 2022 at 4:03 PM Qu Wenruo <wqu@...e.com> wrote:
>>
>> I have a similar laptop (G14), only GPU is different (RTX3060), and I
>> failed to reproduce this so far...
>>
>> My gcc is only a small version behind (12.2.0).
>>
>> Thus none of the hardware seems suspicious at all...
>>
>> Anyway I have attached my last struggle for the weird problem.
>> For now, I have no idea why this can even happen...
> 
> The new Kernel log is attached.
> This time, the main difference was that the file system did not
> immediately switch to readonly.
> The Steam client stopped a couple of times with a write error, but
> after pressing the resume button, it resumed downloading. On the
> third or fourth attempt, it refused to download.
> 
I'm a total idiot.

From the very first dmesg with a call trace, it already shows that
submit_one_bio() is called from submit_extent_page(), which is the
cross-stripe-boundary case, and that path has no parent_check populated
at all.

And since you're using RAID0 on two NVMe drives, it matches the symptom,
while most tests done here used a single device (DUP and SINGLE), so
they hit no stripe boundary cases at all.
(In fact it should still be possible to trigger on SINGLE, but it is
much harder to hit.)
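
To make "crosses a stripe boundary" concrete, here is a rough sketch of the arithmetic only. It is not the kernel code and not what the attached patch does. The 64 KiB stripe length and the helper names crosses_stripe_boundary() and split_at_stripe_boundaries() are assumptions made up for this illustration.

```python
# Illustration only: plain arithmetic, not btrfs code.
# STRIPE_LEN is an assumed stripe length (64 KiB) for this sketch.
STRIPE_LEN = 64 * 1024


def crosses_stripe_boundary(logical, length, stripe_len=STRIPE_LEN):
    """Return True if the byte range [logical, logical + length) touches
    more than one stripe. Hypothetical helper for illustration."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_stripe = logical // stripe_len
    last_stripe = (logical + length - 1) // stripe_len
    return first_stripe != last_stripe


def split_at_stripe_boundaries(logical, length, stripe_len=STRIPE_LEN):
    """Split a byte range into (start, length) pieces so that no piece
    crosses a stripe boundary. Hypothetical helper for illustration."""
    pieces = []
    end = logical + length
    while logical < end:
        # First byte of the next stripe after the current position.
        next_boundary = (logical // stripe_len + 1) * stripe_len
        piece_end = min(end, next_boundary)
        pieces.append((logical, piece_end - logical))
        logical = piece_end
    return pieces
```

For example, under the assumed 64 KiB stripe length, a 16 KiB range starting at offset 60 KiB is split into a 4 KiB piece and a 12 KiB piece, while a range that fits inside one stripe stays in a single piece. Ranges of the first kind are the ones that would need to be submitted early at the boundary.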

With the proper root cause found, this version should mostly handle the
regression correctly.

It should also be close to the formal patch I'll later send to the
mailing list.

I cannot thank you enough for all the testing you have provided; it not
only pinned down the bug, but also proved I'm a total idiot...

Thanks,
Qu
View attachment "0001-btrfs-fix-the-false-alert-on-bad-tree-level.patch" of type "text/x-patch" (5723 bytes)
