Date:   Thu, 7 Dec 2023 10:58:04 -0500
From:   Genes Lists <lists@...ience.com>
To:     Guoqing Jiang <guoqing.jiang@...ux.dev>,
        Bagas Sanjaya <bagasdotme@...il.com>, snitzer@...nel.org,
        song@...nel.org, yukuai3@...wei.com, axboe@...nel.dk,
        mpatocka@...hat.com, heinzm@...hat.com,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux RAID <linux-raid@...r.kernel.org>,
        Linux Regressions <regressions@...ts.linux.dev>
Cc:     Bhanu Victor DiCara <00bvd0+linux@...il.com>,
        Xiao Ni <xni@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: md raid6 oops in 6.6.4 stable

On 12/7/23 09:42, Guoqing Jiang wrote:
> Hi,
> 
> On 12/7/23 21:55, Genes Lists wrote:
>> On 12/7/23 08:30, Bagas Sanjaya wrote:
>>> On Thu, Dec 07, 2023 at 08:10:04AM -0500, Genes Lists wrote:
>>>> I have not had a chance to git bisect this, but since it happened in
>>>> stable I thought it was important to share sooner rather than later.
>>>>
>>>> One possibly relevant commit between 6.6.3 and 6.6.4 could be:
>>>>
>>>>    commit 2c975b0b8b11f1ffb1ed538609e2c89d8abf800e
>>>>    Author: Song Liu <song@...nel.org>
>>>>    Date:   Fri Nov 17 15:56:30 2023 -0800
>>>>
>>>>      md: fix bi_status reporting in md_end_clone_io
>>>>
>>>> log attached shows page_fault_oops.
>>>> Machine was up for 3 days before crash happened.
> 
> Could you decode the oops (I can't find it in lore for some reason) ([1])?
> And can it be reproduced reliably? If so, pls share the reproduce step.
> 
> [1]. https://lwn.net/Articles/592724/
> 
> Thanks,
> Guoqing

   - reproducing
     An rsync runs twice a day, copying a large top-level directory to this
     server from another machine. On the third day after booting 6.6.4, the
     second of these rsyncs triggered the oops. I need to do more testing to
     see whether I can reproduce it reliably. I have not seen this oops on
     earlier stable kernels.

   - decoding the oops
     scripts/decode_stacktrace.sh failed with:

       readelf: Error: Not an ELF file - it has the wrong magic bytes at the start

     It appears that the decode script doesn't handle compressed modules.
     I changed the readelf line to decompress the module first. This fixes
     the error above, and the decoded result is attached.
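     The actual change to the readelf line is not shown here. As a rough
     illustration of the "decompress first" idea only, the sketch below is a
     hypothetical Python helper (module_format and ensure_elf are made-up
     names, not part of the kernel tree). It checks a module's magic bytes and,
     if the module is compressed, writes an uncompressed copy that readelf
     could read instead. It assumes gzip- or xz-compressed modules; zstd
     modules are only detected, because the sketch sticks to decoders that
     have long been in Python's standard library.

```python
import gzip
import lzma
import shutil
import tempfile

# Well-known magic numbers at the start of each file format.
ELF_MAGIC = b"\x7fELF"
GZIP_MAGIC = b"\x1f\x8b"
XZ_MAGIC = b"\xfd7zXZ\x00"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def module_format(path):
    """Guess a module file's format from its first bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(6)
    if head.startswith(ELF_MAGIC):
        return "elf"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(XZ_MAGIC):
        return "xz"
    if head.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"


def ensure_elf(path):
    """Return a path readelf can parse: the module itself if it is already
    ELF, otherwise a temporary decompressed copy (hypothetical helper)."""
    fmt = module_format(path)
    if fmt == "elf":
        return path
    openers = {"gzip": gzip.open, "xz": lzma.open}
    if fmt not in openers:
        # zstd (.ko.zst) would need an external decompressor in this sketch.
        raise ValueError(f"{path}: unsupported module format {fmt!r}")
    # delete=False keeps the decompressed copy after the handle is closed.
    tmp = tempfile.NamedTemporaryFile(suffix=".ko", delete=False)
    with openers[fmt](path, "rb") as src, tmp:
        shutil.copyfileobj(src, tmp)
    return tmp.name
```

     A decode script could then run readelf on ensure_elf(module) rather
     than on the compressed .ko.gz or .ko.xz file directly.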

gene
Attachment: "raid6-stacktrace" (text/plain, 5283 bytes)