Message-ID: <CA+wXwBQmusp49-b6cU-hPAoOnpTiuvA2QrjaOSyb-EvKigC_Ug@mail.gmail.com>
Date:   Mon, 24 Jul 2023 23:04:13 +0100
From:   Daniel Dao <dqminh@...udflare.com>
To:     Dave Chinner <david@...morbit.com>
Cc:     linux-fsdevel@...r.kernel.org,
        Matthew Wilcox <willy@...radead.org>,
        kernel-team <kernel-team@...udflare.com>,
        linux-kernel <linux-kernel@...r.kernel.org>, djwong@...nel.org
Subject: Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1

On Mon, Jul 24, 2023 at 10:45 PM Dave Chinner <david@...morbit.com> wrote:
>
> On Mon, Jul 24, 2023 at 12:23:31PM +0100, Daniel Dao wrote:
> > Hi again,
> >
> > We had another example of xarray corruption involving xfs and zsmalloc. We are
> > running zram as swap. We have 2 tasks deadlock waiting for page to be released
>
> Do your problems on 6.1 go away if you stop using zram as swap?

We had xarray corruptions even on nodes without swap, so I'm not sure
swap matters. The corruption on those nodes was noted in the first
email, with the following trace:

    BUG: kernel NULL pointer dereference, address: 0000000000000036
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 18806c5067 P4D 18806c5067 PUD 188ed48067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 73 PID: 3579408 Comm: prometheus Tainted: G           O 6.1.34-cloudflare-2023.6.7 #1
    Hardware name: GIGABYTE R162-Z12-CD1/MZ12-HD4-CD, BIOS M03 11/19/2021
    RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-arch-fallback.h:1242 include/linux/atomic/atomic-arch-fallback.h:1267 include/linux/atomic/atomic-instrumented.h:608 include/linux/page_ref.h:238 include/linux/page_ref.h:247 include/linux/page_ref.h:280 include/linux/page_ref.h:313 mm/filemap.c:1863 mm/filemap.c:1915)
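
For anyone reading the "#PF" lines in the trace above, here is a minimal
sketch of how the low bits of an x86 page-fault error code map to the
wording the kernel prints. The helper name decode_pf_error_code and the
PF_BITS table are hypothetical, made up for illustration; only the bit
meanings (present, write, user, reserved, instruction fetch) come from the
x86 architecture. It is not the kernel's own decoding code.

```python
# Hypothetical helper: decode the low bits of an x86 page-fault error code.
# Each entry is (bit mask, meaning when set, meaning when clear).
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return a list of human-readable labels for the given error code."""
    labels = []
    for mask, if_set, if_clear in PF_BITS:
        label = if_set if code & mask else if_clear
        if label is not None:
            labels.append(label)
    return labels


# The trace above reports error_code(0x0000).
print(decode_pf_error_code(0x0000))
```

With error code 0x0000 all bits are clear, which matches the "supervisor
read access" and "not-present page" wording in the oops.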

It's hard for us to run tests without zram swap at scale, since it
brings significant benefits for many of our workloads.

Daniel.
