linux-kernel - Re: [regression] oops on heavy compilations ("kernel BUG at mm/zswap.c:1005!" and "Oops: invalid opcode: 0000")


Message-ID: <CAPpoddere2g=kkMzrxuJ1KCG=0Hg1-1v=ppg4dON9wK=pKq2uQ@mail.gmail.com>
Date: Sat, 24 Aug 2024 03:42:47 +0900
From: Takero Funaki <flintglass@...il.com>
To: Piotr Oniszczuk <piotr.oniszczuk@...il.com>
Cc: Matthew Wilcox <willy@...radead.org>, 
	Linux regressions mailing list <regressions@...ts.linux.dev>, LKML <linux-kernel@...r.kernel.org>, 
	Johannes Weiner <hannes@...xchg.org>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Nhat Pham <nphamcs@...il.com>, Linux-MM <linux-mm@...ck.org>
Subject: Re: [regression] oops on heavy compilations ("kernel BUG at
 mm/zswap.c:1005!" and "Oops: invalid opcode: 0000")

On Sat, 24 Aug 2024 at 0:07, Piotr Oniszczuk <piotr.oniszczuk@...il.com> wrote:
>
>
>
> > Message written by Matthew Wilcox <willy@...radead.org> on 23.08.2024, at 15:13:
> >
> > I wouldn't be surprised if this were dodgy ram.
>
>
> Well - that was my initial hypothesis.
>
> In fact I had a few of them, ranked in this order:
> 1. downstream kernel patches
> 2. hw (ram) issue
> 3. kernel bug
>
> So the full history was:
> - built the Arch Linux 6.10.2 kernel myself; upgraded the builder OS (only the kernel, nothing else)
> - ran the normal devel process and (to my surprise) found CI/CD builds interrupted by a kernel oops
> - downgraded to 6.8.2 and did 4 full builds (a full build takes 8..9h of constant 12c/24t compiling). All good.
> - prepared vanilla 6.10.6 (to exclude potential downstream (Arch Linux) root causes)
> - ran the normal devel process and still hit the oops
> - made sure the hw is ok with a week of testing on 6.8.2 (recompiling for 3 architectures on 4 OSes (3 in KVM)). This was almost 5 full days of 12c/24t compiling. All good.
> - because the last step was all good, decided to come to you :-)
>
> Sure, it is possible that 6.8.2 got lucky with my RAM and 6.10.6 did not... but I personally don't believe this is the case.
>
> btw: we can go with an elimination strategy.
> So what do I need to change/disable to get closer to finding the root cause?
> Swap?
> Right now it is a swapfile on the system NVMe.
>
>

Hello,

I’m encountering a similar crash and trace in the issue I posted on Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=219154

If this is the same issue, caused by virtio_net corrupting memory, you
should be able to reproduce the crash by sending data to the VM over
the virtio interface while it is actively allocating memory (e.g., by
running iperf3 -s on the VM and iperf3 -c from another host).
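
A minimal sketch of that reproduction, assuming Python 3.8+ is available
in the VM. The helper names, the chunk size, the 256 MiB total and the
VM_ADDRESS placeholder are all hypothetical and only illustrate the idea:
keep touching freshly allocated pages in the VM while iperf3 traffic
arrives over the virtio interface. It is not a confirmed reproducer.

```python
import shlex

CHUNK_BYTES = 64 * 1024 * 1024  # hypothetical chunk size (64 MiB)
PAGE_BYTES = 4096               # assumed page size


def iperf3_server_cmd():
    """Command to run inside the VM (receiving side)."""
    return ["iperf3", "-s"]


def iperf3_client_cmd(vm_host, seconds=60):
    """Command to run on another host, sending data to the VM."""
    return ["iperf3", "-c", vm_host, "-t", str(seconds)]


def allocate_and_touch(total_bytes, chunk_bytes=CHUNK_BYTES, page=PAGE_BYTES):
    """Allocate total_bytes in chunks and write one byte per page,
    so the kernel has to back every page with real memory."""
    chunks = []
    allocated = 0
    while allocated < total_bytes:
        size = min(chunk_bytes, total_bytes - allocated)
        buf = bytearray(size)
        for off in range(0, size, page):
            buf[off] = 1
        chunks.append(buf)  # keep a reference so the memory stays allocated
        allocated += size
    return chunks


if __name__ == "__main__":
    print("on the VM:        ", shlex.join(iperf3_server_cmd()))
    print("on the other host:", shlex.join(iperf3_client_cmd("VM_ADDRESS")))
    held = allocate_and_touch(256 * 1024 * 1024)
    print(f"touched {sum(len(c) for c in held)} bytes in {len(held)} chunks")
```

Run the allocation part inside the VM while the iperf3 client is sending;
the total likely needs to be large enough to push the VM into swapping
for zswap to be involved.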

In my case, as Thorsten suggested, reverting the bisected commit
f9dac92ba908 ("virtio_ring: enable premapped mode regardless of
use_dma_api") along with two related commits in this series resolved
the issue:
https://lore.kernel.org/all/7774ac707743ad8ce3afeacbd4bee63ac96dd927.1723617902.git.mst@redhat.com/