Message-ID: <CAKEwX=N-10A=C_Cp_m8yxfeTigvmZp1v7TrphcrHuRkHJ8837g@mail.gmail.com>
Date: Fri, 23 Aug 2024 12:16:53 -0400
From: Nhat Pham <nphamcs@...il.com>
To: Piotr Oniszczuk <piotr.oniszczuk@...il.com>
Cc: Matthew Wilcox <willy@...radead.org>,
Linux regressions mailing list <regressions@...ts.linux.dev>, LKML <linux-kernel@...r.kernel.org>,
Johannes Weiner <hannes@...xchg.org>, Yosry Ahmed <yosryahmed@...gle.com>,
Linux-MM <linux-mm@...ck.org>
Subject: Re: [regression] oops on heavy compilations ("kernel BUG at
mm/zswap.c:1005!" and "Oops: invalid opcode: 0000")
On Fri, Aug 23, 2024 at 11:07 AM Piotr Oniszczuk
<piotr.oniszczuk@...il.com> wrote:
>
>
>
> > On 23.08.2024, at 15:13, Matthew Wilcox <willy@...radead.org> wrote:
> >
> > I wouldn't be surprised if this were dodgy ram.
>
>
> Well - that was my initial hypothesis.
>
> In fact I had a few of them, ranked in this order:
> 1. downstream kernel patches
> 2. hw (ram) issue
> 3. kernel bug
>
> So the full history was:
> - built the archlinux 6.10.2 kernel myself; upgraded the builder OS (only the kernel; nothing else)
> - ran the normal devel process and (to my surprise) discovered CI/CD builds interrupted by a kernel oops
> - downgraded to 6.8.2 and did 4 full builds (a full build takes 8..9h of constant 12c/24t compiling). All good.
> - prepared vanilla 6.10.6 (to exclude potential downstream (ArchLinux) root causes)
> - ran the normal devel process and still got the oops
> - made sure the hw is OK with a week of testing on 6.8.2 (recompiling for 3 architectures on 4 OSes (3 in kvm)). This was almost 5 full days of 12c/24t compiling. All good.
> - because the last step was all good, decided to come to you :-)
>
> Sure, it is possible that 6.8.2 got lucky with my RAM and 6.10.6 did not... but I personally don't believe that is the case.
Have you tried with 6.9 yet? IIRC, there are two major changes to
zswap architecture in recent versions.
1. In 6.9, we range-partition zswap's rbtrees to reduce lock contention.
2. In 6.10, we replace zswap's rbtrees with xarrays.
If 6.9 is fine, then the latter is the suspect, and vice versa. Of
course, the minor changes are still suspect - but you get the idea :)
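
To spell out the elimination above, here is a tiny illustrative sketch. The function name and the returned labels are made up for this example; it assumes, as reported earlier in the thread, that 6.8.x is good and 6.10.x oopses, and it only covers the two major changes listed.

```python
def zswap_suspect(v6_9_is_fine: bool) -> str:
    """Return which major zswap change to suspect, given the 6.9 test result.

    Assumes (from the thread) that 6.8.x builds cleanly and 6.10.x oopses.
    Minor changes in either release remain possible culprits.
    """
    if v6_9_is_fine:
        # The regression appeared between 6.9 and 6.10.
        return "6.10: rbtrees replaced with xarrays"
    # The regression is already present in 6.9.
    return "6.9: range-partitioned rbtrees"


if __name__ == "__main__":
    for result in (True, False):
        print(f"6.9 fine = {result}: suspect {zswap_suspect(result)}")
```

If neither answer pans out, a regular git bisect between the last good and first bad release is the fallback.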
>
> btw: we can go with an elimination strategy.
> So what do I need to change/disable to get closer to finding the root cause?
Could you let me know more about the setup? A couple things come to my mind:
1. zswap configs (allocator - is it zsmalloc? compressor?)
2. Is mTHP enabled? mTHP swapout was merged in 6.10, and there seem
to be some conflicts with zswap, but Yosry will know more about this
than me...
3. Is there any proprietary driver etc.?
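
For items 1 and 2, something like the sketch below could collect the answers in one go. It is only a rough helper, not an official tool: the function name is made up, and it assumes the usual sysfs locations on recent kernels (/sys/module/zswap/parameters and /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled). Anything missing on a given kernel is simply skipped.

```python
from pathlib import Path


def read_settings(sysfs_root: str = "/sys") -> dict:
    """Collect zswap parameters and per-size mTHP 'enabled' settings.

    Directories that do not exist (older kernel, zswap not built) are skipped.
    """
    root = Path(sysfs_root)
    settings = {}

    # zswap module parameters, e.g. enabled, compressor, zpool.
    zswap_dir = root / "module" / "zswap" / "parameters"
    if zswap_dir.is_dir():
        for param in sorted(zswap_dir.iterdir()):
            try:
                settings[f"zswap.{param.name}"] = param.read_text().strip()
            except OSError:
                settings[f"zswap.{param.name}"] = "<unreadable>"

    # Per-size mTHP knobs live in hugepages-<size>kB/enabled.
    thp_dir = root / "kernel" / "mm" / "transparent_hugepage"
    if thp_dir.is_dir():
        for knob in sorted(thp_dir.glob("hugepages-*kB/enabled")):
            try:
                settings[f"mthp.{knob.parent.name}"] = knob.read_text().strip()
            except OSError:
                settings[f"mthp.{knob.parent.name}"] = "<unreadable>"

    return settings


if __name__ == "__main__":
    for key, value in read_settings().items():
        print(f"{key} = {value}")
```

Running it as a plain script prints one "key = value" line per setting, which can be pasted straight into a reply.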
> Swap?
> Right now it is a swapfile on the system NVMe.
>