Message-ID: <CAD4b4WLk0E92kBTk-VR7pKbfWwKgB9+h1Qq+DxgF7p-BPofC6A@mail.gmail.com>
Date: Fri, 29 May 2020 21:03:17 +0200
From: Mark Marshall <markmarshall14@...il.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: linux-rt-users <linux-rt-users@...r.kernel.org>,
Mark Marshall <mark.marshall@...cronenergy.com>,
thomas.graziadei@...cronenergy.com,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, rostedt@...dmis.org
Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and
PowerPC e500
My config is attached. This is the greatly reduced config that I used
when trying to narrow down the problem. We normally have much more
enabled, but that had no effect on the bug in my testing. We do,
unfortunately, have quite a few out-of-tree patches, but they are all
in USB or Networking, which are disabled here.
I've never tried out the kernel under qemu, but I will try that next
week to see if I can reproduce the problem there. It's certainly
quite a narrow race window though, so it might behave quite
differently under qemu. In general, how reliable is qemu at showing
these kinds of problems?
Thanks,
Mark
PS.
I've also noticed that THREAD_SHIFT is set in this config. That's
because when I added lots of debug options, I got warnings about the
stack being too small. This had no impact on the bug: I increased the
size of the stack and the stack warnings stopped, but the bug was
still the same.
On Fri, 29 May 2020 at 18:15, Sebastian Andrzej Siewior
<bigeasy@...utronix.de> wrote:
>
> On 2020-05-29 17:38:39 [+0200], Mark Marshall wrote:
> > Hi Sebastian & list,
> Hi,
>
> > I had assumed that my e-mail had got lost or overlooked; I was meaning to
> > post a follow-up message this week...
> >
> > All I could find from the debugging and tracing that we added was that
> > something was going wrong with the mm data structures somewhere in the
> > exec code. In the end I just spent a week or two poring over the diffs
> > of this code between the versions that I knew worked and didn't work.
> >
> > I eventually found the culprit. On the working kernel versions there is
> > a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
> > This is commit f0b4a9cb253a on the V4.19.82-rt30 branch, for instance.
> > Although the commit message talks about ARM, it seems that we need this for
> > PowerPC too (I guess, any PowerPC with the "nohash" MMU?).
>
> Could you drop me your config, please? I need to dig here a little and I
> should have seen this on qemu, right?
>
> > Could you please add this commit back to the RT branch? I'm not sure how
> > to find out the history of this commit. For instance, why has it been
> > removed from the RT patchset? How are these things tracked, generally?
>
> I dropped that patch in v5.4.3-rt1. I couldn't reproduce the issue that
> was documented in the patch and the code that triggered the warning was
> removed / reworked in commit
> b5466f8728527 ("ARM: mm: remove IPI broadcasting on ASID rollover")
>
> So it looked like it was no longer needed and then got dropped during the
> rebase.
> In order to get it back into the RT queue I need to understand why it is
> required. What exactly is it fixing? Let me stare at it for a little…
>
> > Best regards,
> > Mark
>
> Sebastian
Download attachment "config-5.4-rt" of type "application/octet-stream" (5142 bytes)