lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 29 May 2020 21:03:17 +0200
From:   Mark Marshall <markmarshall14@...il.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     linux-rt-users <linux-rt-users@...r.kernel.org>,
        Mark Marshall <mark.marshall@...cronenergy.com>,
        thomas.graziadei@...cronenergy.com,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org, rostedt@...dmis.org
Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and
 PowerPC e500

My config is attached.  This is the greatly reduced config that I used
when trying to narrow down the problem.  We normally have much more
enabled, but that had no effect on the bug in my testing.  We do,
unfortunately, have quite a few out-of-tree patches, but they are all
in USB or Networking, which are disabled here.

I've never tried out the kernel under qemu, but I will try that next
week to see if I can reproduce the problem there.  It's certainly
quite a narrow race window though, so it might behave quite
differently under qemu.  In general, how reliable is qemu at showing
these kinds of problems?

Thanks,
Mark

PS.
I've also noticed that THREAD_SHIFT is set in this config.  That's
because when I added lots of debug options, I got warnings about the
stack being too small.  This had no impact on the bug that I had, I
increased the size of the stack, and the stack warnings stopped, but
the bug was still the same.

On Fri, 29 May 2020 at 18:15, Sebastian Andrzej Siewior
<bigeasy@...utronix.de> wrote:
>
> On 2020-05-29 17:38:39 [+0200], Mark Marshall wrote:
> > Hi Sebastian & list,
> Hi,
>
> > I had assumed that my e-mail had got lost or overlooked, I was meaning to
> > post a follow up message this week...
> >
> > All I could find from the debugging and tracing that we added was that
> > something was going wrong with the mm data structures somewhere in the
> > exec code.  In the end I just spent a week or two pouring over the diffs
> > of this code between the versions that I new worked and didn't work.
> >
> > I eventually found the culprit.  On the working kernel versions there is
> > a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
> > This is commit f0b4a9cb253a on the V4.19.82-rt30 branch, for instance.
> > Although the commit message talks about ARM, it seems that we need this for
> > PowerPC too (I guess, any PowerPC with the "nohash" MMU?).
>
> Could you drop me your config, please? I need to dig here a little and I
> should have seen this on qemu, right?
>
> > Could you please add this commit back to the RT branch?  I'm not sure how
> > to find out the history of this commit.  For instance, why has it been
> > removed from the RT patchset?  How are these things tracked, generally?
>
> I dropped that patch in v5.4.3-rt1. I couldn't reproduce the issue that
> was documented in the patch and the code that triggered the warning was
> removed / reworked in commit
>     b5466f8728527 ("ARM: mm: remove IPI broadcasting on ASID rollover")
>
> So it looked like no longer needed and then got dropped during the
> rebase.
> In order to get it back into the RT queue I need to understand why it is
> required. What exactly is it fixing. Let me stare at for a littleā€¦
>
> > Best regards,
> > Mark
>
> Sebastian

Download attachment "config-5.4-rt" of type "application/octet-stream" (5142 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ