Message-ID: <alpine.DEB.2.21.2511121046350.25436@angie.orcam.me.uk>
Date: Wed, 12 Nov 2025 12:16:28 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: Thomas Bogendoerfer <tsbogend@...ha.franken.de>
cc: Nick Bowler <nbowler@...conx.ca>, Jiaxun Yang <jiaxun.yang@...goat.com>,
linux-mips@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] MIPS: mm: Prevent a TLB shutdown on initial
uniquification
On Wed, 12 Nov 2025, Thomas Bogendoerfer wrote:
> > Can you try the diagnostic patch below, which is what I used to verify
> > this change, and report the entries produced? Otherwise I wonder whether
> > I haven't missed a barrier somewhere.
>
> Update on the issue: Your patch is good and the segmentation faults,
> I'm seeing, have IMHO a different reason. Instead of removing the call
> to r4k_tlb_uniquify() I've replaced the jal in the binary with a nop.
> And the issue is still there with this patched kernel. I've seen
> something similar on a R12k Octanes, which comes and goes probably
> depending on code layout. So far I wasn't able to nail this down :-(
Oh dear! Something to do with the cache? Or code alignment perhaps?
It reminds me of this stuff:
<https://lore.kernel.org/r/Pine.GSO.3.96.1010625125007.20469D-100000@delta.ds2.pg.gda.pl/>.
Building a particular version of binutils froze the machine solid ~11h
into the build -- a power cycle was required (there's no hardware reset
button). At least it was fully reproducible and always at the same place
in a `configure' script. Changing the shell script in a trivial way,
such as adding a new-line character, ahead of the place of the lock-up
made the freeze go away.
I used the machine's 8-position diagnostic LED display to debug this, by
making it show the syscall and hardware interrupt numbers as the exception
handlers were entered, so as to narrow the origin down (only to realise
later on I could have used a 1MiB NVRAM module the system has to store
more data across a power cycle and retrieve it afterwards, a persistent
kernel log of sorts). IIRC it triggered in the exit(2) path.
The most painful part was the need to wait said ~11h for the next piece
of data in debugging this.
NB the machine in question is still alive in my lab. Throwing memory SBE
ECC errors again recently, but coping regardless, so more memory connector
cleaning required upon next visit.
> Do you want to send a v2 of the patch ? I'm fine with the current version
> for applying...
I'll send v2 with an update for the Wired register as we discussed. It
may take a day or two.
Maciej