linux-kernel - Re: [PATCH v6 1/1] MIPS: Fix idle VS timer enqueue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAofZF52_yKcpd+GBE9ygggeNTOVQDP7AKau5xZE+N4fHGCgSQ@mail.gmail.com>
Date: Wed, 26 Mar 2025 10:46:08 +0100
From: Marco Crivellari <marco.crivellari@...e.com>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: "Maciej W. Rozycki" <macro@...am.me.uk>, linux-mips@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Thomas Bogendoerfer <tsbogend@...ha.franken.de>, 
	Frederic Weisbecker <frederic@...nel.org>, Anna-Maria Behnsen <anna-maria@...utronix.de>, 
	Thomas Gleixner <tglx@...utronix.de>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v6 1/1] MIPS: Fix idle VS timer enqueue

I'm mostly thinking about future changes in this part of the code.
But if it is ok what we have now, and future changes are not a
problem, let's keep this version.

Would this be ok with you @Maciej?

If so, the region is now 40 bytes. This is what I did yesterday:

@@ -110,6 +110,7 @@ LEAF(__r4k_wait)
       .set    noreorder
       /* Start of idle interrupt region. */
       MFC0    t0, CP0_STATUS
+       nop
       /* Enable interrupt. */
       ori     t0, 0x1f
       xori    t0, 0x1e
@@ -128,7 +129,11 @@ LEAF(__r4k_wait)
        */
       wait
       /* End of idle interrupt region. */
-1:
+__r4k_wait_exit:
+       /* Check idle interrupt region size. */
+       .if ((__r4k_wait_exit - __r4k_wait) != 40)
+       .error  "Idle interrupt region size mismatch: expected 40 bytes."
+       .endif
       jr      ra
        nop
       .set    pop
@@ -139,10 +144,10 @@ LEAF(__r4k_wait)
       .set    push
       .set    noat
       MFC0    k0, CP0_EPC
-       PTR_LA  k1, 1b
-       /* 36 byte idle interrupt region. */
+       PTR_LA  k1, __r4k_wait_exit
+       /* 40 byte idle interrupt region. */
       ori     k0, 0x1f
-       PTR_ADDIU       k0, 5
+       PTR_ADDIU       k0, 9
       bne     k0, k1, \handler
       MTC0    k0, CP0_EPC
       .set pop

Under QEMU is working, but I'm not so sure the latest part is correct.

Thanks!



On Wed, Mar 26, 2025 at 2:20 AM Huacai Chen <chenhuacai@...nel.org> wrote:
>
> On Tue, Mar 25, 2025 at 10:09 PM Marco Crivellari
> <marco.crivellari@...e.com> wrote:
> >
> > Hi Maciej,
> >
> > Thanks a lot for all the information.
> >
> > >  Unlike `__r4k_wait' that code is not in a `.set noreorder' region and
> > > the assembler will therefore resolve the hazard by inserting a NOP where
> > > required by the architecture level requested (with `-march=...', etc.).
> > > Try compiling that function for a MIPS III configuration such as
> > > decstation_r4k_defconfig or just by hand with `-march=mips3' and see
> > > what machine code is produced.
> >
> > I tried with the configuration you suggested, and indeed I can see a NOP.
> >
> > >  Whatever manual you quote it refers to MIPS Release 2, which is only
> > > dated 2003
> >
> > About the MIPS manual, anyhow, it is "MIPS32 M4 Processor Core" (year 2008).
> > Maybe I've also picked the wrong manual.
> >
> > I've also found the manual you mentioned online, thanks.
> >
> > >  Best is to avoid using a `.set noreorder' region in the first place.
> > > But is it really needed here?  Does the rollback area have to be of a
> > > hardcoded size rather than one calculated by the assembler based on
> > > actual machine code produced?  It seems to me having it calculated would
> > > reduce complexity here and let us use the EI instruction where available
> > > as well.
> >
> > Well, considering the complexity and how the code looks fragile even with
> > a small change, yes, that's likely better to avoid noreorder.
> In my opinion keeping "noreorder" is the simplest, which means just
> add an "nop" after MFC0 in the current version.
>
> Huacai
>
> >
> > I think I'm going to need some guidance here.
> > Please, correct me where something is wrong.
> >
> > 1)
> > When you say "let us use the EI instruction where available" are you
> > referring to do
> > something like below?
> >
> > #if defined(CONFIG_CPU_HAS_DIEI)
> > ei
> > #else
> > MFC0    t0, CP0_STATUS
> > ori     t0, 0x1f
> > xori    t0, 0x1e
> > MTC0    t0, CP0_STATUS
> > #endif
> >
> > 2)
> > Removing "noreorder" would let the compiler add "nops" where they are needed.
> > But that still means the 3 ssnop and ehb are still needed, right?
> >
> > My subsequent dumb question is: there is the guarantee that the
> > compiler will not
> > reorder / change something we did?
> > This question also came after reading the manual you quoted (paragraph
> > "Coprocessor Hazards"):
> >
> > "For example, after an mtc0 to the Status register which changes an
> > interrupt mask bit,
> > there will be two further instructions before the interrupt is
> > actually enabled or disabled.
> > [...]
> > To cope with these situations usually requires the programmer to take explicit
> > action to prevent the assembler from scheduling inappropriate
> > instructions after a
> > dangerous mtc0. This is done by using the .set noreorder directive,
> > discussed below"
> >
> > 3)
> > Considering the size is determined by the compiler, the check about
> > the idle interrupt
> > size region should not be needed, correct?
> >
> > 4)
> > ori and PTR_ADDIU should be removed of course from the rollback handler macro.
> > Can I have some hints about the needed change?
> > Using QEMU is always working, so I'm not sure if what I will change is
> > also correct.
> >
> >
> > Many thanks in advance, also for your time!
> >
> >
> >
> >
> > On Fri, Mar 21, 2025 at 9:11 PM Maciej W. Rozycki <macro@...am.me.uk> wrote:
> > >
> > > On Fri, 21 Mar 2025, Marco Crivellari wrote:
> > >
> > > > >  This instruction sequence still suffers from the coprocessor move delay
> > > > > hazard.  How many times do I need to request to get it fixed (counting
> > > > > three so far)?
> > > >
> > > > Can I have more details about this?
> > > >
> > > > I can see it is the same code present also in local_irq_enable()
> > > > (arch_local_irq_enable()),
> > >
> > >  Unlike `__r4k_wait' that code is not in a `.set noreorder' region and
> > > the assembler will therefore resolve the hazard by inserting a NOP where
> > > required by the architecture level requested (with `-march=...', etc.).
> > > Try compiling that function for a MIPS III configuration such as
> > > decstation_r4k_defconfig or just by hand with `-march=mips3' and see
> > > what machine code is produced.
> > >
> > > > and from the manual I've seen:
> > > >
> > > > "The Spacing column shown in Table 2.6 and Table 2.7 indicates the
> > > > number of unrelated instructions (such as NOPs or SSNOPs) that,
> > > > prior to the capabilities of Release 2, would need to be placed
> > > > between the producer and consumer of the hazard in order to ensure
> > > > that the effects of the first instruction are seen by the second instruction."
> > > >
> > > > The "Spacing column" value is 3, indeed.
> > > >
> > > > "With the hazard elimination instructions available in Release 2, the
> > > > preferred method to eliminate hazards is to place one of the
> > > > instructions listed in Table 2.8 between the producer and consumer of the
> > > > hazard. Execution hazards can be removed by using the EHB [...]"
> > >
> > >  Whatever manual you quote it refers to MIPS Release 2, which is only
> > > dated 2003 (following Release 1 from 2001), but `__r4k_wait' has to
> > > continue handling older architecture revisions going back to MIPS III
> > > ISA from 1991.  We also support MIPS I ISA from 1985 which has further
> > > instruction scheduling requirements, but `__r4k_wait' is for newer
> > > processors only, because older ones had no WAIT instruction, so we can
> > > ignore them (but note that the MIPS I load delay slot is regardless
> > > observed in current code in the form of a NOP inserted after LONG_L).
> > >
> > > > What am I missing?
> > >
> > >  This is common MIPS knowledge really, encoded in the GNU toolchain and
> > > especially GAS since forever.  While I can't cite a canonical reference,
> > > the hazard is listed e.g. in Table 13.1 "Instructions with scheduling
> > > implications" and Table 13.3 "R4xxx/R5000 Coprocessor 0 Hazards" from
> > > "IDT MIPS Microprocessor Family Software Reference Manual," Version 2.0,
> > > from October 1996.  I do believe the document is available online.
> > >
> > >  I'm fairly sure the hazard is also listed there in Dominic Sweetman's
> > > "See MIPS Run Linux," but I don't have my copy handy right now.
> > >
> > >  Best is to avoid using a `.set noreorder' region in the first place.
> > > But is it really needed here?  Does the rollback area have to be of a
> > > hardcoded size rather than one calculated by the assembler based on
> > > actual machine code produced?  It seems to me having it calculated would
> > > reduce complexity here and let us use the EI instruction where available
> > > as well.
> > >
> > >  HTH,
> > >
> > >   Maciej
> >
> >
> >
> > --
> >
> > Marco Crivellari
> >
> > L3 Support Engineer, Technology & Product
> >
> >
> >
> >
> > marco.crivellari@...e.com



--

Marco Crivellari

L3 Support Engineer, Technology & Product




marco.crivellari@...e.com