[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAofZF65p+DnH8xA0+sfuZv=VO63Zgv4rQ6frrdEzQYoZ0MaWA@mail.gmail.com>
Date: Mon, 31 Mar 2025 12:11:02 +0200
From: Marco Crivellari <marco.crivellari@...e.com>
To: "Maciej W. Rozycki" <macro@...am.me.uk>
Cc: linux-mips@...r.kernel.org, linux-kernel@...r.kernel.org,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>, Frederic Weisbecker <frederic@...nel.org>,
Anna-Maria Behnsen <anna-maria@...utronix.de>, Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>, Huacai Chen <chenhuacai@...nel.org>
Subject: Re: [PATCH v6 1/1] MIPS: Fix idle VS timer enqueue
Hi Maciej,
>It has to be M4K there, and that isn't the most fortunate choice, because
>it's a manual for the specific microarchitecture and then one that doesn't
>usually run Linux, because it has no TLB.
Ah... my bad. I've also searched the manuals name you gave me.
I had one of them, but a specific volume:
"MIPS Architecture For Programmers Volume II-A: The MIPS64 Instruction
Set Reference Manual"
> There's some complication here coming from the need to factor in the ISA
> bit in the microMIPS mode; something that hasn't been discussed so far.
> The `.fill 0' approach is a hack and it has struck me that we need to add
> a `.noinsn' pseudo-op to GAS for this purpose, complementing `.insn', but
> we need to stick with the hack for now anyway as it will take years until
> we can rely on a new feature in the assembler.
Ah, interesting. So de facto having the "repeat" to 0, will generate nothing.
> NB don't refer to a local label from a macro as such a reference may end
> up pointing not where you want it to from the place the macro is pasted
> at. For example with your v6 code BUILD_ROLLBACK_PROLOGUE refers to the
> intended label in `__r4k_wait' when pasted for `handle_int', but then to
> the label in `except_vec4' instead when pasted for `except_vec_vi' further
> down.
> Technically it is correct and likely the original MIPSCO assembler from
> 1985 or one supplied with IRIX were smarter, but GAS won't itself ever
> reorder instructions other than to fill branch delay slots, so we don't
> have to be worried.
Thank you!
> I gave it some thought and concluded that the interrupt handling path has
> to be optimised for performance and the idle routine does not. Therefore
> I think we need to stick to the power-of-two size for the idle interrupt
> region, because in that case the check for an interrupt arriving within
>` r4k_wait' but ahead of the WAIT instruction can be done with a pair of
> ALU operations and a single branch. Anything more complex would require
> more operations in the interrupt handling path.
Clear, thanks.
> I can't imagine how we'd advance past WAIT without these instructions,
> what do you have in mind?
I've not been precise, sorry.
I meant to remove the instructions like they are now because the
region would have
been different, then.
So, those instructions would have needed a change, in practice.
> NB how do you actually verify this stuff with QEMU? Is it by injecting
> an interrupt by hand at a chosen code location via GDB attached to QEMU's
> built-in debug stub?
Short answer: I am not able to fully test this, I can only boot.
The reason is that gdb-multiarch is not working as expected.
The binary in my distribution has the python support broken. So when I try to
inject the interrupt, I'm receiving a python error (actually, I receive the same
error after the "target remote" command).
I've also tried to compile the binary from source, but again,
I understand why in OBS the build I found is broken...
> Below I've included a complete change based on the outline above.
Well, thanks a lot for that.
> Below I've included a complete change based on the outline above. It
> seems to do the right thing for a couple of my configurations, but I've
> only eyeballed the resulting code and haven't tried running it. Most of
> my hardware doesn't implement the WAIT instruction anyway.
It means it would be better to have someone else try the resulting
patch, I guess.
Once I have your final thoughts about the questions / observations below,
I will submit the V7 with all the changes.
> Let me know if you find anything here unclear or have any questions or
> comments.
1)
> /* Keep the ISA bit clear for calculations on local labels here. */
The ISA bit should be the bit 0, correct?
So, also in the macro code, it's done to preserve that bit.
2)
.section .cpuidle.text,"ax"
This should be a single patch, right?
Based on what I understood, 'a' should be the allocation, and 'x' the
executable attribute.
This should be in order to mark those symbols like "cpuidle text":
$ nm -n vmlinux | grep -A3 "cpuidle_text"
ffffffff80a127e0 T __cpuidle_text_start
ffffffff80a127e0 T r4k_wait
ffffffff80a12800 t r4k_wait_insn
ffffffff80a12804 t r4k_wait_exit
--
ffffffff80a12c00 T __cpuidle_text_end
ffffffff80a12c00 T __noinstr_text_end
ffffffff80a12c00 t rest_init
ffffffff80a12ccc t kernel_init
I guess it is used in order to understand if the instruction pointer is inside
one of these functions / labels.
How does a commit description sound like this?
"
mips: Add dedicated .cpuidle.text section for CPU idle routines
Add a dedicated executable section for CPU idle code to properly organize
and identify idle-related functions inside the .text section.
"
3)
> I think we also need to replace "rollback" with
> another name as with new code we don't roll back anymore.
Should be changed with "region", instead of rollback, maybe?
Do you have something better to suggest?
Thanks a lot!
On Fri, Mar 28, 2025 at 3:18 PM Maciej W. Rozycki <macro@...am.me.uk> wrote:
>
> On Fri, 28 Mar 2025, Maciej W. Rozycki wrote:
>
> > just massaged a little. FWIW I went ahead and chose to cook this piece up
> > myself since I realised how complex this issue actually is and that it
> > would take us forever to get all the individual aspects nailed over e-mail
> > communication.
>
> And yet after this many of internal iterations I did manage to miss one
> bit. In the optimised version proposed we need to explicitly skip over
> the WAIT instruction like this:
>
> r4k_wait_insn:
> wait
> r4k_wait_exit:
>
> and then:
>
> .set noreorder
> bne k0, k1, \handler
> PTR_ADDIU k0, r4k_wait_exit - r4k_wait_insn + 2
> .set reorder
>
> (and here we have a legitimate use for `.set noreorder' to avoid wasting a
> NOP for the branch delay slot due to a data antidependency on $k0; it's
> fine to clobber $k0 in the branch-taken case as by definition the register
> is dead at the exit from this macro).
>
> Updated patch follows. I think we also need to replace "rollback" with
> another name as with new code we don't roll back anymore.
>
> Maciej
>
> Signed-off-by: Maciej W. Rozycki <macro@...am.me.uk>
> ---
> arch/mips/include/asm/idle.h | 3 --
> arch/mips/kernel/genex.S | 63 +++++++++++++++++++++++++------------------
> arch/mips/kernel/idle.c | 7 ----
> 3 files changed, 38 insertions(+), 35 deletions(-)
>
> linux-mips-idle-vs-timer.diff
> Index: linux-macro/arch/mips/include/asm/idle.h
> ===================================================================
> --- linux-macro.orig/arch/mips/include/asm/idle.h
> +++ linux-macro/arch/mips/include/asm/idle.h
> @@ -6,8 +6,7 @@
> #include <linux/linkage.h>
>
> extern void (*cpu_wait)(void);
> -extern void r4k_wait(void);
> -extern asmlinkage void __r4k_wait(void);
> +extern asmlinkage void r4k_wait(void);
> extern void r4k_wait_irqoff(void);
>
> static inline int using_rollback_handler(void)
> Index: linux-macro/arch/mips/kernel/genex.S
> ===================================================================
> --- linux-macro.orig/arch/mips/kernel/genex.S
> +++ linux-macro/arch/mips/kernel/genex.S
> @@ -104,42 +104,53 @@ NESTED(except_vec3_r4000, 0, sp)
>
> __FINIT
>
> - .align 5 /* 32 byte rollback region */
> -LEAF(__r4k_wait)
> - .set push
> - .set noreorder
> - /* start of rollback region */
> - LONG_L t0, TI_FLAGS($28)
> - nop
> - andi t0, _TIF_NEED_RESCHED
> - bnez t0, 1f
> - nop
> - nop
> - nop
> -#ifdef CONFIG_CPU_MICROMIPS
> - nop
> - nop
> - nop
> - nop
> -#endif
> + .section .cpuidle.text,"ax"
> + /* Align to 32 bytes for the maximum idle interrupt region size. */
> + .align 5
> +LEAF(r4k_wait)
> + /* Keep the ISA bit clear for calculations on local labels here. */
> +0: .fill 0
> + /* Start of idle interrupt region. */
> + local_irq_enable
> + /*
> + * If an interrupt lands here, before going idle on the next
> + * instruction, we must *NOT* go idle since the interrupt could
> + * have set TIF_NEED_RESCHED or caused a timer to need resched.
> + * Fall through -- see rollback_handler below -- and have the
> + * idle loop take care of things.
> + */
> +1: .fill 0
> + /* The R2 EI/EHB sequence takes 8 bytes, otherwise pad up. */
> + .if 1b - 0b > 32
> + .error "overlong idle interrupt region"
> + .elseif 1b - 0b > 8
> + .align 4
> + .endif
> +2: .fill 0
> + .equ r4k_wait_idle_size, 2b - 0b
> + /* End of idle interrupt region; size has to be a power of 2. */
> .set MIPS_ISA_ARCH_LEVEL_RAW
> +r4k_wait_insn:
> wait
> - /* end of rollback region (the region size must be power of two) */
> -1:
> +r4k_wait_exit:
> + .set mips0
> + local_irq_disable
> jr ra
> - nop
> - .set pop
> - END(__r4k_wait)
> + END(r4k_wait)
> + .previous
>
> .macro BUILD_ROLLBACK_PROLOGUE handler
> FEXPORT(rollback_\handler)
> .set push
> .set noat
> MFC0 k0, CP0_EPC
> - PTR_LA k1, __r4k_wait
> - ori k0, 0x1f /* 32 byte rollback region */
> - xori k0, 0x1f
> + /* Subtract/add 2 to let the ISA bit propagate through the mask. */
> + PTR_LA k1, r4k_wait_insn - 2
> + ori k0, r4k_wait_idle_size - 2
> + .set noreorder
> bne k0, k1, \handler
> + PTR_ADDIU k0, r4k_wait_exit - r4k_wait_insn + 2
> + .set reorder
> MTC0 k0, CP0_EPC
> .set pop
> .endm
> Index: linux-macro/arch/mips/kernel/idle.c
> ===================================================================
> --- linux-macro.orig/arch/mips/kernel/idle.c
> +++ linux-macro/arch/mips/kernel/idle.c
> @@ -35,13 +35,6 @@ static void __cpuidle r3081_wait(void)
> write_c0_conf(cfg | R30XX_CONF_HALT);
> }
>
> -void __cpuidle r4k_wait(void)
> -{
> - raw_local_irq_enable();
> - __r4k_wait();
> - raw_local_irq_disable();
> -}
> -
> /*
> * This variant is preferable as it allows testing need_resched and going to
> * sleep depending on the outcome atomically. Unfortunately the "It is
--
Marco Crivellari
L3 Support Engineer, Technology & Product
marco.crivellari@...e.com
Powered by blists - more mailing lists