[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <f0192bb6-2a5c-d51e-ef11-190fd61af0b9@loongson.cn>
Date: Tue, 3 Jan 2023 16:42:39 +0800
From: Jinyang He <hejinyang@...ngson.cn>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: Qi Hu <huqi@...ngson.cn>, WANG Xuerui <kernel@...0n.name>,
Tiezhu Yang <yangtiezhu@...ngson.cn>,
loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] LoongArch: Fix irq enable in exception handlers
On 2023-01-03 12:54, Huacai Chen wrote:
> On Fri, Dec 30, 2022 at 1:58 PM Jinyang He <hejinyang@...ngson.cn> wrote:
>>
>> On 2022-12-29 14:54, Qi Hu wrote:
>>> On 2022/12/29 14:13, Jinyang He wrote:
>>>> On 2022-12-29 00:51, Qi Hu wrote:
>>>>
>>>>> On 2022/12/27 18:10, Jinyang He wrote:
>>>>>> On 2022-12-27 17:52, Huacai Chen wrote:
>>>>>>
>>>>>>> On Tue, Dec 27, 2022 at 5:30 PM Jinyang He <hejinyang@...ngson.cn>
>>>>>>> wrote:
>>>>>>>> On 2022-12-27 15:37, Huacai Chen wrote:
>>>>>>>>> Hi, Jinyang,
>>>>>>>>>
>>>>>>>>> Move die_if_kernel to irq disabled context to solve what?
>>>>>>>> For more strict logical. If the code flow go to die in
>>>>>>>> die_if_kernel(),
>>>>>>>> its interrupt state is enable, that means it may cause schedule.
>>>>>>>> So I think it is better to call die_if_kernel() firstly.
>>>>>>> die_if_kernel is called with irq enabled in old kernels for several
>>>>>>> years, and has no problems.
>>>>>>
>>>>>> I think because it never call die() in die_if_kernel(). What I do
>>>>>> emphasize is that there needs to be more strict logic here than
>>>>>> it worked well in the past. I bet if die_if_kernel() was removed,
>>>>>> it will still work well in the future.
>>>>>>
>>>>>>
>>>>>>>>> And LBT is
>>>>>>>>> surely allowed to be triggered in kernel context.
>>>>>>>> I'm not familar with lbt, I just not see any lbt codes in kernel.
>>>>>>>> Plz,
>>>>>>>> how lbt exception triggered, and how kernel trigger lbt exception?
>>>>>>> You can ask Huqi for more details, and this was discussed publicly
>>>>>>> last week.
>>>>>> To: Qi Hu
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> We really need some help. Can you give us some ideas?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jinyang
>>>>>>
>>>>> Huacai is correct. The LBT disable exception (BTD) can be triggered
>>>>> in kernel context.
>>>>>
>>>>> If the CSR.ENEU.BTE == 0 [^1], the LBT instructions (these [^2] will
>>>>> be used in the kernel) will trigger the exception.
>>>>>
>>>>> Unfortunately, when you want to do some fpu_{save, restore}, you
>>>>> need to use some LBT instructions [^3] [^4]. So if FPD is triggered,
>>>>> LBT might still not be enabled, and the 'do_lbt' will be called in
>>>>> the kernel context.
>>>>>
>>>>> Hope the information can help. Thanks.
>>>>>
>>>>>
>>>>> [1]
>>>>> https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#extended-component-unit-enable
>>>>>
>>>>> [2]
>>>>> https://github.com/loongson/linux/pull/4/files#diff-381d03cf86e2796d280e2fc82c005409d5e44b4bbbf90dd0dc17f5f0fa5553f1R140-R184
>>>>>
>>>>> [3]
>>>>> https://github.com/loongson/linux/pull/4/files#diff-381d03cf86e2796d280e2fc82c005409d5e44b4bbbf90dd0dc17f5f0fa5553f1R218-R230
>>>>>
>>>>> [4]
>>>>> https://github.com/loongson/linux/pull/4/files#diff-381d03cf86e2796d280e2fc82c005409d5e44b4bbbf90dd0dc17f5f0fa5553f1R236-R263
>>>>>
>>>>>
>>>> Hi,
>>>>
>>>>
>>>> That's helpful. Thanks!
>>>>
>>>>
>>>> But I still wonder if SXD or ASXD have the same possibility of being
>>>> triggered in the kernel mode by sc_save_{lsx, lasx} or other place.
>>>> Do we need remove these die_if_kernel codes in do_lasx() and do_lsx()?
>>>>
>>>>
>>>> Jinyang
>>>>
>>> Hi Jinyang,
>>>
>>> I think only BTD has this tricky situation, because there is some
>>> overlap between FPD/SXD/ASXD and BTD.
>>>
>>> So, in my view, SXD or ASXD will not be triggered in kernel mode.
>>>
>>> Thanks.
>>>
>>>
>>> Qi
>> Got it. Thanks for your help. And I'll fix my patch in next version.
> In my opinion only the do_bp() modification is useful, and that part
> can be squashed to Tiezhu's kprobe patches.
Yes, I have to admit that only the modification of do_bp() is useful,
in fact other modification are not triggered, I think. Most do_xxx is
irq enabled or in user mode before triggered. Although I can give a test
in Qemu that make do_ri() triggered in irq disable state, and then it
will hang if unconditionally call local_irq_enable, I know it makes no sense
because these codes can not be triggered currently, just like this BUG
cannot be found only before Tiezhu supporting kprobe on LoongArch.
If leaving potentially illogical codes is allowed, squash it to Tiezhu's
kprobe patches.
Jinyang
>
> Huacai
>>
>> Jinyang
>>
>>
>>>>> Qi
>>>>>
>>>>>>> Huacai
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Jinyang
>>>>>>>>
>>>>>>>>
>>>>>>>>> Huacai
>>>>>>>>>
>>>>>>>>> On Wed, Dec 21, 2022 at 3:43 PM Jinyang He
>>>>>>>>> <hejinyang@...ngson.cn> wrote:
>>>>>>>>>> The interrupt state can be got by regs->csr_prmd. Once previous
>>>>>>>>>> interrupt state is disable, we shouldn't enable interrupt if we
>>>>>>>>>> triggered exception which can be triggered in kernel mode. So
>>>>>>>>>> conditionally enable interrupt. For those do_\exception which
>>>>>>>>>> can not triggered in kernel mode but need enable interrupt, call
>>>>>>>>>> die_if_kernel() firstly. And for do_lsx, do_lasx and do_lbt cannot
>>>>>>>>>> triggered in kernel mode, too.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jinyang He <hejinyang@...ngson.cn>
>>>>>>>>>> ---
>>>>>>>>>> arch/loongarch/kernel/traps.c | 19 ++++++++++---------
>>>>>>>>>> 1 file changed, 10 insertions(+), 9 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/loongarch/kernel/traps.c
>>>>>>>>>> b/arch/loongarch/kernel/traps.c
>>>>>>>>>> index 1ea14f6c18d3..3ac7b32d1e15 100644
>>>>>>>>>> --- a/arch/loongarch/kernel/traps.c
>>>>>>>>>> +++ b/arch/loongarch/kernel/traps.c
>>>>>>>>>> @@ -340,9 +340,9 @@ asmlinkage void noinstr do_fpe(struct
>>>>>>>>>> pt_regs *regs, unsigned long fcsr)
>>>>>>>>>>
>>>>>>>>>> /* Clear FCSR.Cause before enabling interrupts */
>>>>>>>>>> write_fcsr(LOONGARCH_FCSR0, fcsr & ~mask_fcsr_x(fcsr));
>>>>>>>>>> - local_irq_enable();
>>>>>>>>>>
>>>>>>>>>> die_if_kernel("FP exception in kernel code", regs);
>>>>>>>>>> + local_irq_enable();
>>>>>>>>>>
>>>>>>>>>> sig = SIGFPE;
>>>>>>>>>> fault_addr = (void __user *) regs->csr_era;
>>>>>>>>>> @@ -432,7 +432,8 @@ asmlinkage void noinstr do_bp(struct
>>>>>>>>>> pt_regs *regs)
>>>>>>>>>> unsigned long era = exception_era(regs);
>>>>>>>>>> irqentry_state_t state = irqentry_enter(regs);
>>>>>>>>>>
>>>>>>>>>> - local_irq_enable();
>>>>>>>>>> + if (regs->csr_prmd & CSR_PRMD_PIE)
>>>>>>>>>> + local_irq_enable();
>>>>>>>>>> current->thread.trap_nr = read_csr_excode();
>>>>>>>>>> if (__get_inst(&opcode, (u32 *)era, user))
>>>>>>>>>> goto out_sigsegv;
>>>>>>>>>> @@ -514,7 +515,8 @@ asmlinkage void noinstr do_ri(struct
>>>>>>>>>> pt_regs *regs)
>>>>>>>>>> unsigned int __user *era = (unsigned int __user
>>>>>>>>>> *)exception_era(regs);
>>>>>>>>>> irqentry_state_t state = irqentry_enter(regs);
>>>>>>>>>>
>>>>>>>>>> - local_irq_enable();
>>>>>>>>>> + if (regs->csr_prmd & CSR_PRMD_PIE)
>>>>>>>>>> + local_irq_enable();
>>>>>>>>>> current->thread.trap_nr = read_csr_excode();
>>>>>>>>>>
>>>>>>>>>> if (notify_die(DIE_RI, "RI Fault", regs, 0,
>>>>>>>>>> current->thread.trap_nr,
>>>>>>>>>> @@ -606,8 +608,8 @@ asmlinkage void noinstr do_fpu(struct
>>>>>>>>>> pt_regs *regs)
>>>>>>>>>> {
>>>>>>>>>> irqentry_state_t state = irqentry_enter(regs);
>>>>>>>>>>
>>>>>>>>>> - local_irq_enable();
>>>>>>>>>> die_if_kernel("do_fpu invoked from kernel context!",
>>>>>>>>>> regs);
>>>>>>>>>> + local_irq_enable();
>>>>>>>>>> BUG_ON(is_lsx_enabled());
>>>>>>>>>> BUG_ON(is_lasx_enabled());
>>>>>>>>>>
>>>>>>>>>> @@ -623,13 +625,13 @@ asmlinkage void noinstr do_lsx(struct
>>>>>>>>>> pt_regs *regs)
>>>>>>>>>> {
>>>>>>>>>> irqentry_state_t state = irqentry_enter(regs);
>>>>>>>>>>
>>>>>>>>>> + die_if_kernel("do_lsx invoked from kernel context!",
>>>>>>>>>> regs);
>>>>>>>>>> local_irq_enable();
>>>>>>>>>> if (!cpu_has_lsx) {
>>>>>>>>>> force_sig(SIGILL);
>>>>>>>>>> goto out;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> - die_if_kernel("do_lsx invoked from kernel context!",
>>>>>>>>>> regs);
>>>>>>>>>> BUG_ON(is_lasx_enabled());
>>>>>>>>>>
>>>>>>>>>> preempt_disable();
>>>>>>>>>> @@ -645,14 +647,13 @@ asmlinkage void noinstr do_lasx(struct
>>>>>>>>>> pt_regs *regs)
>>>>>>>>>> {
>>>>>>>>>> irqentry_state_t state = irqentry_enter(regs);
>>>>>>>>>>
>>>>>>>>>> + die_if_kernel("do_lasx invoked from kernel context!",
>>>>>>>>>> regs);
>>>>>>>>>> local_irq_enable();
>>>>>>>>>> if (!cpu_has_lasx) {
>>>>>>>>>> force_sig(SIGILL);
>>>>>>>>>> goto out;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> - die_if_kernel("do_lasx invoked from kernel context!",
>>>>>>>>>> regs);
>>>>>>>>>> -
>>>>>>>>>> preempt_disable();
>>>>>>>>>> init_restore_lasx();
>>>>>>>>>> preempt_enable();
>>>>>>>>>> @@ -666,6 +667,7 @@ asmlinkage void noinstr do_lbt(struct
>>>>>>>>>> pt_regs *regs)
>>>>>>>>>> {
>>>>>>>>>> irqentry_state_t state = irqentry_enter(regs);
>>>>>>>>>>
>>>>>>>>>> + die_if_kernel("do_lbt invoked from kernel context!",
>>>>>>>>>> regs);
>>>>>>>>>> local_irq_enable();
>>>>>>>>>> force_sig(SIGILL);
>>>>>>>>>> local_irq_disable();
>>>>>>>>>> @@ -677,7 +679,6 @@ asmlinkage void noinstr do_reserved(struct
>>>>>>>>>> pt_regs *regs)
>>>>>>>>>> {
>>>>>>>>>> irqentry_state_t state = irqentry_enter(regs);
>>>>>>>>>>
>>>>>>>>>> - local_irq_enable();
>>>>>>>>>> /*
>>>>>>>>>> * Game over - no way to handle this if it ever
>>>>>>>>>> occurs. Most probably
>>>>>>>>>> * caused by a fatal error after another
>>>>>>>>>> hardware/software error.
>>>>>>>>>> @@ -685,8 +686,8 @@ asmlinkage void noinstr do_reserved(struct
>>>>>>>>>> pt_regs *regs)
>>>>>>>>>> pr_err("Caught reserved exception %u on pid:%d [%s] -
>>>>>>>>>> should not happen\n",
>>>>>>>>>> read_csr_excode(), current->pid, current->comm);
>>>>>>>>>> die_if_kernel("do_reserved exception", regs);
>>>>>>>>>> + local_irq_enable();
>>>>>>>>>> force_sig(SIGUNUSED);
>>>>>>>>>> -
>>>>>>>>>> local_irq_disable();
>>>>>>>>>>
>>>>>>>>>> irqentry_exit(regs, state);
>>>>>>>>>> --
>>>>>>>>>> 2.34.3
>>>>>>>>>>
Powered by blists - more mailing lists