linux-kernel - Re: [PATCH] LoongArch: Enhance robust of kprobe

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0b5039c3-019b-fd3d-e822-5d2a52c4111d@loongson.cn>
Date: Fri, 11 Apr 2025 10:48:30 +0800
From: Tiezhu Yang <yangtiezhu@...ngson.cn>
To: Jinyang He <hejinyang@...ngson.cn>, Huacai Chen <chenhuacai@...nel.org>
Cc: loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] LoongArch: Enhance robust of kprobe

On 04/09/2025 10:17 AM, Jinyang He wrote:
> On 2025-04-08 17:27, Tiezhu Yang wrote:
>
>> Currently, interrupts need to be disabled before single-step mode is set,
>> it requires that the CSR_PRMD_PIE must be cleared in save_local_irqflag()
>> which is called by setup_singlestep(), this is reasonable.
>>
>> But in the first kprobe breakpoint exception, if the irq is enabled at
>> the
>> beginning of do_bp(), it will not be disabled at the end of do_bp()
>> due to
>> the CSR_PRMD_PIE has been cleared in save_local_irqflag(). For this case,
>> it may corrupt exception context when restoring exception after
>> do_bp() in
>> handle_bp(), this is not reasonable.
>>
>> Based on the above analysis, in order to make sure the irq is disabled at
>> the end of do_bp() for the first kprobe breakpoint exception, it is
>> proper
>> to disable irq first before clearing CSR_PRMD_PIE in
>> save_local_irqflag().
>>
>> Fixes: 6d4cc40fb5f5 ("LoongArch: Add kprobes support")
>> Co-developed-by: Tianyang Zhang <zhangtianyang@...ngson.cn>
>> Signed-off-by: Tianyang Zhang <zhangtianyang@...ngson.cn>
>> Signed-off-by: Tiezhu Yang <yangtiezhu@...ngson.cn>
>> ---
>>   arch/loongarch/kernel/kprobes.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/loongarch/kernel/kprobes.c
>> b/arch/loongarch/kernel/kprobes.c
>> index 8ba391cfabb0..6eab97636e6b 100644
>> --- a/arch/loongarch/kernel/kprobes.c
>> +++ b/arch/loongarch/kernel/kprobes.c
>> @@ -113,6 +113,7 @@ NOKPROBE_SYMBOL(set_current_kprobe);
>>   static void save_local_irqflag(struct kprobe_ctlblk *kcb,
>>                      struct pt_regs *regs)
>>   {
>> +    local_irq_disable();
>>       kcb->saved_status = regs->csr_prmd;
>>       regs->csr_prmd &= ~CSR_PRMD_PIE;
>>   }
>
> Hi, Tiezhu,
>
> I think the carsh is caused by "irq-triggered re-re-enter" clear
> the previous_kprobe status. An example things like,
>
> ...
>   static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
>                    struct kprobe_ctlblk *kcb, int reenter)
>   {
>       union loongarch_instruction insn;
>
>       if (reenter) {
>           save_previous_kprobe(kcb);
>  ===================   <- irq and trigger re-re-enter in its handler
>           set_current_kprobe(p);
>           kcb->kprobe_status = KPROBE_REENTER;
>       } else {
>           kcb->kprobe_status = KPROBE_HIT_SS;
>       }
> ...
>
> We should assure the previous_kprobe status not be changed after re-enter.
> So this `local_irq_disable` should be set in reenter block begin.
> And for !reenter block, `local_irq_disable` may be not needed.

If you mean to do the following change:

diff --git a/arch/loongarch/kernel/kprobes.c 
b/arch/loongarch/kernel/kprobes.c
index 8ba391cfabb0..1ad67a3c7b70 100644
--- a/arch/loongarch/kernel/kprobes.c
+++ b/arch/loongarch/kernel/kprobes.c
@@ -158,6 +158,7 @@ static void setup_singlestep(struct kprobe *p, 
struct pt_regs *regs,
         union loongarch_instruction insn;

         if (reenter) {
+               local_irq_disable();
                 save_previous_kprobe(kcb);
                 set_current_kprobe(p);
                 kcb->kprobe_status = KPROBE_REENTER;

then it can not fix the case Tianyang and I have met.

With the above change, the issue of kernel hang can
still be reproduced after a few hours.

With the original change of this patch, the issue of
kernel hang can not be reproduced for several days,
it works well.

Maybe what you said is another case, but anyway, the
original change of this patch is safe for any case
because its coverage is more extensive, so just keep
it as is.

Thanks,
Tiezhu