[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <56519aed-b23e-fece-3e91-f8db44da6d45@loongson.cn>
Date: Fri, 11 Apr 2025 12:46:40 +0800
From: Jinyang He <hejinyang@...ngson.cn>
To: Tiezhu Yang <yangtiezhu@...ngson.cn>, Huacai Chen <chenhuacai@...nel.org>
Cc: loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] LoongArch: Enhance robust of kprobe
On 2025-04-11 10:48, Tiezhu Yang wrote:
> On 04/09/2025 10:17 AM, Jinyang He wrote:
>> On 2025-04-08 17:27, Tiezhu Yang wrote:
>>
>>> Currently, interrupts need to be disabled before single-step mode is
>>> set,
>>> it requires that the CSR_PRMD_PIE must be cleared in
>>> save_local_irqflag()
>>> which is called by setup_singlestep(), this is reasonable.
>>>
>>> But in the first kprobe breakpoint exception, if the irq is enabled at
>>> the
>>> beginning of do_bp(), it will not be disabled at the end of do_bp()
>>> due to
>>> the CSR_PRMD_PIE has been cleared in save_local_irqflag(). For this
>>> case,
>>> it may corrupt exception context when restoring exception after
>>> do_bp() in
>>> handle_bp(), this is not reasonable.
>>>
>>> Based on the above analysis, in order to make sure the irq is
>>> disabled at
>>> the end of do_bp() for the first kprobe breakpoint exception, it is
>>> proper
>>> to disable irq first before clearing CSR_PRMD_PIE in
>>> save_local_irqflag().
>>>
>>> Fixes: 6d4cc40fb5f5 ("LoongArch: Add kprobes support")
>>> Co-developed-by: Tianyang Zhang <zhangtianyang@...ngson.cn>
>>> Signed-off-by: Tianyang Zhang <zhangtianyang@...ngson.cn>
>>> Signed-off-by: Tiezhu Yang <yangtiezhu@...ngson.cn>
>>> ---
>>> arch/loongarch/kernel/kprobes.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/loongarch/kernel/kprobes.c
>>> b/arch/loongarch/kernel/kprobes.c
>>> index 8ba391cfabb0..6eab97636e6b 100644
>>> --- a/arch/loongarch/kernel/kprobes.c
>>> +++ b/arch/loongarch/kernel/kprobes.c
>>> @@ -113,6 +113,7 @@ NOKPROBE_SYMBOL(set_current_kprobe);
>>> static void save_local_irqflag(struct kprobe_ctlblk *kcb,
>>> struct pt_regs *regs)
>>> {
>>> + local_irq_disable();
>>> kcb->saved_status = regs->csr_prmd;
>>> regs->csr_prmd &= ~CSR_PRMD_PIE;
>>> }
>>
>> Hi, Tiezhu,
>>
>> I think the carsh is caused by "irq-triggered re-re-enter" clear
>> the previous_kprobe status. An example things like,
>>
>> ...
>> static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
>> struct kprobe_ctlblk *kcb, int reenter)
>> {
>> union loongarch_instruction insn;
>>
>> if (reenter) {
>> save_previous_kprobe(kcb);
>> =================== <- irq and trigger re-re-enter in its handler
>> set_current_kprobe(p);
>> kcb->kprobe_status = KPROBE_REENTER;
>> } else {
>> kcb->kprobe_status = KPROBE_HIT_SS;
>> }
>> ...
>>
>> We should assure the previous_kprobe status not be changed after
>> re-enter.
>> So this `local_irq_disable` should be set in reenter block begin.
>> And for !reenter block, `local_irq_disable` may be not needed.
>
> If you mean to do the following change:
>
> diff --git a/arch/loongarch/kernel/kprobes.c
> b/arch/loongarch/kernel/kprobes.c
> index 8ba391cfabb0..1ad67a3c7b70 100644
> --- a/arch/loongarch/kernel/kprobes.c
> +++ b/arch/loongarch/kernel/kprobes.c
> @@ -158,6 +158,7 @@ static void setup_singlestep(struct kprobe *p,
> struct pt_regs *regs,
> union loongarch_instruction insn;
>
> if (reenter) {
> + local_irq_disable();
> save_previous_kprobe(kcb);
> set_current_kprobe(p);
> kcb->kprobe_status = KPROBE_REENTER;
>
> then it can not fix the case Tianyang and I have met.
>
> With the above change, the issue of kernel hang can
> still be reproduced after a few hours.
>
> With the original change of this patch, the issue of
> kernel hang can not be reproduced for several days,
> it works well.
>
> Maybe what you said is another case, but anyway, the
> original change of this patch is safe for any case
> because its coverage is more extensive, so just keep
> it as is.
>
static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
struct kprobe_ctlblk *kcb, int reenter)
{
union loongarch_instruction insn;
if (reenter) {
-> My meaning, local_irq_disable();
LabelA:
save_previous_kprobe(kcb);
set_current_kprobe(p);
kcb->kprobe_status = KPROBE_REENTER;
LabelC:
} else {
LabelD:
kcb->kprobe_status = KPROBE_HIT_SS;
}
if (p->ainsn.insn) {
-> Yours, local_irq_disable();
LabelB:
/* IRQs and single stepping do not mix well */
save_local_irqflag(kcb, regs);
/* set ip register to prepare for single stepping */
regs->csr_era = (unsigned long)p->ainsn.insn;
} else {
/* simulate single steping */
insn.word = p->opcode;
arch_simulate_insn(insn, regs);
/* now go for post processing */
post_kprobe_handler(p, kcb, regs);
}
}
I have just explained the previous status may be broken by IRQ.
That's why we should protected LabelA->LabelC.
For the case `p->ainsn.insn`, my meaning is just not protecte LabelD.
If that is wrong, the change should be changed to
static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
struct kprobe_ctlblk *kcb, int reenter)
{
union loongarch_instruction insn;
+ if (reenter || p->ainsn.insn) local_irq_disable();
... other codes.
}
On the other hand, have you tried only fix do_bp weather cause hang?
bool this_bp_ie = regs->csr_prmd & CSR_PRMD_PIE;
if (this_bp_ie)
local_irq_enable();
...
if (this_bp_ie)
local_irq_disable();
Powered by blists - more mailing lists