linux-kernel - Re: [PATCH] riscv: kprobe: Optimize kprobe with accurate atomicity

Message-ID: <CAJF2gTTacQXg7e3s9s3dALCnPjstGGMPtk5L5SsN_MWKEov+pQ@mail.gmail.com>
Date:   Fri, 17 Feb 2023 10:28:45 +0800
From:   Guo Ren <guoren@...nel.org>
To:     Björn Töpel <bjorn@...nel.org>
Cc:     Andrea Parri <parri.andrea@...il.com>,
        "liaochang (A)" <liaochang1@...wei.com>, palmer@...belt.com,
        paul.walmsley@...ive.com, mhiramat@...nel.org,
        conor.dooley@...rochip.com, penberg@...nel.org,
        mark.rutland@....com, linux-riscv@...ts.infradead.org,
        linux-kernel@...r.kernel.org, Guo Ren <guoren@...ux.alibaba.com>,
        Changbin Du <changbin.du@...wei.com>
Subject: Re: [PATCH] riscv: kprobe: Optimize kprobe with accurate atomicity

On Thu, Feb 16, 2023 at 3:54 PM Björn Töpel <bjorn@...nel.org> wrote:
>
> Guo Ren <guoren@...nel.org> writes:
>
> > On Tue, Jan 31, 2023 at 6:57 PM Andrea Parri <parri.andrea@...il.com> wrote:
> >>
> >> > > It's the concurrent modification that I was referring to (removing
> >> > > stop_machine()). You're saying "it'll always work", I'm saying "I'm not
> >> > > so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that
> >> > Software must ensure that c.ebreak is written at the head of a 32b insn.
> >> >
> >> > That means the IFU only sees:
> >> >  - c.ebreak + broken/illegal insn.
> >> > or
> >> >  - origin insn
> >> >
> >> > Even in the worst case, such as IFU fetches instructions one by one:
> >> > If the IFU gets the origin insn, it will skip the broken/illegal insn.
> >> > If the IFU gets the c.ebreak + broken/illegal insn, then an ebreak
> >> > exception is raised.
> >> >
> >> > Because c.ebreak would raise an exception, I don't see any problem.
> >>
> >> That's the problem, this discussion is:
> >>
> >> Reviewer: "I'm not sure, that's not written in our spec"
> >> Submitter: "I said it, it's called -accurate atomicity-"
> > I really don't see any hardware that could break the atomicity of this
> > c.ebreak scenario:
> >  - c.ebreak on the head of 32b insn
> >  - ebreak on an aligned 32b insn
> >
> > If the IFU fetches a whole cacheline at a time, the update is atomic.
> > If the IFU fetches 16 bits at a time, the first c.ebreak raises
> > an exception and skips the following broken/illegal instruction.
> > Even if the IFU fetches in no particular order, the IDU must decode
> > one by one, right? The c.ebreak in the first half protects against
> > the following broken/illegal instruction. Speculative execution of the
> > broken/illegal instruction won't cause any exceptions.
> >
> > It's a common issue, not a specific ISA issue.
> > 32b instruction A -> 16b ebreak + 16b broken/illegal -> 32b
> > instruction A. It's safe to transform.
>
> Waking up this thread again, now that Changbin has shown some interest
> in another thread [1].
>
> Guo, we can't really add your patches, and claim that they're generic,
> "works on all" RISC-V systems. While it might work for your I/D coherent
> system, that does not imply that it'll work on all platforms. RISC-V
> allows for implementations that are I/D incoherent, and here your
> IFU-implementation arguments do not hold. I'd really recommend
> reading up on [2].
Sorry, [2] isn't related to this patch.

This patch doesn't have an I/D incoherence problem, because
patch_text_nosync broadcasts the IPI fence.i.

By comparison, the stop_machine version pays a crazy nested IPI
broadcast cost:
stop_machine -> patch_text_nosync -> flush_icache_all
void flush_icache_all(void)
{
        local_flush_icache_all();

        if (IS_ENABLED(CONFIG_RISCV_SBI))
                sbi_remote_fence_i(NULL);
        else
                on_each_cpu(ipi_remote_fence_i, NULL, 1);
}
EXPORT_SYMBOL(flush_icache_all);


>
> Now how could we move on with your patches? Get it in a spec, or fold
> the patches in as a Kconfig.socs-thing for the platforms where this is
> OK. What are your thoughts on the latter?

I wasn't talking about I/D incoherence vs. coherence; what I'm talking
about is the basic size of the instruction element.
In a system with I/D caches, why couldn't an LSU store-half guarantee
atomicity for the I-cache fetch? How could the I-cache fetch only one
byte of that store-half value?
We already assume this guarantee in the riscv jump_label implementation,
so why couldn't this patch assume it as well?
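
To make the transform being argued about concrete, here is a minimal
user-space sketch (not the patch's code, and not kernel code). It only
models the case where the store-half is observed as a whole; it says
nothing about what I/D-incoherent hardware or a torn fetch would do,
which is exactly the point under dispute. All names (insn_len,
patch_head, decode_first, check_transform) are hypothetical helpers made
up for illustration; the only facts taken from the ISA are the c.ebreak
encoding and the rule that a first halfword ending in 0b11 starts a
32-bit instruction.

```c
#include <assert.h>
#include <stdint.h>

#define C_EBREAK 0x9002u /* 16-bit compressed ebreak encoding */

enum seen { SEEN_ORIGIN, SEEN_EBREAK, SEEN_OTHER };

/* Length in bytes from the first halfword: low bits 0b11 mean a 32-bit
 * instruction (longer encodings are ignored in this sketch). */
static int insn_len(uint16_t first_half)
{
        return (first_half & 0x3) == 0x3 ? 4 : 2;
}

/* Hypothetical model of a single store-half to the head of the insn. */
static void patch_head(uint16_t text[2], uint16_t half)
{
        text[0] = half;
}

/* What a decoder reading text[] in order would execute first. */
static enum seen decode_first(const uint16_t text[2], uint32_t origin)
{
        uint32_t insn;

        if (insn_len(text[0]) == 2)
                return text[0] == C_EBREAK ? SEEN_EBREAK : SEEN_OTHER;

        insn = (uint32_t)text[0] | ((uint32_t)text[1] << 16);
        return insn == origin ? SEEN_ORIGIN : SEEN_OTHER;
}

/* 32b insn A -> c.ebreak + leftover half -> 32b insn A. */
static void check_transform(void)
{
        const uint32_t origin = 0x00a50533u; /* add a0, a0, a0 */
        uint16_t text[2] = { origin & 0xffff, origin >> 16 };

        assert(decode_first(text, origin) == SEEN_ORIGIN);
        patch_head(text, C_EBREAK);                 /* arm the probe */
        assert(decode_first(text, origin) == SEEN_EBREAK);
        patch_head(text, origin & 0xffff);          /* disarm the probe */
        assert(decode_first(text, origin) == SEEN_ORIGIN);
}
```

Under that assumption the decoder only ever sees the origin insn or the
c.ebreak; the leftover second halfword is never decoded on its own.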

>
>
> Björn
>
> [1] https://lore.kernel.org/linux-riscv/20230215034532.xs726l7mp6xlnkdf@M910t/
> [2] https://github.com/riscv/riscv-j-extension/blob/master/id-consistency-proposal.pdf



-- 
Best Regards
 Guo Ren