linux-kernel - Re: [PATCH v4 2/2] riscv: Set unalignment speed at compile time

Message-ID: <CALs-HssaGBHrFdavM=VYm+z0vBWKSnoS2b0cAmjnXA_Kji1KdA@mail.gmail.com>
Date: Tue, 27 Feb 2024 11:44:30 -0800
From: Evan Green <evan@...osinc.com>
To: Charlie Jenkins <charlie@...osinc.com>
Cc: Conor Dooley <conor@...nel.org>, Conor Dooley <conor.dooley@...rochip.com>, 
	Albert Ou <aou@...s.berkeley.edu>, linux-kernel@...r.kernel.org, 
	Eric Biggers <ebiggers@...nel.org>, Palmer Dabbelt <palmer@...belt.com>, 
	Jisheng Zhang <jszhang@...nel.org>, Paul Walmsley <paul.walmsley@...ive.com>, 
	Clément Léger <cleger@...osinc.com>, 
	linux-riscv@...ts.infradead.org, Charles Lohr <lohr85@...il.com>
Subject: Re: [PATCH v4 2/2] riscv: Set unalignment speed at compile time

On Tue, Feb 27, 2024 at 11:20 AM Charlie Jenkins <charlie@...osinc.com> wrote:
>
> On Tue, Feb 27, 2024 at 06:48:54PM +0000, Conor Dooley wrote:
> > On Tue, Feb 27, 2024 at 10:17:21AM -0800, Charlie Jenkins wrote:
> > > On Tue, Feb 27, 2024 at 11:39:25AM +0000, Conor Dooley wrote:
> > > > On Fri, Feb 16, 2024 at 12:33:19PM -0800, Charlie Jenkins wrote:
> >
> > > > > +config RISCV_EMULATED_UNALIGNED_ACCESS
> > > > > +       bool "Assume the CPU expects emulated unaligned memory accesses"
> > > > > +       depends on NONPORTABLE
> > > >
> > > > This is portable too, right?
> > >
> > > I guess so? I think I would prefer to have the probing being the only
> > > portable option.
> >
> > I dunno, I think there could be value to someone in always emulating
> > this in the kernel and I don't think that should relegate them to the
> > naughty step, given it can work everywhere.
>
> Alright, I will remove the nonportable.
>
> >
> >
> > > > > +config RISCV_SLOW_UNALIGNED_ACCESS
> > > > > +       bool "Assume the CPU supports slow unaligned memory accesses"
> > > > > +       depends on NONPORTABLE
> > > > > +       help
> > > > > +         Assume that the CPU supports slow unaligned memory accesses. When
> > > > > +         enabled, this option improves the performance of the kernel on such
> > > > > +         CPUs.
> > > >
> > > > Does it? Are you sure that generating unaligned accesses on systems
> > > > where they are slow is a performance increase?
> > > > That said, I don't really see this option actually doing anything other
> > > > than setting the value for hwprobe, so I don't actually know what the
> > > > effect of this option actually is on the kernel's performance.
> > > >
> > > > Generally I would like to suggest a change from "CPU" to "system" here,
> > > > since the slow cases that exist are mostly because the unaligned access
> > > > is actually emulated in firmware.
> > >
> > > It would be ideal if "emulated" was used for any case of emulated
> > > accesses (firmware or in the kernel).  Doing emulated accesses will be
> > > orders of magnitude slower than a processor that "slowly" handles the
> > > accesses.
> > >
> > > So even if the processor performs a "slow" access, it could still be
> > > > beneficial for the kernel to do the misaligned access rather than
> > > > manually do the alignment.
> >
> > Right. But, at least from a probing perspective, SLOW is what gets
> > selected when firmware emulates the unaligned access so to userspace
> > seeing slow means that the performance could be horrifically bad:
> >
> > |     rzfive:
> > |         cpu0: Ratio of byte access time to unaligned word access is 1.05, unaligned accesses are fast
> > |
> > |     icicle:
> > |
> > |         cpu1: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> > |         cpu2: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> > |         cpu3: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> > |
> > |         cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> > |
> > |     k210:
> > |
> > |         cpu1: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
> > |         cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
> > |
> > |     starlight:
> > |
> > |         cpu1: Ratio of byte access time to unaligned word access is 0.01, unaligned accesses are slow
> > |         cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
> > |
> > |     vexriscv/orangecrab:
> > |
> > |         cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> >  https://lore.kernel.org/all/CAMuHMdVtXGjP8VFMiv-7OMFz1XvfU1cz=Fw4jL3fcp4wO1etzQ@mail.gmail.com/
>
> If the accesses are horrifically slow then maybe they should be flagged
> as emulated rather than slow by the probe.

Yeah, I thought about that too. I didn't feel like I had enough info
to come up with the delineating number for "horrifically slow". Plus
Clément came in with a series to detect specifically that accesses are
emulated (though it will only work on future platforms that can
delegate the trap to the kernel).

-Evan
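
For readers following along, here is a minimal sketch of how a two-way
fast/slow classification like the one in the logs above could be derived
from two timing measurements. It assumes the probe times a byte-wise copy
and an unaligned word-wise copy and compares the two; the function and enum
names are hypothetical and are not the kernel's identifiers. It deliberately
has no "emulated" bucket, since the thread does not settle on a delineating
number for one.

```c
#include <stdint.h>

/* Hypothetical names for illustration only. */
enum misaligned_speed {
	MISALIGNED_SLOW,
	MISALIGNED_FAST,
};

/*
 * Ratio of byte access time to unaligned word access time, scaled by 100,
 * so 105 corresponds to the "1.05" printed for rzfive in the logs.
 * Integer division truncates, so a word copy more than 100x slower than
 * the byte copy yields 0, i.e. it would print as "0.00".
 * Returns 0 when word_cycles is 0 (no valid measurement).
 */
static uint64_t speed_ratio_x100(uint64_t byte_cycles, uint64_t word_cycles)
{
	if (word_cycles == 0)
		return 0;
	return (byte_cycles * 100) / word_cycles;
}

/*
 * Assumption: unaligned word accesses count as "fast" only when they beat
 * the byte-wise copy outright; everything else is "slow", including
 * firmware-emulated accesses.
 */
static enum misaligned_speed classify(uint64_t byte_cycles,
				      uint64_t word_cycles)
{
	return word_cycles < byte_cycles ? MISALIGNED_FAST : MISALIGNED_SLOW;
}
```

Under these assumptions, a ratio of 0.02 and a ratio of 0.00 both land in
the same "slow" bucket, which is why userspace cannot tell a merely slow
core from one whose accesses are trapped and emulated.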