Message-ID: <CALs-HssaGBHrFdavM=VYm+z0vBWKSnoS2b0cAmjnXA_Kji1KdA@mail.gmail.com>
Date: Tue, 27 Feb 2024 11:44:30 -0800
From: Evan Green <evan@...osinc.com>
To: Charlie Jenkins <charlie@...osinc.com>
Cc: Conor Dooley <conor@...nel.org>, Conor Dooley <conor.dooley@...rochip.com>,
Albert Ou <aou@...s.berkeley.edu>, linux-kernel@...r.kernel.org,
Eric Biggers <ebiggers@...nel.org>, Palmer Dabbelt <palmer@...belt.com>,
Jisheng Zhang <jszhang@...nel.org>, Paul Walmsley <paul.walmsley@...ive.com>,
Clément Léger <cleger@...osinc.com>,
linux-riscv@...ts.infradead.org, Charles Lohr <lohr85@...il.com>
Subject: Re: [PATCH v4 2/2] riscv: Set unalignment speed at compile time

On Tue, Feb 27, 2024 at 11:20 AM Charlie Jenkins <charlie@...osinc.com> wrote:
>
> On Tue, Feb 27, 2024 at 06:48:54PM +0000, Conor Dooley wrote:
> > On Tue, Feb 27, 2024 at 10:17:21AM -0800, Charlie Jenkins wrote:
> > > On Tue, Feb 27, 2024 at 11:39:25AM +0000, Conor Dooley wrote:
> > > > On Fri, Feb 16, 2024 at 12:33:19PM -0800, Charlie Jenkins wrote:
> >
> > > > > +config RISCV_EMULATED_UNALIGNED_ACCESS
> > > > > + bool "Assume the CPU expects emulated unaligned memory accesses"
> > > > > + depends on NONPORTABLE
> > > >
> > > > This is portable too, right?
> > >
> > > I guess so? I think I would prefer to have the probing being the only
> > > portable option.
> >
> > I dunno, I think there could be value to someone in always emulating
> > this in the kernel and I don't think that should relegate them to the
> > naughty step, given it can work everywhere.
>
> Alright, I will remove the nonportable.
>
> >
> >
> > > > > +config RISCV_SLOW_UNALIGNED_ACCESS
> > > > > + bool "Assume the CPU supports slow unaligned memory accesses"
> > > > > + depends on NONPORTABLE
> > > > > + help
> > > > > + Assume that the CPU supports slow unaligned memory accesses. When
> > > > > + enabled, this option improves the performance of the kernel on such
> > > > > + CPUs.
> > > >
> > > > Does it? Are you sure that generating unaligned accesses on systems
> > > > where they are slow is a performance increase?
> > > > That said, I don't really see this option actually doing anything other
> > > > than setting the value for hwprobe, so I don't actually know what the
> > > > effect of this option actually is on the kernel's performance.
> > > >
> > > > Generally I would like to suggest a change from "CPU" to "system" here,
> > > > since the slow cases that exist are mostly because the unaligned access
> > > > is actually emulated in firmware.
> > >
> > > It would be ideal if "emulated" was used for any case of emulated
> > > accesses (firmware or in the kernel). Doing emulated accesses will be
> > > orders of magnitude slower than a processor that "slowly" handles the
> > > accesses.
> > >
> > > So even if the processor performs a "slow" access, it could still be
> > > beneficial for the kernel to do the misaligned access rather than
> > > manually do the alignment.
> >
> > Right. But, at least from a probing perspective, SLOW is what gets
> > selected when firmware emulates the unaligned access so to userspace
> > seeing slow means that the performance could be horrifically bad:
> >
> > | rzfive:
> > | cpu0: Ratio of byte access time to unaligned word access is
> > | 1.05, unaligned accesses are fast
> > |
> > | icicle:
> > |
> > | cpu1: Ratio of byte access time to unaligned word access is
> > | 0.00, unaligned accesses are slow
> > | cpu2: Ratio of byte access time to unaligned word access is
> > | 0.00, unaligned accesses are slow
> > | cpu3: Ratio of byte access time to unaligned word access is
> > | 0.00, unaligned accesses are slow
> > |
> > | cpu0: Ratio of byte access time to unaligned word access is
> > | 0.00, unaligned accesses are slow
> > |
> > | k210:
> > |
> > | cpu1: Ratio of byte access time to unaligned word access is
> > | 0.02, unaligned accesses are slow
> > | cpu0: Ratio of byte access time to unaligned word access is
> > | 0.02, unaligned accesses are slow
> > |
> > | starlight:
> > |
> > | cpu1: Ratio of byte access time to unaligned word access is
> > | 0.01, unaligned accesses are slow
> > | cpu0: Ratio of byte access time to unaligned word access is
> > | 0.02, unaligned accesses are slow
> > |
> > | vexriscv/orangecrab:
> > |
> > | cpu0: Ratio of byte access time to unaligned word access is
> > | 0.00, unaligned accesses are slow
> > https://lore.kernel.org/all/CAMuHMdVtXGjP8VFMiv-7OMFz1XvfU1cz=Fw4jL3fcp4wO1etzQ@mail.gmail.com/
>
> If the accesses are horrifically slow then maybe they should be flagged
> as emulated rather than slow by the probe.

Yeah, I thought about that too. I didn't feel like I had enough info
to come up with the delineating number for "horrifically slow". Plus
Clement came in with a series to detect specifically that accesses are
emulated (though it will only work on future platforms that can
delegate the trap to the kernel).
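
To make the "delineating number" problem concrete, here is a minimal
sketch of a ratio-based classifier. It is not the kernel's code: the
enum names are illustrative stand-ins, the "fast" rule (word access
takes fewer cycles than byte access) is an assumption about how the
probe decides, and EMULATED_RATIO_X100 is a made-up cutoff that nobody
in this thread has proposed.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative labels only; not the RISCV_HWPROBE_MISALIGNED_* values. */
enum misaligned_speed {
	SPEED_EMULATED,
	SPEED_SLOW,
	SPEED_FAST,
};

/*
 * HYPOTHETICAL cutoff, in hundredths: a ratio below 0.05 is treated as
 * "probably emulated". The number is invented for this sketch.
 */
#define EMULATED_RATIO_X100 5

/*
 * Ratio of byte access time to unaligned word access time, scaled by
 * 100 so that 1.05 is returned as 105 (same two-decimal form as the
 * boot log lines quoted above).
 */
static uint64_t access_ratio_x100(uint64_t byte_cycles, uint64_t word_cycles)
{
	assert(word_cycles != 0);
	return (byte_cycles * 100) / word_cycles;
}

static enum misaligned_speed classify(uint64_t byte_cycles,
				      uint64_t word_cycles)
{
	uint64_t ratio = access_ratio_x100(byte_cycles, word_cycles);

	/* Assumed rule: unaligned word access beats byte access. */
	if (word_cycles < byte_cycles)
		return SPEED_FAST;

	/* Hypothetical extra bucket for "horrifically slow". */
	if (ratio < EMULATED_RATIO_X100)
		return SPEED_EMULATED;

	return SPEED_SLOW;
}
```

With this invented cutoff, every "slow" result in the log quoted above
(ratios from 0.00 to 0.02) would land in the emulated bucket, which may
or may not match what the hardware and firmware are really doing. That
is the difficulty with picking a number from timing data alone.
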
-Evan