Message-ID: <05bdf6c3-9d60-4b1c-add3-79b4f7d25189@rivosinc.com>
Date: Tue, 18 Mar 2025 15:22:16 +0100
From: Clément Léger <cleger@...osinc.com>
To: Alexandre Ghiti <alex@...ti.fr>, Andrew Jones <ajones@...tanamicro.com>
Cc: linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, paul.walmsley@...ive.com, palmer@...belt.com,
charlie@...osinc.com, Anup Patel <apatel@...tanamicro.com>, corbet@....net
Subject: Re: [PATCH v3 7/8] riscv: Add parameter for skipping access speed tests

On 18/03/2025 15:09, Alexandre Ghiti wrote:
> On 18/03/2025 14:04, Andrew Jones wrote:
>> On Tue, Mar 18, 2025 at 01:58:10PM +0100, Alexandre Ghiti wrote:
>>> On 18/03/2025 13:45, Andrew Jones wrote:
>>>> On Tue, Mar 18, 2025 at 01:13:18PM +0100, Alexandre Ghiti wrote:
>>>>> On 18/03/2025 09:48, Andrew Jones wrote:
>>>>>> On Mon, Mar 17, 2025 at 03:39:01PM +0100, Alexandre Ghiti wrote:
>>>>>>> Hi Drew,
>>>>>>>
>>>>>>> On 04/03/2025 13:00, Andrew Jones wrote:
>>>>>>>> Allow skipping scalar and vector unaligned access speed tests. This
>>>>>>>> is useful for testing alternative code paths and to skip the
>>>>>>>> tests in
>>>>>>>> environments where they run too slowly. All CPUs must have the same
>>>>>>>> unaligned access speed.
>>>>>>> I'm not a big fan of the command line parameter, this is not
>>>>>>> where we should
>>>>>>> push uarch decisions because there could be many other in the
>>>>>>> future, the
>>>>>>> best solution to me should be in DT/ACPI and since the DT folks,
>>>>>>> according
>>>>>>> to Palmer, shut down this solution, it remains using an extension.
>>>>>>>
>>>>>>> I have been reading a bit about unaligned accesses. Zicclsm was
>>>>>>> described as
>>>>>>> "Even though mandated, misaligned loads and stores might execute
>>>>>>> extremely
>>>>>>> slowly. Standard software distributions should assume their
>>>>>>> existence only
>>>>>>> for correctness, not for performance." in rva20/22 but *not* in
>>>>>>> rva23. So
>>>>>>> what about using this "hole" and consider that a platform that
>>>>>>> *advertises*
>>>>>>> Zicclsm means its unaligned accesses are fast? After internal
>>>>>>> discussion, it
>>>>>>> actually does not make sense to advertise Zicclsm if the platform
>>>>>>> accesses
>>>>>>> are slow right?
>>>>>> This topic pops up every so often, including in yesterday's server
>>>>>> platform TG call. In that call, and, afaict, every other time it has
>>>>>> popped up, the result is to reiterate that ISA extensions never say
>>>>>> anything about performance. So, Zicclsm will never mean fast and we
>>>>>> won't likely be able to add any extension that does.
>>>>> Ok, I should not say "fast". Usually, when an extension is
>>>>> advertised by a
>>>>> platform, we don't question its speed (zicboz, zicbom...etc), we
>>>>> simply use
>>>>> it and it's up to the vendor to benchmark its implementation and act
>>>>> accordingly (i.e. do not set it in the isa string).
>>>>>
>>>>>
>>>>>>> arm64 for example considers that armv8 has fast unaligned
>>>>>>> accesses and can
>>>>>>> then enable HAVE_EFFICIENT_UNALIGNED_ACCESS in the kernel, even
>>>>>>> though some
>>>>>>> uarchs are slow. Distros will very likely use rva23 as baseline
>>>>>>> so they will
>>>>>>> enable Zicclsm which would allow us to take advantage of this
>>>>>>> too, without
>>>>>>> this, we lose a lot of perf improvement in the kernel, see
>>>>>>> https://lore.kernel.org/lkml/20231225044207.3821-1-jszhang@...nel.org/.
>>>>>>>
>>>>>>> Or we could have a new named feature for this, even though it's
>>>>>>> weird to
>>>>>>> have a named feature which would basically mean "Zicclsm is
>>>>>>> fast". We don't
>>>>>>> have, for example, a named feature to say "Zicboz is fast" but
>>>>>>> given the
>>>>>>> vague wording in the profile spec, maybe we can ask for one in
>>>>>>> that case?
>>>>>>>
>>>>>>> Sorry for the late review and for triggering this debate...
>>>>>> No problem, let's try to pick the best option. I'll try listing
>>>>>> all the
>>>>>> options and their pros/cons.
>>>>>>
>>>>>> 1. Leave as is, which is to always probe
>>>>>> pro: Nothing to do
>>>>>> con: Not ideal in all environments
>>>>>>
>>>>>> 2. New DT/ACPI description
>>>>>> pro: Describing whether or not misaligned accesses are
>>>>>> implemented in
>>>>>> HW (which presumably means fast) is something that
>>>>>> should be done
>>>>>> in HW descriptions
>>>>>> con: We'll need to live with probing until we can get the
>>>>>> descriptions
>>>>>> defined, which may be never if there's too much opposition
>>>>>>
>>>>>> 3. Command line
>>>>>> pro: Easy and serves its purpose, which is to skip probing
>>>>>> in the
>>>>>> environments where probing is not desired
>>>>>> con: Yet another command line option (which we may want to
>>>>>> deprecate
>>>>>> someday)
>>>>>>
>>>>>> 4. New ISA extension
>>>>>> pro: Easy to add to HW descriptions
>>>>>> con: Not likely to get it through ratification
>>>>>>
>>>>>> 5. New SBI FWFT feature
>>>>>> pro: Probably easier to get through ratification than an ISA
>>>>>> extension
>>>>>> con: Instead of probing, kernel would have to ask SBI --
>>>>>> would that
>>>>>> even be faster? Will all the environments that want to
>>>>>> skip
>>>>>> probing even have a complete SBI?
>>>>>>
>>>>>> 6. ??
>>>>> So what about:
>>>>>
>>>>> 7. New enum value describing the performance as "FORCED" or "HW" (or
>>>>> anything better)
>>>>> pro: We only use the existing Zicclsm
>>>>> con: It's not clear that the accesses are fast but it
>>>>> basically says to
>>>>> SW "don't think too much, I'm telling you that you can use it", up
>>>>> to us to
>>>>> describe this correctly for users to understand.
>>>> But Zicclsm doesn't mean misaligned accesses are in HW, it just means
>>>> they're not going to explode.
>>>
>>> They never explode since if they are not supported by the HW, we rely on
>>> S-mode emulation already.
>> Exactly. Zicclsm is just a new name for that behavior. Profiles try to
>> name every behavior, even the ones we take for granted. Unfortunately,
>> like in the case of Zicclsm, we don't necessarily gain anything from
>> the new name. In this case, we don't gain a way to avoid probing.
>
>
> I understand your point but given the misaligned traps exist, I can't
> find another meaning to Zicclsm than "I'm telling you to use it".
> Zicclsm can't be used to describe an OS behaviour (ie the emulation of
> misaligned accesses).

Hi Alex,

Some SBI implementations might decide not to delegate the misaligned trap
and not emulate it, or only partially emulate it. IMHO, Zicclsm should
be advertised by SBI to tell the OS that misaligned accesses are
supported (even if they are slow), since Zicclsm is a profile extension
(at least in its first definition). I think we cannot rely on Zicclsm to
determine that accesses are fast. Moreover, its definition seems to have
evolved over time and lacks the clarity needed to be reliable.
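
To make that concrete, here is a minimal sketch of the decision I have in
mind. It is not kernel code: every name in it (the sketch_* enum, the
"forced" value standing in for a command line override, the fake probe)
is hypothetical. It only shows that Zicclsm answers "do misaligned
accesses work?", so the speed still has to come from an explicit override
or from probing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical speed classes; not the kernel's or hwprobe's definitions. */
enum sketch_speed {
	SKETCH_UNKNOWN,		/* nothing forced, not probed yet */
	SKETCH_EMULATED,	/* works, but through trap-and-emulate */
	SKETCH_SLOW,		/* in HW, slower than byte accesses */
	SKETCH_FAST,		/* in HW, faster than byte accesses */
};

typedef enum sketch_speed (*sketch_probe_fn)(void);

/* Stub probe for illustration; counts calls so skipping is observable. */
static int sketch_probe_calls;

static enum sketch_speed sketch_fake_probe(void)
{
	sketch_probe_calls++;
	return SKETCH_EMULATED;
}

/*
 * Decide which speed to report.
 * has_zicclsm is deliberately ignored: it is a correctness statement and
 * carries no performance information.
 */
static enum sketch_speed sketch_resolve_speed(bool has_zicclsm,
					      enum sketch_speed forced,
					      sketch_probe_fn probe)
{
	(void)has_zicclsm;

	/* An explicit override wins and skips the probe entirely. */
	if (forced != SKETCH_UNKNOWN)
		return forced;

	assert(probe != NULL);
	return probe();
}
```

With this, sketch_resolve_speed(true, SKETCH_UNKNOWN, sketch_fake_probe)
still has to run the probe even though Zicclsm is advertised, while
passing SKETCH_FAST as the forced value returns immediately.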
>
> I'm also insisting because we need a compile-time hint which allows us
> to enable HAVE_EFFICIENT_UNALIGNED_ACCESS in the kernel and Zicclsm is
> great since it is required in RVA23. If that's not Zicclsm, that must be
> another named feature/extension.

As said in the other thread, I think we might have to enable
HAVE_EFFICIENT_UNALIGNED_ACCESS by default (or whatever option selects
that CONFIG). HW without a hardware implementation of misaligned accesses
would suffer from that choice but would still work (although poorly)
thanks to S-mode emulation. Anyone who wants to run the kernel more
efficiently on a smaller chip without any hardware support for it should
disable that config. I think we cannot accommodate both worlds without
hurting one side or the other; a choice needs to be made.
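
For anyone wondering what the compile-time choice buys, here is a rough
userspace sketch, under the assumption that a macro I made up
(SKETCH_EFFICIENT_UNALIGNED) plays the role of
HAVE_EFFICIENT_UNALIGNED_ACCESS. The function name is hypothetical too.
With the macro set, the comparison walks the buffers eight bytes at a
time regardless of alignment; with it cleared, only the byte-wise loop
remains. Both builds give the same answer, only the cost differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel config option; build with
 * -DSKETCH_EFFICIENT_UNALIGNED=0 to keep only the byte-wise path.
 */
#ifndef SKETCH_EFFICIENT_UNALIGNED
#define SKETCH_EFFICIENT_UNALIGNED 1
#endif

/* Return true if the two buffers hold the same len bytes. */
static bool sketch_bytes_equal(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;

	assert(a != NULL && b != NULL);

#if SKETCH_EFFICIENT_UNALIGNED
	/*
	 * Word-at-a-time path: memcpy() into a local lets the compiler emit
	 * a single 64-bit load even when pa/pb are misaligned. On HW that
	 * traps and emulates such loads, this is the path that hurts.
	 */
	while (len >= sizeof(uint64_t)) {
		uint64_t wa, wb;

		memcpy(&wa, pa, sizeof(wa));
		memcpy(&wb, pb, sizeof(wb));
		if (wa != wb)
			return false;
		pa += sizeof(uint64_t);
		pb += sizeof(uint64_t);
		len -= sizeof(uint64_t);
	}
#endif

	/* Byte-wise path: always safe, handles the tail or everything. */
	while (len--) {
		if (*pa++ != *pb++)
			return false;
	}
	return true;
}
```

Calling sketch_bytes_equal(buf, other + 1, n) with a deliberately
misaligned second pointer exercises the case under discussion.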
Thanks,
Clément
>
> What do you suggest to make progress here?
>
> Thanks,
>
> Alex
>
>
>>
>> Thanks,
>> drew
>>
>>>
>>>> We'd still need the probing to find out
>>>> if the accesses are emulated (slow) or hw (fast). We at least want to
>>>> know the answer to that question because we advertise it to userspace
>>>> through hwprobe.
>>>>
>>>> (BTW, another pro of the command line is that it can be used to test
>>>> both slow and fast paths without recompiling.)
>>>>
>>>> Thanks,
>>>> drew