[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <mhng-2db9f262-50f6-4ed0-91a8-1268796bb331@palmer-ri-x1c9>
Date: Tue, 11 Feb 2025 10:09:57 -0800 (PST)
From: Palmer Dabbelt <palmer@...belt.com>
To: apatel@...tanamicro.com, Conor Dooley <conor@...nel.org>
CC: cleger@...osinc.com, Charlie Jenkins <charlie@...osinc.com>,
ajones@...tanamicro.com, anup@...infault.org, linux-riscv@...ts.infradead.org,
linux-kernel@...r.kernel.org, Paul Walmsley <paul.walmsley@...ive.com>
Subject: Re: [PATCH 7/9] riscv: Prepare for unaligned access type table lookups
On Mon, 10 Feb 2025 20:26:32 PST (-0800), apatel@...tanamicro.com wrote:
> On Tue, Feb 11, 2025 at 2:27 AM Clément Léger <cleger@...osinc.com> wrote:
>>
>>
>>
>> On 10/02/2025 21:53, Charlie Jenkins wrote:
>> > On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote:
>> >>
>> >>
>> >> On 10/02/2025 18:20, Charlie Jenkins wrote:
>> >>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
>> >>>>
>> >>>>
>> >>>> On 10/02/2025 15:06, Andrew Jones wrote:
>> >>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> On 10/02/2025 11:16, Anup Patel wrote:
>> >>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@...osinc.com> wrote:
>> >>>>>>>>
>> >>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
>> >>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a
>> >>>>>>>>> function which will be used to look up the access type in a table
>> >>>>>>>>> by id registers. Vendors which provide table entries can then skip
>> >>>>>>>>> the probing.
>> >>>>>>>>
>> >>>>>>>> The access checker in my experience is only time consuming on slow
>> >>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really
>> >>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast
>> >>>>>>>> unaligned accesses in the kernel was the main reason for dynamically
>> >>>>>>>> checking. We did introduce the config option to compile the kernel with
>> >>>>>>>> assumed slow/fast accesses, which of course has the downside of
>> >>>>>>>> recompiling the kernel and I assume that you already considered that.
>> >>>>>>>
>> >>>>>>> The kconfig option does not align with the vision of running the same
>> >>>>>>> kernel image across platforms.
>> >>>>>>
>> >>>>>> I'd would be advocating to remove compile time options as well and use
>> >>>>>> another way to skip the probe (see below).
>> >>>>>>
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>> Instead of having a table in the kernel, something that would be more
>> >>>>>>>> platform agnostic would be to have an extension that signals this
>> >>>>>>>> information. That seems like it would accomplish the same goal and
>> >>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need
>> >>>>>>>> to make a new extension.
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> IMO, expecting an ISA extension to be defined for all possible
>> >>>>>>> microarchitectural choices is not going to scale so it is better
>> >>>>>>> to have infrastructure in kernel itself to infer microarchitectural
>> >>>>>>> choices based on RISC-V implementation ID.
>> >>>>>>
>> >>>>>> Since adding an extension seems quite unlikely, and that a device-tree
>> >>>>>> property is likely DT centric and not applicable to ACPI as well, was a
>> >>>>>> command line argument considered ?
>> >>>>>>
>> >>>>>
>> >>>>> I did consider adding a command line option in addition to the table,
>> >>>>> allowing platforms which neither have a table entry [yet] nor want to do
>> >>>>> the speed test, to set whatever they like. In the end, I dropped it, since
>> >>>>> I don't have a use case at this time. However, if we really don't want a
>> >>>>> table, then I can look into the command line option instead.
>> >>>>
>> >>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for
>> >>>> your table but rather as a replacement to Charlie's compile time define
>> >>>> to skip misaligned speed probing since it is like "lpj=<x>". You can
>> >>>> specify it on command line if you want to skip the loop time detection
>> >>>> of loops per jiffies and have faster boot.
>> >>>
>> >>> Jesse sent out a patch for a kernel parameter to set the access speed to
>> >>> whatever is desired [1].
>> >>
>> >> Hey Charlie,
>> >>
>> >> Thanks but it seems you forgot to add the link ?
>> >
>> > Oops, I frequently do that...
>> >
>> > https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/
>> >
>> >>
>> >> Having configuration option + command line option seems like something
>> >> particularly heavy for such feature. The ifdefery/config options
>> >> involved in the misaligned probing code is already quite complicated. If
>> >> another mean to specify the misaligned speed access is added, I think
>> >> all configuration options to set the speed of accesses can then be
>> >> removed and just keep the command line. That will certainly simplify the
>> >> ifdef/config options.
>> >
>> > Yeah that's why it didn't get merged because it felt like overkill. I
>> > responded on the thread to Anup as why I would prefer config options. It
>> > just comes down to config options being required to enable compiler
>> > features. The kernel is only built with rv64gc and usage of all other
>> > extensions requires hand written assembly. There are easy performance
>> > gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc.
>> > Performance focused kernels will need to be recompiled anyway so I am of
>> > the opinion that grouping in other performance features as config
>> > options like this is the easiest thing to do and reduces the amount of
>> > code in the kernel.
>>
>> As answered on the other thread, totally agree, except for the
>> misaligned accesses probing config options ;). Ultimately, we need
>> profiles configuration, either via defconfigs that enables a bunch of
>> optimization via ISA extension or configuration options that groups
>> these config options.
>
> I agree.
>
> Kconfig options for ISA extensions are fine but not for
> misaligned access. The kernel command-line option to skip
> misaligned access probe is fine as well.
IMO it's reasonable to have a Kconfig option that does something like
"build a kernel that assumes misaligned accesses are fast and
supported". We've got a bunch of other platform-level opt ins along
those lines (drivers, extensions, etrrata, memory stuff, etc). It would
almost certainly be a measurable performance boost to enable misaligned
accesses when building Linux, it is for every other big codebase.
> I still think RISC-V
> implementation ID based microarchitecture feature discovery
> will be eventually required as well so why not add it now.
Ya, the probing loops are nasty. We originally tried adding this to the
DT so we didn't have to probe for it, but that got shot down. IMO it's
way cleaner than having tables of vendor IDs in the kernel, maybe if
that's the backup option it'll convince the DT people to change their
minds?
> For profiles, how about having an incremental rva23.config
> similar to 64-bit.config ? This will minimize the duplication
> by re-using the same base defconfig.
The profiles don't really tell us anything, though. They're really just
marketing material, vendors just ignore the actually requirements and
stamp whatever they ship as compliant. If we start adding configs we
should just go with some sort of "support what shipped in 2024" type
definitions, that'd let us Kconfig-off some cruft for supporting old
systems while still being able to actually target something concrete.
>
> Regards,
> Anup
>
>
>
>>
>> Clément
>>
>> >
>> > - Charlie
>> >
>> >>
>> >> Clément
>> >>
>> >>>
>> >>> - Charlie
>> >>>
>> >>>> -}
>> >>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
>> >>>> -static void __init check_unaligned_access_speed_all_cpus(void)
>> >>>> -{
>> >>>> -}
>> >>>> -#endif
>> >>>> -
>> >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>> >>>> static void check_vector_unaligned_access(struct work_struct *work __always_unused)
>> >>>> {
>> >>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
>> >>>> }
>> >>>> #endif
>> >>>>
>> >>>> +static bool check_vector_unaligned_access_table(void)
>> >>>> +{
>> >>>> + return false;
>> >>>> +}
>> >>>> +
>> >>>> static int riscv_online_cpu_vec(unsigned int cpu)
>> >>>> {
>> >>>> if (!has_vector()) {
>> >>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
>> >>>> return 0;
>> >>>> }
>> >>>>
>> >>>> + if (check_vector_unaligned_access_table())
>> >>>> + return 0;
>> >>>> +
>> >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>> >>>> if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
>> >>>> return 0;
>> >>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
>> >>>> {
>> >>>> int cpu;
>> >>>>
>> >>>> - if (!check_unaligned_access_emulated_all_cpus())
>> >>>> + if (!check_unaligned_access_table() &&
>> >>>> + !check_unaligned_access_emulated_all_cpus())
>> >>>> check_unaligned_access_speed_all_cpus();
>> >>>>
>> >>>> if (!has_vector()) {
>> >>>> for_each_online_cpu(cpu)
>> >>>> per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
>> >>>> - } else if (!check_vector_unaligned_access_emulated_all_cpus() &&
>> >>>> + } else if (!check_vector_unaligned_access_table() &&
>> >>>> + !check_vector_unaligned_access_emulated_all_cpus() &&
>> >>>> IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
>> >>>> kthread_run(vec_check_unaligned_access_speed_all_cpus,
>> >>>> NULL, "vec_check_unaligned_access_speed_all_cpus");
>> >>>
>> >>>>
>> >>>> Regarding your table, it feels like a bit going back to old hardcoded
>> >>>> platform description ;). I think some kind of auto-detection of speed
>> >>>> (not builtin the kernel) for platforms could be good as well to skip
>> >>>> probing.
>> >>>>
>> >>>> A DT property also seems ok to me since the goal is to describe
>> >>>> hardware. Would a common DT/ACPI property be appropriate ? The
>> >>>> device_property API unified both so if we used some common property to
>> >>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
>> >>>> device package), we could keep a single parsing method. But I'm no ACPI
>> >>>> expert so I don't know if that really make sense.
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Clément
>> >>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>> drew
>> >>>>
>> >>
>>
Powered by blists - more mailing lists