[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z6pnTG7YoiDW_Gkq@ghost>
Date: Mon, 10 Feb 2025 12:53:32 -0800
From: Charlie Jenkins <charlie@...osinc.com>
To: Clément Léger <cleger@...osinc.com>
Cc: Andrew Jones <ajones@...tanamicro.com>,
Anup Patel <anup@...infault.org>, linux-riscv@...ts.infradead.org,
linux-kernel@...r.kernel.org, paul.walmsley@...ive.com,
"palmer@...belt.com Anup Patel" <apatel@...tanamicro.com>
Subject: Re: [PATCH 7/9] riscv: Prepare for unaligned access type table
lookups
On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote:
>
>
> On 10/02/2025 18:20, Charlie Jenkins wrote:
> > On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
> >>
> >>
> >> On 10/02/2025 15:06, Andrew Jones wrote:
> >>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
> >>>>
> >>>>
> >>>> On 10/02/2025 11:16, Anup Patel wrote:
> >>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@...osinc.com> wrote:
> >>>>>>
> >>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> >>>>>>> Probing unaligned accesses on boot is time consuming. Provide a
> >>>>>>> function which will be used to look up the access type in a table
> >>>>>>> by id registers. Vendors which provide table entries can then skip
> >>>>>>> the probing.
> >>>>>>
> >>>>>> The access checker in my experience is only time consuming on slow
> >>>>>> hardware. Hardware that supports fast unaligned accesses isn't really
> >>>>>> impacted by this? Avoiding a list of hardware that has slow/fast
> >>>>>> unaligned accesses in the kernel was the main reason for dynamically
> >>>>>> checking. We did introduce the config option to compile the kernel with
> >>>>>> assumed slow/fast accesses, which of course has the downside of
> >>>>>> recompiling the kernel and I assume that you already considered that.
> >>>>>
> >>>>> The kconfig option does not align with the vision of running the same
> >>>>> kernel image across platforms.
> >>>>
> >>>> I'd would be advocating to remove compile time options as well and use
> >>>> another way to skip the probe (see below).
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Instead of having a table in the kernel, something that would be more
> >>>>>> platform agnostic would be to have an extension that signals this
> >>>>>> information. That seems like it would accomplish the same goal and
> >>>>>> leverage the existing infrastructure in the kernel, albeit with the need
> >>>>>> to make a new extension.
> >>>>>>
> >>>>>
> >>>>> IMO, expecting an ISA extension to be defined for all possible
> >>>>> microarchitectural choices is not going to scale so it is better
> >>>>> to have infrastructure in kernel itself to infer microarchitectural
> >>>>> choices based on RISC-V implementation ID.
> >>>>
> >>>> Since adding an extension seems quite unlikely, and that a device-tree
> >>>> property is likely DT centric and not applicable to ACPI as well, was a
> >>>> command line argument considered ?
> >>>>
> >>>
> >>> I did consider adding a command line option in addition to the table,
> >>> allowing platforms which neither have a table entry [yet] nor want to do
> >>> the speed test, to set whatever they like. In the end, I dropped it, since
> >>> I don't have a use case at this time. However, if we really don't want a
> >>> table, then I can look into the command line option instead.
> >>
> >> Sorry if I wasn't clear, I wasn't considering this as a replacement for
> >> your table but rather as a replacement to Charlie's compile time define
> >> to skip misaligned speed probing since it is like "lpj=<x>". You can
> >> specify it on command line if you want to skip the loop time detection
> >> of loops per jiffies and have faster boot.
> >
> > Jesse sent out a patch for a kernel parameter to set the access speed to
> > whatever is desired [1].
>
> Hey Charlie,
>
> Thanks but it seems you forgot to add the link ?
Oops, I frequently do that...
https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/
>
> Having configuration option + command line option seems like something
> particularly heavy for such feature. The ifdefery/config options
> involved in the misaligned probing code is already quite complicated. If
> another mean to specify the misaligned speed access is added, I think
> all configuration options to set the speed of accesses can then be
> removed and just keep the command line. That will certainly simplify the
> ifdef/config options.
Yeah that's why it didn't get merged because it felt like overkill. I
responded on the thread to Anup as why I would prefer config options. It
just comes down to config options being required to enable compiler
features. The kernel is only built with rv64gc and usage of all other
extensions requires hand written assembly. There are easy performance
gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc.
Performance focused kernels will need to be recompiled anyway so I am of
the opinion that grouping in other performance features as config
options like this is the easiest thing to do and reduces the amount of
code in the kernel.
- Charlie
>
> Clément
>
> >
> > - Charlie
> >
> >> -}
> >> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
> >> -static void __init check_unaligned_access_speed_all_cpus(void)
> >> -{
> >> -}
> >> -#endif
> >> -
> >> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
> >> static void check_vector_unaligned_access(struct work_struct *work __always_unused)
> >> {
> >> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
> >> }
> >> #endif
> >>
> >> +static bool check_vector_unaligned_access_table(void)
> >> +{
> >> + return false;
> >> +}
> >> +
> >> static int riscv_online_cpu_vec(unsigned int cpu)
> >> {
> >> if (!has_vector()) {
> >> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
> >> return 0;
> >> }
> >>
> >> + if (check_vector_unaligned_access_table())
> >> + return 0;
> >> +
> >> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
> >> if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
> >> return 0;
> >> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
> >> {
> >> int cpu;
> >>
> >> - if (!check_unaligned_access_emulated_all_cpus())
> >> + if (!check_unaligned_access_table() &&
> >> + !check_unaligned_access_emulated_all_cpus())
> >> check_unaligned_access_speed_all_cpus();
> >>
> >> if (!has_vector()) {
> >> for_each_online_cpu(cpu)
> >> per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
> >> - } else if (!check_vector_unaligned_access_emulated_all_cpus() &&
> >> + } else if (!check_vector_unaligned_access_table() &&
> >> + !check_vector_unaligned_access_emulated_all_cpus() &&
> >> IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
> >> kthread_run(vec_check_unaligned_access_speed_all_cpus,
> >> NULL, "vec_check_unaligned_access_speed_all_cpus");
> >
> >>
> >> Regarding your table, it feels like a bit going back to old hardcoded
> >> platform description ;). I think some kind of auto-detection of speed
> >> (not builtin the kernel) for platforms could be good as well to skip
> >> probing.
> >>
> >> A DT property also seems ok to me since the goal is to describe
> >> hardware. Would a common DT/ACPI property be appropriate ? The
> >> device_property API unified both so if we used some common property to
> >> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
> >> device package), we could keep a single parsing method. But I'm no ACPI
> >> expert so I don't know if that really make sense.
> >>
> >> Thanks,
> >>
> >> Clément
> >>
> >>>
> >>> Thanks,
> >>> drew
> >>
>
Powered by blists - more mailing lists