Message-ID: <CALs-HsvX=zHvSKV5Skuxk=7jWq_mGqhsRSeuefD7OiYOCr3Gkw@mail.gmail.com>
Date:   Tue, 7 Nov 2023 09:26:03 -0800
From:   Evan Green <evan@...osinc.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Palmer Dabbelt <palmer@...osinc.com>,
        Jisheng Zhang <jszhang@...nel.org>,
        David Laight <David.Laight@...lab.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Andrew Jones <ajones@...tanamicro.com>,
        Anup Patel <apatel@...tanamicro.com>,
        Clément Léger <cleger@...osinc.com>,
        Conor Dooley <conor.dooley@...rochip.com>,
        Greentime Hu <greentime.hu@...ive.com>,
        Heiko Stuebner <heiko@...ech.de>,
        Ley Foon Tan <leyfoon.tan@...rfivetech.com>,
        Marc Zyngier <maz@...nel.org>,
        Palmer Dabbelt <palmer@...belt.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Sunil V L <sunilvl@...tanamicro.com>,
        linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org
Subject: Re: [PATCH v3] RISC-V: Probe misaligned access speed in parallel

On Tue, Nov 7, 2023 at 12:34 AM Sebastian Andrzej Siewior
<bigeasy@...utronix.de> wrote:
>
> On 2023-11-06 14:58:55 [-0800], Evan Green wrote:
> > Probing for misaligned access speed takes about 0.06 seconds. On a
> > system with 64 cores, doing this in smp_callin() means it's done
> > serially, extending boot time by 3.8 seconds. That's a lot of boot time.
> >
> > Instead of measuring each CPU serially, let's do the measurements on
> > all CPUs in parallel. If we disable preemption on all CPUs, the
> > jiffies stop ticking, so we can do this in stages of 1) everybody
> > except core 0, then 2) core 0. The allocations are all done outside of
> > on_each_cpu() to avoid calling alloc_pages() with interrupts disabled.
> >
> > For hotplugged CPUs that come in after the boot time measurement,
> > register CPU hotplug callbacks, and do the measurement there. Interrupts
> > are enabled in those callbacks, so they're fine to do alloc_pages() in.
>
> I think this is dragged out of proportion. I would do this (if the need
> can't be identified by CPU-ID or so) on the boot CPU only. If there is
> evidence/proof/blessing from the high RISC-V council that different
> types of CPU cores are mixed together then this could be extended.
> You brought big.LITTLE up in the other thread. That case is actually
> known: same as with hyper-threads on x86, you know which CPU is the
> core and which hyper-thread (CPU) belongs to it.
> So in terms of big.LITTLE you _could_ limit this to one big and one
> little core instead of running it on all.

Doing it on one per cluster might also happen to work, but I still see
nothing that prevents variety within a cluster, so I'm not comfortable
with that assumption. It also doesn't buy much. I'm not sure what kind
of guidance RVI is providing on integrating multiple CPUs into a
system. I haven't seen any myself, but am happy to reassess if there's
documentation banning the scenarios I'm imagining.

>
> But this is just my view on this. From PREEMPT_RT's point of view, the
> way you restructured the memory allocation should work now.

Thanks!

>
> > Reported-by: Jisheng Zhang <jszhang@...nel.org>
> > Closes: https://lore.kernel.org/all/mhng-9359993d-6872-4134-83ce-c97debe1cf9a@palmer-ri-x1c9/T/#mae9b8f40016f9df428829d33360144dc5026bcbf
> > Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
> > Signed-off-by: Evan Green <evan@...osinc.com>
> >
> >
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 6a01ded615cd..fe59e18dbd5b 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> …
> >
> > -static int __init check_unaligned_access_boot_cpu(void)
> > +/* Measure unaligned access on all CPUs present at boot in parallel. */
> > +static int check_unaligned_access_all_cpus(void)
> >  {
> > -     check_unaligned_access(0);
> > +     unsigned int cpu;
> > +     unsigned int cpu_count = num_possible_cpus();
> > +     struct page **bufs = kzalloc(cpu_count * sizeof(struct page *),
> > +                                  GFP_KERNEL);
>
> kcalloc(). For beauty reasons you could try a reverse xmas tree.
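
For reference, the kcalloc() plus reverse-xmas-tree version of those
declarations would look roughly like this (a sketch only, reusing the
patch's variable names; declarations ordered longest line first):

```c
	unsigned int cpu_count = num_possible_cpus();
	struct page **bufs;
	unsigned int cpu;

	/* kcalloc() zeroes and checks the n * size multiplication for
	 * overflow, unlike the open-coded kzalloc(n * size, ...). */
	bufs = kcalloc(cpu_count, sizeof(*bufs), GFP_KERNEL);
```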
>
> > +
> > +     if (!bufs) {
> > +             pr_warn("Allocation failure, not measuring misaligned performance\n");
> > +             return 0;
> > +     }
> > +
> > +     /*
> > +      * Allocate separate buffers for each CPU so there's no fighting over
> > +      * cache lines.
> > +      */
> > +     for_each_cpu(cpu, cpu_online_mask) {
> > +             bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
> > +             if (!bufs[cpu]) {
> > +                     pr_warn("Allocation failure, not measuring misaligned performance\n");
> > +                     goto out;
> > +             }
> > +     }
> > +
> > +     /* Check everybody except 0, who stays behind to tend jiffies. */
> > +     on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
>
> Comments! _HOW_ do you ensure that CPU0 is left out? You don't. CPU0
> does this and then leaves, which is a waste. Using on_each_cpu_cond()
> could deal with this. And you have the check within the wrapper
> (check_unaligned_access_nonboot_cpu()) anyway.
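
The on_each_cpu_cond() shape suggested above could look roughly like
this (a sketch only; not_boot_cpu is an invented helper name, the other
identifiers are the patch's own):

```c
/* Sketch: let on_each_cpu_cond() skip CPU0 up front, instead of
 * IPI'ing it only to have the wrapper bail out. */
static bool not_boot_cpu(int cpu, void *info)
{
	return cpu != 0;
}

	/* Check everybody except 0, who stays behind to tend jiffies. */
	on_each_cpu_cond(not_boot_cpu, check_unaligned_access_nonboot_cpu,
			 bufs, 1);
```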
>
> > +     /* Check core 0. */
> > +     smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
>
> Now that comment is obvious. If you want to add a comment, why not state
> why CPU0 has to be done last?
>
> > +
> > +     /* Setup hotplug callback for any new CPUs that come online. */
> > +     cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
> > +                               riscv_online_cpu, NULL);
> Instead of riscv:online you could use riscv:unaligned_check or
> something that pinpoints the callback to something obvious. This name
> is exported via sysfs.
>
> Again, the comment is obvious. For that to make sense would require
> RISC-V to support physical hotplug. For a KVM-like environment (where
> you can plug in CPUs later) this probably doesn't make sense at all.
> Why not? Because
>
> - without explicit CPU pinning your slow/ fast CPU mapping (host <->
>   guest) could change if the scheduler on the host moves the threads
>   around.

Taking a system with non-identical cores and allowing vcpus to bounce
between them sounds like a hypervisor configuration issue to me,
regardless of this patch.

>
> - without explicit task offload and resource partitioning on the host
>   your guest thread might get interrupted during the measurement. This
>   done during boot so chances are high that it runs 100% of its time
>   slice and will be preempted once other tasks on the host ask for CPU
>   run time.

The measurement takes the best (lowest time) iteration. So unless
every iteration gets interrupted, I should get a good read in there
somewhere.
-Evan
