linux-kernel - Re: Out-of-bounds access when hartid >= NR

Message-ID: <CAOnJCUJHXcaLve3Nei+PnnA1-kKUXjShDneK5qNh5ebVnnWMXw@mail.gmail.com>
Date:   Wed, 27 Oct 2021 18:28:42 -0700
From:   Atish Patra <atishp@...shpatra.org>
To:     Geert Uytterhoeven <geert@...ux-m68k.org>
Cc:     Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        linux-riscv <linux-riscv@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On Tue, Oct 26, 2021 at 2:03 AM Geert Uytterhoeven <geert@...ux-m68k.org> wrote:
>
> Hi Atish,
>
> On Tue, Oct 26, 2021 at 10:55 AM Atish Patra <atishp@...shpatra.org> wrote:
> > On Mon, Oct 25, 2021 at 8:54 AM Geert Uytterhoeven <geert@...ux-m68k.org> wrote:
> > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> > > the 4th CPU either fails to come online, or the system crashes.
> > >
> > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> > >   - unused core has hartid 0 (sifive,e51),
> > >   - processor 0 has hartid 1 (sifive,u74-mc),
> > >   - processor 1 has hartid 2 (sifive,u74-mc),
> > >   - processor 2 has hartid 3 (sifive,u74-mc),
> > >   - processor 3 has hartid 4 (sifive,u74-mc).
> > >
> > > I assume the same issue is present on the SiFive fu540 and fu740
> > > SoCs, but I don't have access to these.  The issue is not present
> > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> > > hartid 0.
> > >
> > > arch/riscv/kernel/cpu_ops.c has:
> > >
> > >     void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> > >     void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> > >
> > >     void cpu_update_secondary_bootdata(unsigned int cpuid,
> > >                                        struct task_struct *tidle)
> > >     {
> > >             int hartid = cpuid_to_hartid_map(cpuid);
> > >
> > >             /* Make sure tidle is updated */
> > >             smp_mb();
> > >             WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> > >                        task_stack_page(tidle) + THREAD_SIZE);
> > >             WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> > >
> > > The above two writes cause out-of-bounds accesses beyond
> > > __cpu_up_{stack,task}_pointer[] if hartid >= CONFIG_NR_CPUS.
> > >
> > >     }
> > >
> >
> > Thanks for reporting this. We need to fix this and definitely shouldn't hide it
> > using configs. I guess I never tested with lower values (2 or 4) for
> > CONFIG_NR_CPUS which explains how this bug was not noticed until now.
>
> > > How to fix this?
> > >
> > > We could skip hartids >= NR_CPUS, but that feels strange to me, as
> > > you need NR_CPUS to be larger (much larger if the first usable hartid
> > > is a large number) than the number of CPUs used.
> > >
> > > We could store the minimum hartid, and always subtract that when
> > > accessing __cpu_up_{stack,task}_pointer[] (also in
> > > arch/riscv/kernel/head.S), but that means unused cores cannot be in the
> > > middle of the hartid range.
> >
> > Yeah. Both of the above proposed solutions are not ideal.
> >
> > >
> > > Are hartids guaranteed to be contiguous? If not, we have no choice but
> > > to index __cpu_up_{stack,task}_pointer[] by cpuid instead, which
> > > needs a more expensive conversion in arch/riscv/kernel/head.S.
> >
> > This will work for ordered booting with the SBI HSM extension. However, it may
> > fail for spinwait booting because cpuid_to_hartid_map might not have been set up
> > yet, depending on when secondary harts jump to Linux.
> >
> > Ideally, the size of the __cpu_up_{stack,task}_pointer[] should be the maximum
> > hartid possible. How about adding a config for that ?
>
> (reading more RISC-V specs)
> Hart IDs can use up to XLEN (32, 64, or 128) bits. So creative sparse
> multi-level encodings like those used in MPIDR on ARM[1] make using a
> simple array infeasible.
>

Hmm. Should we worry about similar creative sparse encodings only when they appear?
Maybe we can dodge the problem altogether.
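
To put a rough number on the sparse-encoding concern, here is a minimal user-space sketch. The hartid layout (cluster number in bits 15:8, core number in bits 7:0) is made up for illustration and is not taken from any real platform; `example_hartids` and `entries_if_indexed_by_hartid()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sparse hartids: two clusters of two cores each,
 * with the cluster number in bits 15:8 and the core in bits 7:0.
 */
static const unsigned long example_hartids[] = { 0x000, 0x001, 0x100, 0x101 };
#define EXAMPLE_NR_HARTS (sizeof(example_hartids) / sizeof(example_hartids[0]))

/*
 * Number of array entries needed when the array is indexed directly
 * by hartid: one more than the largest hartid in use.
 */
static size_t entries_if_indexed_by_hartid(const unsigned long *ids, size_t n)
{
	unsigned long max = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (ids[i] > max)
			max = ids[i];
	return (size_t)max + 1;
}
```

With these four harts, a hartid-indexed array needs 258 entries, while a cpuid-indexed one needs only 4. The gap grows with every extra level in the encoding.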

The other approach would be to go with your proposed solution and
convert the hartid to the cpuid in head.S.
However, this can only be fixed for ordered booting. Most of today's
users have probably moved on to ordered booting.
The only users still relying on spinwait would be:

1. whoever still uses BBL
2. whoever still uses OpenSBI v0.6 or older

Maybe we can document this bug in the Linux kernel for the spinwait
method and move on.
Hopefully, we can remove the spinwait method in a couple of years.

Is that acceptable ?
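
For reference, a rough user-space sketch of the hartid-to-cpuid idea, written in C rather than the assembly that head.S would need. It assumes the cpuid-to-hartid table is already populated (which is the ordered-booting case) and uses the PolarFire layout from the report. The names `cpuid_to_hartid`, `hartid_to_cpuid()`, and `set_boot_stack()` are illustrative only, not the kernel's actual symbols.

```c
#include <stddef.h>

#define NR_CPUS 4	/* stands in for CONFIG_NR_CPUS */

/* Model of the cpuid -> hartid table: PolarFire, harts 1-4 are CPUs 0-3. */
static unsigned long cpuid_to_hartid[NR_CPUS] = { 1, 2, 3, 4 };

/* Boot data indexed by cpuid, so NR_CPUS entries are always enough. */
static void *cpu_up_stack_pointer[NR_CPUS];

/* Linear reverse lookup; returns -1 if the hartid is not a known CPU. */
static int hartid_to_cpuid(unsigned long hartid)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpuid_to_hartid[cpu] == hartid)
			return cpu;
	return -1;
}

/* Record the boot stack for a hart; 0 on success, -1 for an unknown hart. */
static int set_boot_stack(unsigned long hartid, void *sp)
{
	int cpu = hartid_to_cpuid(hartid);

	if (cpu < 0)
		return -1;
	cpu_up_stack_pointer[cpu] = sp;
	return 0;
}
```

Here hartid 4 maps to cpuid 3 and stays inside the array, and hartid 0 (the e51) is rejected instead of being used as an index. The cost is the linear scan, which is the "more expensive conversion" mentioned earlier in the thread.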

> [1] arch/arm{,64}/include/asm/cputype.h
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds



-- 
Regards,
Atish