Date: Fri, 17 May 2024 13:01:51 -0700
From: Doug Anderson <dianders@...omium.org>
To: Will Deacon <will@...nel.org>
Cc: Catalin Marinas <catalin.marinas@....com>, Mark Rutland <mark.rutland@....com>, 
	Marc Zyngier <maz@...nel.org>, Misono Tomohiro <misono.tomohiro@...itsu.com>, 
	Chen-Yu Tsai <wens@...e.org>, Stephen Boyd <swboyd@...omium.org>, 
	Daniel Thompson <daniel.thompson@...aro.org>, Sumit Garg <sumit.garg@...aro.org>, 
	Frederic Weisbecker <frederic@...nel.org>, "Guilherme G. Piccoli" <gpiccoli@...lia.com>, 
	Josh Poimboeuf <jpoimboe@...nel.org>, Kees Cook <keescook@...omium.org>, 
	Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Tony Luck <tony.luck@...el.com>, Valentin Schneider <vschneid@...hat.com>, 
	linux-arm-kernel@...ts.infradead.org, linux-hardening@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: smp: smp_send_stop() and crash_smp_send_stop()
 should try non-NMI first

Hi,

On Fri, Apr 12, 2024 at 6:55 AM Will Deacon <will@...nel.org> wrote:
>
> Hi Doug,
>
> I'm doing some inbox Spring cleaning!

No worries. I got your reply while I was on a bunch of business travel
and finally cleared stuff out enough to take a look again. ;-)


> On Thu, Dec 07, 2023 at 05:02:56PM -0800, Douglas Anderson wrote:
> > When testing hard lockup handling on my sc7180-trogdor-lazor device
> > with pseudo-NMI enabled, with serial console enabled and with kgdb
> > disabled, I found that the stack crawls printed to the serial console
> > ended up as a jumbled mess. After rebooting, the pstore-based console
> > looked fine though. Also, enabling kgdb to trap the panic made the
> > console look fine and avoided the mess.
> >
> > After a bit of tracking down, I came to the conclusion that this was
> > what was happening:
> > 1. The panic path was stopping all other CPUs with
> >    panic_other_cpus_shutdown().
> > 2. At least one of those other CPUs was in the middle of printing to
> >    the serial console and holding the console port's lock, which is
> >    grabbed with "irqsave". ...but since we were stopping with an NMI
> >    we didn't care about the "irqsave" and interrupted anyway.
> > 3. Since we stopped the CPU while it was holding the lock it would
> >    never release it.
> > 4. All future calls to output to the console would end up failing to
> >    get the lock in qcom_geni_serial_console_write(). This isn't
> >    _totally_ unexpected at panic time but it's a code path that's not
> >    well tested, hard to get right, and apparently doesn't work
> >    terribly well on the Qualcomm geni serial driver.
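
(Side note for anyone unfamiliar with the console path: serial console
write hooks generally follow a pattern roughly like the sketch below.
It's generic, not the exact geni code, but it shows why a lock whose
owner was stopped by the NMI is never released and the trylock then
fails on every later write.)

--

    unsigned long flags;
    bool locked = true;

    if (oops_in_progress)
        locked = spin_trylock_irqsave(&uport->lock, flags);
    else
        spin_lock_irqsave(&uport->lock, flags);

    /*
     * If the CPU that owned uport->lock was stopped while holding it,
     * the trylock above fails on every console write from here on.
     */

--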
> >
> > It would probably be a reasonable idea to try to make the Qualcomm
> > geni serial driver work better, but also it's nice not to get into
> > this situation in the first place.
> >
> > Taking a page from what x86 appears to do in native_stop_other_cpus(),
> > let's do this:
> > 1. First, we'll try to stop other CPUs with a normal IPI and wait a
> >    second. This gives them a chance to leave critical sections.
> > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much
> >    lower timeout since there's no good reason for a CPU not to react
> >    quickly to an NMI.
> >
> > This works well and avoids the corrupted console and (presumably)
> > could help avoid other similar issues.
> >
> > In order to do this, we need to do a little re-organization of our
> > IPIs since we don't have any more free IDs. We'll do what was
> > suggested in previous conversations and combine "stop" and "crash
> > stop". That frees up an IPI so now we can have a "stop" and "stop
> > NMI".
> >
> > In order to do this we also need a slight change in the way we keep
> > track of which CPUs still need to be stopped. We need to know
> > specifically which CPUs haven't stopped yet when we fall back to NMI
> > but in the "crash stop" case the "cpu_online_mask" isn't updated as
> > CPUs go down. This is why that code path had an atomic of the number
> > of CPUs left. We'll solve this by making the cpumask into a
> >    global. This has a potential memory implication--with NR_CPUS = 4096
> > this is 4096/8 = 512 bytes of globals. On the upside in that same case
> > we take 512 bytes off the stack which could potentially have made the
> > stop code less reliable. It can be noted that the NMI backtrace code
> > (lib/nmi_backtrace.c) uses the same approach and that use also
> > confirms that updating the mask is safe from NMI.
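
(To make the flow concrete: the shape is roughly the sketch below. The
helper names send_stop_ipi()/send_stop_nmi() are stand-ins rather than
the actual functions in the patch, and the timeouts are illustrative.)

--

/* ~512 bytes of .bss with NR_CPUS = 4096; taken off the stop path's stack. */
static unsigned long stop_mask[BITS_TO_LONGS(NR_CPUS)];

static void stop_other_cpus(void)
{
    unsigned long timeout;

    cpumask_copy(to_cpumask(stop_mask), cpu_online_mask);
    cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask));

    /* Phase 1: regular IPI; give CPUs up to ~1s to leave critical sections. */
    send_stop_ipi(to_cpumask(stop_mask));
    timeout = USEC_PER_SEC;
    while (!cpumask_empty(to_cpumask(stop_mask)) && timeout--)
        udelay(1);

    /* Phase 2: NMI for any stragglers, with a much shorter wait. */
    if (!cpumask_empty(to_cpumask(stop_mask))) {
        send_stop_nmi(to_cpumask(stop_mask));
        timeout = 10 * USEC_PER_MSEC;
        while (!cpumask_empty(to_cpumask(stop_mask)) && timeout--)
            udelay(1);
    }

    /* Each CPU clears its own bit in stop_mask from the IPI/NMI handler. */
}

--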
>
> Updating the global masks without any synchronisation feels broken though:
>
> > @@ -1085,77 +1080,75 @@ void smp_send_stop(void)
> >  {
> >       unsigned long timeout;
> >
> > -     if (num_other_online_cpus()) {
> > -             cpumask_t mask;
> > +     /*
> > +      * If this cpu is the only one alive at this point in time, online or
> > +      * not, there are no stop messages to be sent around, so just back out.
> > +      */
> > +     if (num_other_online_cpus() == 0)
> > +             goto skip_ipi;
> >
> > -             cpumask_copy(&mask, cpu_online_mask);
> > -             cpumask_clear_cpu(smp_processor_id(), &mask);
> > +     cpumask_copy(to_cpumask(stop_mask), cpu_online_mask);
> > +     cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask));
>
> I don't see what prevents multiple CPUs getting in here concurrently and
> tripping over the masks. x86 seems to avoid that with an atomic
> 'stopping_cpu' variable in native_stop_other_cpus(). Do we need something
> similar?

Good point. nmi_trigger_cpumask_backtrace(), which my code was based
on, uses a test_and_set_bit() for this, and that seems simpler than
the atomic_try_cmpxchg() in the x86 code.
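
For reference, the x86 guard looks roughly like this (paraphrased from
native_stop_other_cpus() from memory, so treat the details as
approximate):

--

static atomic_t stopping_cpu = ATOMIC_INIT(-1);

void native_stop_other_cpus(int wait)
{
    int old_cpu = -1, this_cpu = smp_processor_id();

    /* Only the first caller gets to stop everyone; later callers bail. */
    if (!atomic_try_cmpxchg(&stopping_cpu, &old_cpu, this_cpu))
        return;
    ...
}

--

...whereas lib/nmi_backtrace.c just bails out early if
test_and_set_bit(0, &backtrace_flag) says someone else got there first.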

If we run into that case, what do you think we should do? I guess x86
just returns, though it feels like we should at least print a warning,
since returning means there are still other CPUs running and we haven't
done what the function asked us to do.

In theory, we could try to help the other processor along instead. I
don't know how much complexity is worth handling here, and testing
some of the corner cases would be extremely hard, but something like
the sketch below might work. Maybe it's too complex?

--

void smp_send_stop(void)
{
    unsigned long timeout;
    static unsigned long stop_in_progress;

    /*
     * If this cpu is the only one alive at this point in time, online or
     * not, there are no stop messages to be sent around, so just back out.
     */
    if (num_other_online_cpus() == 0)
        goto skip_ipi;

    /*
     * If another CPU is already trying to stop us and we're still running
     * here, then either it hasn't sent us the IPI yet or we have interrupts
     * disabled. Help it along by stopping ourselves.
     */
    if (test_and_set_bit(0, &stop_in_progress)) {
        /* Wait until the other CPU has initialized stop_mask */
        while (!test_bit(1, &stop_in_progress))
            cpu_relax();
        /* Pairs with the smp_wmb() before bit 1 is set */
        smp_rmb();
        do_handle_IPI(IPI_CPU_STOP);
    }

    cpumask_copy(to_cpumask(stop_mask), cpu_online_mask);
    cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask));

    /* Make sure stop_mask is visible before we indicate that it's ready */
    smp_wmb();
    set_bit(1, &stop_in_progress);
    ...
    ...

--

Opinions?


> Apart from that, I'm fine with the gist of the patch.

Great. Ironically, while reviewing this patch with fresh eyes and
looking at the things you brought up, I also found a few issues of my
own. I'll reply to my own post so I have the right context to respond
to.

One other question: what did you think about Daniel's suggestion to go
straight to NMI for crash_stop? I don't feel like I have enough
experience with crash_stop to have intuition here. It still feels like
trying IRQ first is better even in that case, but I'm happy to go
either way.

-Doug
