linux-kernel - Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5620ABC5.6060300@redhat.com>
Date:	Fri, 16 Oct 2015 09:48:21 +0200
From:	Laurent Vivier <lvivier@...hat.com>
To:	Michael Ellerman <mpe@...erman.id.au>
Cc:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mackerras <paulus@...ba.org>,
	linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
	dgibson@...hat.com, thuth@...hat.com
Subject: Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are
 online

On 16/10/2015 04:14, Michael Ellerman wrote:
> On Thu, 2015-10-15 at 21:00 +0200, Laurent Vivier wrote:
>> On kexec, all secondary offline CPUs are onlined before
>> starting the new kernel, this is not done in the case of kdump.
>>
>> If kdump is configured and a kernel crash occurs whereas
>> some secondaries CPUs are offline (SMT=off),
>> the new kernel is not able to start them and displays some
>> "Processor X is stuck.".
> 
> Do we know why they are stuck?

Yes, we know :)

On the crash, as the CPUs are offline, kernel doesn't call
opal_return_cpu(), so for OPAL all these CPU are always in the kernel.

When the new kernel starts, it call s opal_query_cpu_status() to know
which CPUs are available. As they were not returned to OPAL these CPUs
are not available, but as the kernel logic relies on the fact they must
be available (the logic is SMT is on), it is waiting for their starting
and wait for ever...

When the kernel starts, all secondary processors are started by a call
for each of them of __cpu_up():

__cpu_up()

    ...
    cpu_callin_map[cpu] = 0;
    ...
    rc = smp_ops->kick_cpu(cpu);
    ...wait...
    if (!cpu_callin_map[cpu]) {
                printk(KERN_ERR "Processor %u is stuck.\n", cpu);
    ...

on powernv, kick_cpu() is pnv_smp_kick_cpu():

pnv_smp_kick_cpu()

    ...
    unsigned long start_here =

__pa(ppc_function_entry(generic_secondary_smp_init));
    ...
    /*
     * Already started, just kick it, probably coming from
     * kexec and spinning
     */
    rc = opal_query_cpu_status(pcpu, &status);
    ...
    if (status == OPAL_THREAD_STARTED)
        goto kick;
    ...
    rc = opal_start_cpu(pcpu, start_here);
    ...
kick:
    ...

generic_secondary_smp_init() is a function in assembly language that
calls in the end start_secondary() :

start_secondary()

    ...
    cpu_callin_map[cpu] = 1;
    ...

So processors are stucked because start_secondary() is never called.

start_secondary() is never called because OPAL cpu status is
OPAL_THREAD_STARTED.

Secondary CPUs are in "OPAL_THREAD_STARTED" state because they have not
been returned to OPAL on crash.

CPUs are returned to OPAL by pnv_kexec_cpu_down() which is called by
crash_ipi_callback() (for secondary cpus)... except if the cpu is not
online.

As the CPUs are offline, they are not returned to OPAL, and then kernel
can't restart them.

> I really don't like this fix. The reason we're doing a kdump is because the
> first kernel has panicked, possibly with locks held or data structures
> corrupted. Calling cpu_up() then goes and tries to run a bunch of code in the
> crashed kernel, which increases the chance of us just wedging completely.

I agree, but the whole logic of the POWER kernel is we have all the
threads available.

Moreover the kernel parameter "maxcpus" is ignored if it is not a
multiple of thread per core:

...
static int subcore_init(void)
{
        if (!cpu_has_feature(CPU_FTR_ARCH_207S))
                return 0;

        /*
         * We need all threads in a core to be present to split/unsplit so
         * continue only if max_cpus are aligned to threads_per_core.
         */
        if (setup_max_cpus % threads_per_core)
                return 0;

...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/