[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <1556752043.jyg2z3kgaw.astroid@bobo.none>
Date: Thu, 02 May 2019 09:12:34 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: Thiago Jung Bauermann <bauerman@...ux.ibm.com>,
Nathan Lynch <nathanl@...ux.ibm.com>
Cc: Gautham R Shenoy <ego@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
Michael Bringmann <mwb@...ux.vnet.ibm.com>,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
Tyrel Datwyler <tyreld@...ux.vnet.ibm.com>
Subject: Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
Nathan Lynch's on May 2, 2019 12:57 am:
> Hi Thiago,
>
> Thiago Jung Bauermann <bauerman@...ux.ibm.com> writes:
>> Nathan Lynch <nathanl@...ux.ibm.com> writes:
>>> Thiago Jung Bauermann <bauerman@...ux.ibm.com> writes:
>>>> + while (true) {
>>>> cpu_status = smp_query_cpu_stopped(pcpu);
>>>> if (cpu_status == QCSS_STOPPED ||
>>>> cpu_status == QCSS_HARDWARE_ERROR)
>>>> break;
>>>> - cpu_relax();
>>>> + udelay(100);
>>>> }
>>>> }
>>>
>>> I agree with looping indefinitely but doesn't it need a cond_resched()
>>> or similar check?
>>
>> If there's no kernel or hypervisor bug, it shouldn't take more than a
>> few tens of ms for this loop to complete (Gautham measured a maximum of
>> 10 ms on a POWER9 with an earlier version of this patch).
>
> 10ms is twice the default scheduler quantum...
>
>
>> In case of bugs related to CPU hotplug (either in the kernel or the
>> hypervisor), I was hoping that the resulting lockup warnings would be a
>> good indicator that something is wrong. :-)
>
> Not convinced we should assume something is wrong if it takes a few
> dozen ms to complete the operation.
Right, and if there is no kernel or hypervisor bug then it will stop
eventually :)
> AFAIK we don't have any guarantees
> about the maximum latency of stop-self, and it can be affected by other
> activity in the system, whether we're in shared processor mode, etc. Not
> to mention smp_query_cpu_stopped has to acquire the global RTAS lock and
> be serialized with other tasks calling into RTAS. So I am concerned
> about generating spurious warnings here.
Agreed.
>
> If for whatever reason the operation is taking too long, drmgr or
> whichever application is initiating the change will appear to stop
> making progress. It's not too hard to find out what's going on with
> facilities like perf or /proc/pid/stack.
>
>
>> Though perhaps adding a cond_resched() every 10 ms or so, with a
>> WARN_ON() if it loops for more than 50 ms would be better.
>
> A warning doesn't seem appropriate to me, and cond_resched should be
> invoked in each iteration. Or just msleep(1) in each iteration would be
> fine, I think.
>
> But I'd like to bring in some more context -- here is the body of
> pseries_cpu_die:
>
> static void pseries_cpu_die(unsigned int cpu)
> {
> int tries;
> int cpu_status = 1;
> unsigned int pcpu = get_hard_smp_processor_id(cpu);
>
> if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
> cpu_status = 1;
> for (tries = 0; tries < 5000; tries++) {
> if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
> cpu_status = 0;
> break;
> }
> msleep(1);
> }
> } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>
> for (tries = 0; tries < 25; tries++) {
> cpu_status = smp_query_cpu_stopped(pcpu);
> if (cpu_status == QCSS_STOPPED ||
> cpu_status == QCSS_HARDWARE_ERROR)
> break;
> cpu_relax();
> }
> }
>
> This patch alters the behavior of the second loop (the CPU_STATE_OFFLINE
> branch). The CPU_STATE_INACTIVE branch is used when the offline behavior
> is to use H_CEDE instead of stop-self, correct?
>
> And isn't entering H_CEDE expected to be quite a bit faster than
> stop-self? If so, why does that path get five whole seconds[*] while
> we're bikeshedding about tens of milliseconds for stop-self? :-)
>
> [*] And should it be made to retry indefinitely as well?
I think so.
Thanks,
Nick
Powered by blists - more mailing lists