Message-Id: <8736lyrzmh.fsf@linux.ibm.com>
Date:   Wed, 01 May 2019 09:57:42 -0500
From:   Nathan Lynch <nathanl@...ux.ibm.com>
To:     Thiago Jung Bauermann <bauerman@...ux.ibm.com>
Cc:     linuxppc-dev@...ts.ozlabs.org,
        Gautham R Shenoy <ego@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org, Nicholas Piggin <npiggin@...il.com>,
        Michael Bringmann <mwb@...ux.vnet.ibm.com>,
        Tyrel Datwyler <tyreld@...ux.vnet.ibm.com>,
        Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
Subject: Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU

Hi Thiago,

Thiago Jung Bauermann <bauerman@...ux.ibm.com> writes:
> Nathan Lynch <nathanl@...ux.ibm.com> writes:
>> Thiago Jung Bauermann <bauerman@...ux.ibm.com> writes:
>>> +		while (true) {
>>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>>  			if (cpu_status == QCSS_STOPPED ||
>>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>>  				break;
>>> -			cpu_relax();
>>> +			udelay(100);
>>>  		}
>>>  	}
>>
>> I agree with looping indefinitely, but doesn't it need a cond_resched()
>> or similar check?
>
> If there's no kernel or hypervisor bug, it shouldn't take more than a
> few tens of ms for this loop to complete (Gautham measured a maximum of
> 10 ms on a POWER9 with an earlier version of this patch).

10ms is twice the default scheduler quantum...


> In case of bugs related to CPU hotplug (either in the kernel or the
> hypervisor), I was hoping that the resulting lockup warnings would be a
> good indicator that something is wrong. :-)

I'm not convinced we should assume something is wrong if it takes a few
dozen ms to complete the operation. AFAIK we don't have any guarantees
about the maximum latency of stop-self, and it can be affected by other
activity in the system, whether we're in shared processor mode, etc. Not
to mention that smp_query_cpu_stopped() has to acquire the global RTAS
lock and be serialized with other tasks calling into RTAS. So I am
concerned about generating spurious warnings here.

If for whatever reason the operation is taking too long, drmgr or
whichever application is initiating the change will appear to stop
making progress. It's not too hard to find out what's going on with
facilities like perf or /proc/pid/stack.


> Though perhaps adding a cond_resched() every 10 ms or so, with a
> WARN_ON() if it loops for more than 50 ms would be better.

A warning doesn't seem appropriate to me, and cond_resched() should be
invoked in each iteration. Or simply calling msleep(1) in each iteration
would be fine, I think.
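
Concretely, something like this (an untested sketch on top of your
patch) is what I have in mind:

	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
		while (true) {
			cpu_status = smp_query_cpu_stopped(pcpu);
			if (cpu_status == QCSS_STOPPED ||
			    cpu_status == QCSS_HARDWARE_ERROR)
				break;
			/*
			 * Sleep instead of spinning between queries;
			 * msleep() also yields the CPU, so no separate
			 * cond_resched() is needed.
			 */
			msleep(1);
		}
	}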

But I'd like to bring in some more context -- here is the body of
pseries_cpu_die:

static void pseries_cpu_die(unsigned int cpu)
{
	int tries;
	int cpu_status = 1;
	unsigned int pcpu = get_hard_smp_processor_id(cpu);

	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
		cpu_status = 1;
		for (tries = 0; tries < 5000; tries++) {
			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
				cpu_status = 0;
				break;
			}
			msleep(1);
		}
	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {

		for (tries = 0; tries < 25; tries++) {
			cpu_status = smp_query_cpu_stopped(pcpu);
			if (cpu_status == QCSS_STOPPED ||
			    cpu_status == QCSS_HARDWARE_ERROR)
				break;
			cpu_relax();
		}
	}
}

This patch alters the behavior of the second loop (the CPU_STATE_OFFLINE
branch). The CPU_STATE_INACTIVE branch is used when the offline behavior
is to use H_CEDE instead of stop-self, correct?

And isn't entering H_CEDE expected to be quite a bit faster than
stop-self? If so, why does that path get five whole seconds[*] while
we're bikeshedding about tens of milliseconds for stop-self? :-)

[*] And should it be made to retry indefinitely as well?
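
If so, an untested sketch of what that might look like, mirroring the
unbounded stop-self loop above:

	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
		/* Poll indefinitely, sleeping between queries. */
		while (get_cpu_current_state(cpu) != CPU_STATE_INACTIVE)
			msleep(1);
		cpu_status = 0;
	}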
