[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c2cf5125-2545-c325-0393-0dba4aab379d@cn.fujitsu.com>
Date: Tue, 22 Jan 2019 16:42:19 +0800
From: Su Yue <suy.fnst@...fujitsu.com>
To: <paulmck@...ux.ibm.com>
CC: <linux-kernel@...r.kernel.org>, <josh@...htriplett.org>,
<rostedt@...dmis.org>, <mathieu.desnoyers@...icios.com>,
<jiangshanlai@...il.com>, "Li, Philip" <philip.li@...el.com>,
<lkp-developer@...ists.intel.com>
Subject: Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
Thanks for your quick reply! Paul
On 1/22/19 12:01 PM, Paul E. McKenney wrote:
> On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote:
>> Hi, guys
>> While running rcutorture tests with "onoff_interval", some tests
>> failed and results show like:
>>
>> =====================================================================
>> [ 316.354501] srcud-torture:--- End of test: RCU_HOTPLUG:
>> nreaders=1 nfakewriters=4 stat_interval=60 verbose=2
>> test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\
>> s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0
>> test_boost_interval=7 test_boost_duration=4 shutdown_secs=0
>> stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\
>> rrier_cbs=0 onoff_interval=3 onoff_holdoff=0
>> ====================================================================
>>
>> I am wondering that meaning of "RCU_HOTPLUG". Is it expected because
>> cpu hotplug is enabled in the test? Or just represents another type of
>> failure?
>
> This says that at least one CPU hotplug operation failed, that is,
> the CPU didn't actually come online or go offline as requested. If you
> are introducing CPU hotplug to an architecture, this usually indicates
> that you have bugs in your CPU-hotplug code. Or it nmight be that
It should hit the case since there is no RCU CPU stall warnings.
> RCU grace periods failed to progress -- though this would normally
> also result in RCU CPU stall warnings.
>
> There should be lines containing "ver:" in your console output. What
> does one of the later one of these say?
>
The line says:
======================================================================
[ 318.850175] busted_srcud-torture: rtc: (null) ver: 27040
tfle: 0 rta: 27040 rtaf: 0 rtf: 27027 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0
rtbf: 0 rtb: 0 \
nt: 9497 onoff: 2639/2639:2640/5310 40,373:10,355 162868:67542 (HZ=1000)
barrier: 0/0:0
=====================================================================
And here are useful errors:
=====================================================================
kern :info : [ 135.379693] KVM setup async PF for cpu 1
kern :info : [ 135.381412] kvm-stealtime: cpu 1, msr 23fd16180
kern :alert : [ 135.386897] busted_srcud-torture:torture_onoff task:
onlined 1
kern :alert : [ 135.408241] busted_srcud-torture:torture_onoff task:
offlining 1
kern :info : [ 135.423310] Unregister pv shared memory for cpu 1
kern :info : [ 135.427940] smpboot: CPU 1 is now offline
kern :alert : [ 135.430106] busted_srcud-torture:torture_onoff task:
offlined 1
kern :alert : [ 135.436404] busted_srcud-torture:torture_onoff task:
offlining 0
kern :alert : [ 135.446173] busted_srcud-torture:torture_onoff task:
offline 0 failed: errno -16
kern :alert : [ 135.453076] busted_srcud-torture:torture_onoff task:
offlining 0
kern :alert : [ 135.457461] busted_srcud-torture:torture_onoff task:
offline 0 failed: errno -16
=====================================================================
There are only two CPUs on the VM. Torture try to offline the last one
but -EBUSY occured.
I spent time to understand kernel/torture.c.
There is torture_onoff():
225 while (!torture_must_stop()) {
226 cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
227 if (!torture_offline(cpu,
228 &n_offline_attempts,
&n_offline_successes,
229 &sum_offline, &min_offline,
&max_offline))
230 torture_online(cpu,
231 &n_online_attempts,
&n_online_successes,
232 &sum_online, &min_online,
&max_online);
233 schedule_timeout_interruptible(onoff_interval);
234 }
235
torture_offline() and torture_offline() don't pre judge if the current
cpu is only one usable.
Our test machines are configured with CONFIG_BOOTPARAM_HOTPLUG_CPU0. If
there are only one oneline and hotplugable cpux, then
n_offline_successes != n_offline_attempts which caused "End of test:
RCU_HOTPLUG".
Does I misunderstand something above? Feel free to correct me.
Thanks,
Su
> Thanx, Paul
>
>
>
Powered by blists - more mailing lists