Message-ID: <ffe79bedc6f4d99f0587604ca27e1df3.squirrel@www.codeaurora.org>
Date: Fri, 12 Jun 2015 12:13:20 -0000
From: kishank@...eaurora.org
To: paulmck@...ux.vnet.ibm.com
Cc: "Kishan Kumar" <kishank@...eaurora.org>, josh@...htriplett.org,
rostedt@...dmis.org, mathieu.desnoyers@...icios.com,
laijs@...fujitsu.com, linux-kernel@...r.kernel.org,
linux-arm-msm@...r.kernel.org,
"Mohammed Khajapasha" <mkhaja@...eaurora.org>,
"Vignesh Radhakrishnan" <vigneshr@...eaurora.org>
Subject: Re: [PATCH] rcu: Start grace period even for single callback
Hi Paul,
Thanks for your quick response.
Please find the reply below.
> On Thu, Jun 11, 2015 at 07:18:18PM +0530, Kishan Kumar wrote:
>> When we queue a callback for RCU, it is appended at the nxttail of
>> the per-CPU RCU data structure. A callback is represented by a
>> struct callback_head:
>>
>> struct callback_head {
>>         struct callback_head *next;
>>         void (*func)(struct callback_head *head);
>> };
>>
>> In the case of a single callback queued on the nxttail, the "next"
>> field will be NULL. "next" happens to be the zeroth element of
>> struct callback_head.
>>
>> The condition "if (*rdp->nxttail[RCU_NEXT_READY_TAIL])" in the
>> function cpu_needs_another_gp() essentially checks whether any
>> callback is queued.
>>
>> Since *rdp->nxttail[RCU_NEXT_READY_TAIL] dereferences to the first
>> element, the condition effectively becomes "if (NULL)" when there is
>> a single callback queued. This in turn causes cpu_needs_another_gp()
>> to return false even though we need a grace period to process the
>> single callback. This leads to writers waiting until a second
>> callback gets queued, which can cause undesirable effects such as a
>> boot-up delay of up to 300 seconds.
>>
>> Fix this by performing the check on the "func" field rather than
>> the "next" field of the callback_head.
>>
>> Signed-off-by: Kishan Kumar <kishank@...eaurora.org>
>> Signed-off-by: Mohammed Khajapasha <mkhaja@...eaurora.org>
>> Signed-off-by: Vignesh Radhakrishnan <vigneshr@...eaurora.org>
>
> Hmmm...
>
> Exactly what did you do to test the problem and verify your fix?
We were seeing boot-up getting stuck for up to 300 seconds during SELinux
initialization, as shown below. The issue reproduced roughly once in every
10 to 15 reboots.
Going through the trace prints and dumps, we were able to narrow it down
to cpu_needs_another_gp() returning false. With this patch applied, we ran
an overnight reboot test and did not observe any boot-up delay.
Below is the dmesg snippet for the delay:
o8<----------------------------
[ 5.599920] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 5.599962] SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
[ 5.716237] type=1403 audit(17376.649:2): policy loaded auid=4294967295 ses=4294967295
[ 5.716472] SELinux: Loaded policy from /sepolicy
[ 5.719269] type=1404 audit(17376.649:3): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
[ 65.335252] BMS: voltage_soc_timeout_work: BMS device not opened, using voltage based SOC
[ 302.785303] kworker/0:0 (4) used greatest stack depth: 6408 bytes left
[ 302.845390] kworker/u8:1 (32) used greatest stack depth: 5944 bytes left
[ 307.322100] init: /init.qcom.rc: 497: user option requires a user id
[ 307.325946] init (1): /proc/1/oom_adj is deprecated, please use /proc/1/oom_score_adj instead.
------------------------------->8
Below are snippets of the trace prints, contrasting the success and
failure cases.
Success Case:
------------
init-1 [000] .... 7.013533: selinux_netcache_avc_callback: venk: selinux_netcache_avc_callback start
<idle>-0 [000] d.h1 7.038879: cpu_needs_another_gp: venk: cpu_needs_another_gp Yes, this CPU has newly registered callbacks.
<idle>-0 [000] d.h1 7.038887: rcu_check_callbacks: venk: __rcu_pending RCU gone idle with this CPU 0 needing another grace period.
<idle>-0 [000] d.h1 7.038892: invoke_rcu_core: KK : invoke_rcu_core
<idle>-0 [000] .Ns1 7.038935: rcu_process_callbacks: venk: rcu_process_callbacks before calling rcu_process_callbacks for individual cpus
<idle>-0 [000] dNs1 7.038941: cpu_needs_another_gp: venk: cpu_needs_another_gp Yes, this CPU has newly registered callbacks.
<idle>-0 [000] dNs2 7.038947: rcu_process_callbacks: venk: __rcu_process_callbacks before calling rcu_start_gp()
Failure Case:
-------------
init-1 [003] .... 7.003500: selinux_netcache_avc_callback: venk: selinux_netcache_avc_callback start
<idle>-0 [002] d.h1 8.488947: invoke_rcu_core: KK : invoke_rcu_core
<idle>-0 [002] .Ns1 8.488993: rcu_process_callbacks: venk: rcu_process_callbacks before calling rcu_process_callbacks for individual cpus
<idle>-0 [002] dNs1 8.489000: rcu_process_callbacks: venk: __rcu_process_callbacks cpu_needs_another_gp() returned false
<idle>-0 [002] dNs1 8.489005: rcu_process_callbacks: venk: __rcu_process_callbacks cpu_needs_another_gp() returned false
<idle>-0 [002] dNs1 8.489010: rcu_process_callbacks: venk: __rcu_process_callbacks cpu_needs_another_gp() returned false
<idle>-0 [001] d.h1 305.268951: cpu_needs_another_gp: venk: cpu_needs_another_gp Yes, this CPU has newly registered callbacks.
<idle>-0 [001] d.h1 305.268961: rcu_check_callbacks: venk: __rcu_pending RCU gone idle with this CPU 1 needing another grace period.
<idle>-0 [001] d.h1 305.268965: invoke_rcu_core: KK : invoke_rcu_core
<idle>-0 [001] .Ns1 305.269004: rcu_process_callbacks: venk: rcu_process_callbacks before calling rcu_process_callbacks for individual cpus
<idle>-0 [001] dNs1 305.269010: cpu_needs_another_gp: venk: cpu_needs_another_gp Yes, this CPU has newly registered callbacks.
<idle>-0 [001] dNs2 305.269016: rcu_process_callbacks: venk: __rcu_process_callbacks before calling rcu_start_gp()
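For reference, the "venk:" and "KK" prints come from local debug
instrumentation that is not part of this thread; the hook in
cpu_needs_another_gp() looks roughly like the following sketch against
the 3.10 code (hypothetical, reconstructed from the prints above):

/*
 * Hypothetical sketch of our local debug instrumentation; the actual
 * debug patch is not included here.  Control flow follows 3.10's
 * cpu_needs_another_gp().
 */
static int cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
{
	if (rcu_gp_in_progress(rsp))
		return 0; /* No, a grace period is already in progress. */
	if (*rdp->nxttail[RCU_NEXT_READY_TAIL]) {
		trace_printk("venk: cpu_needs_another_gp Yes, this CPU has newly registered callbacks.\n");
		return 1; /* Yes, this CPU has newly registered callbacks. */
	}
	/* ... remaining checks unchanged ... */
	return 0;
}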
>
> From what I can see, if there is exactly one newly queued callback, we
> will have the following:
>
> o rdp->nxttail[RCU_DONE_TAIL] == &rdp->nxtlist
> o rdp->nxttail[RCU_WAIT_TAIL] == &rdp->nxtlist
> o rdp->nxttail[RCU_NEXT_READY_TAIL] == &rdp->nxtlist
> o rdp->nxttail[RCU_NEXT_TAIL] == &rdp->nxtlist->next
>
> And in this case, rdp->nxtlist will reference the first callback.
> So *rdp->nxttail[RCU_NEXT_READY_TAIL] will be non-NULL in this case,
> as required.
>
> So what am I missing here?
From the RCU data collected, we can see that we have a callback queued
(wakeme_after_rcu) and the "next" field of the struct callback_head at
0xdec59c24 is indeed NULL. Please let me know if there is something wrong
in my understanding.
crash> callback_head 0xdec59c24
struct callback_head {
  next = 0x0,
  func = 0xc0137b1c <wakeme_after_rcu>
}
This is how I got the above structure:
crash> rcu_preempt_data
PER-CPU DATA TYPE:
  struct rcu_data rcu_preempt_data;
PER-CPU ADDRESSES:
  [0]: c2cd7678
  [1]: c2ce5678
  [2]: c2cf3678
  [3]: c2d01678
crash> rcu_data -o c2cd7678
  [c2cd768c] struct callback_head *nxtlist;
crash> rcu_data c2cd7678
  nxtlist = 0xdec59c24,
  nxttail = {0xc2cd768c, 0xdec59c24, 0xdec59c24, 0xdec59c24},
crash> callback_head 0xdec59c24
struct callback_head {
  next = 0x0,
  func = 0xc0137b1c <wakeme_after_rcu>
}
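To double-check my understanding of the layout you describe, here is a
small userspace model of the nxtlist/nxttail scheme (only a sketch with
the kernel types stubbed out, not the kernel code itself):

#include <assert.h>
#include <stdio.h>

/* Stubbed-down versions of the kernel structures, for illustration only. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *head);
};

enum { RCU_DONE_TAIL, RCU_WAIT_TAIL, RCU_NEXT_READY_TAIL,
       RCU_NEXT_TAIL, RCU_NEXT_SIZE };

struct rcu_data {
	struct callback_head *nxtlist;
	struct callback_head **nxttail[RCU_NEXT_SIZE];
};

static void wakeme_after_rcu(struct callback_head *head)
{
	(void)head;	/* Stub; the real one wakes the synchronize_rcu() caller. */
}

int main(void)
{
	/* All four tail pointers initially reference the empty nxtlist. */
	struct rcu_data rdp = {
		.nxtlist = NULL,
		.nxttail = { &rdp.nxtlist, &rdp.nxtlist,
			     &rdp.nxtlist, &rdp.nxtlist },
	};
	struct callback_head cb = { .next = NULL, .func = wakeme_after_rcu };

	/* Enqueue one callback the way call_rcu() does: append at NEXT_TAIL. */
	*rdp.nxttail[RCU_NEXT_TAIL] = &cb;
	rdp.nxttail[RCU_NEXT_TAIL] = &cb.next;

	/*
	 * With exactly one callback, nxttail[RCU_NEXT_READY_TAIL] still
	 * points at nxtlist, so dereferencing it yields the callback
	 * pointer itself (non-NULL), even though cb.next is NULL.
	 */
	printf("*nxttail[RCU_NEXT_READY_TAIL] = %p, &cb = %p, cb.next = %p\n",
	       (void *)*rdp.nxttail[RCU_NEXT_READY_TAIL], (void *)&cb,
	       (void *)cb.next);
	assert(*rdp.nxttail[RCU_NEXT_READY_TAIL] == &cb);
	return 0;
}

In this model, *nxttail[RCU_NEXT_READY_TAIL] is indeed non-NULL with a
single callback queued, matching your explanation. What puzzles me is
that in the dump above, nxttail[1..3] hold 0xdec59c24, i.e. they point at
the callback's "next" field (which sits at offset zero) rather than at
&rdp->nxtlist (0xc2cd768c); perhaps that is where my confusion lies.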
> Also, what version of the kernel are you using? Are you posting callbacks
> during early boot? (Such callbacks won't be invoked until after the
> scheduler starts running.) What kernel configuration are you using?
We are using the 3.10 kernel. These SELinux callbacks are registered
during SELinux initialization, so the callback queueing happens well after
the scheduler has started. I have attached the dmesg, config, and trace
prints for the success and failure cases for your reference.
Thanks
Kishan.
View attachment "config.txt" of type "text/plain" (105896 bytes)
View attachment "dmesg.txt" of type "text/plain" (93624 bytes)
View attachment "trace_failure_case.txt" of type "text/plain" (7003 bytes)
View attachment "trace_success_case.txt" of type "text/plain" (6603 bytes)