linux-kernel - Query regarding synchronize_sched_expedited and resched_cpu

Date:   Fri, 15 Sep 2017 16:44:38 +0530
From:   Neeraj Upadhyay <neeraju@...eaurora.org>
To:     paulmck@...ux.vnet.ibm.com, josh@...htriplett.org,
        rostedt@...dmis.org, mathieu.desnoyers@...icios.com,
        jiangshanlai@...il.com
Cc:     linux-kernel@...r.kernel.org, sramana@...eaurora.org,
        prsood@...eaurora.org
Subject: Query regarding synchronize_sched_expedited and resched_cpu

Hi,

We have a question about the behavior of the RCU expedited grace period
in the case where resched_cpu(), called from sync_sched_exp_handler(),
fails to acquire the rq lock and returns without setting need_resched.
In this case, how do we ensure that the CPU notifies RCU of the end of
the sched grace period (schedule() -> __schedule() ->
rcu_note_context_switch(cpu) -> rcu_sched_qs()) when the tick is stopped
on that CPU? Does the failure to acquire the rq lock imply that the
owner of the rq lock will force a context switch? And since
resched_cpu() is used only in RCU code, in which RCU scenarios is the
trylock check in resched_cpu() needed?

void resched_cpu(int cpu)
{
         struct rq *rq = cpu_rq(cpu);
         unsigned long flags;

         if (!raw_spin_trylock_irqsave(&rq->lock, flags))
                 return;
         resched_curr(rq);
         raw_spin_unlock_irqrestore(&rq->lock, flags);
}
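
To make the concern concrete, here is a minimal user-space sketch of the
trylock pattern above. It is only a model under stated assumptions, not
kernel code: struct model_rq, model_trylock(), model_unlock() and
model_resched_cpu() are hypothetical names, a C11 atomic_flag stands in
for rq->lock, and a plain bool stands in for the need_resched flag that
resched_curr() would set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct rq. */
struct model_rq {
	atomic_flag lock;	/* set while the "rq lock" is held */
	bool need_resched;	/* stands in for the need_resched flag */
};

static void model_rq_init(struct model_rq *rq)
{
	assert(rq != NULL);
	atomic_flag_clear(&rq->lock);
	rq->need_resched = false;
}

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(struct model_rq *rq)
{
	assert(rq != NULL);
	return !atomic_flag_test_and_set_explicit(&rq->lock,
						  memory_order_acquire);
}

static void model_unlock(struct model_rq *rq)
{
	assert(rq != NULL);
	atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

/*
 * Same shape as resched_cpu() above: if the lock is contended, return
 * without touching need_resched. Returns true only if the flag was set.
 */
static bool model_resched_cpu(struct model_rq *rq)
{
	if (!model_trylock(rq))
		return false;		/* silent failure: flag stays clear */
	rq->need_resched = true;	/* stands in for resched_curr(rq) */
	model_unlock(rq);
	return true;
}
```

In this model, if some other context holds the lock when
model_resched_cpu() runs, the call returns false and need_resched stays
clear, and nothing in the function itself retries the request later.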

This issue was observed in the scenario below, where CPU1 started
synchronize_sched_expedited() and sent an IPI to CPU5, which was in the
idle path but handled the sync_sched_exp_handler() IPI before calling
rcu_idle_enter().
Because resched_cpu() failed to acquire the rq lock, need_resched was
not set and CPU5 went idle, which resulted in CPU1 reporting an
expedited stall.

Below is the scenario:

•    CPU1 is waiting for the expedited grace period to complete:

sync_rcu_exp_select_cpus
     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
     IPI sent to CPU5

synchronize_sched_expedited_wait
     ret = swait_event_timeout(rsp->expedited_wq,
                               sync_rcu_preempt_exp_done(rnp_root),
                               jiffies_stall);

     expmask = 0x20, and CPU5 is in the idle path (in cpuidle_enter())

•    CPU5 handles the IPI and fails to acquire the rq lock:

Handles IPI
     sync_sched_exp_handler
         resched_cpu
             returns after the trylock on rq->lock fails
         need_resched is not set

•    CPU5 calls rcu_idle_enter() and, as need_resched is not set, goes
     idle (schedule() is not called).

•    CPU1 reports an RCU stall.
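
For reference, the expmask value in the trace can be read with the small
sketch below. It is a simplified model, assuming a single flat 32-bit
mask in which bit N corresponds to CPU N; in the kernel the mask is kept
per rcu_node. The helper names exp_mask_for_cpu(), exp_report_qs() and
exp_gp_done() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit that a given CPU occupies in the simplified flat mask. */
static uint32_t exp_mask_for_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	return UINT32_C(1) << cpu;
}

/* Clear a CPU's bit, modelling that CPU reporting a quiescent state. */
static uint32_t exp_report_qs(uint32_t expmask, int cpu)
{
	return expmask & ~exp_mask_for_cpu(cpu);
}

/* The modelled wait can finish only when no CPU bits remain set. */
static bool exp_gp_done(uint32_t expmask)
{
	return expmask == 0;
}
```

With this model, exp_mask_for_cpu(5) is 0x20, so expmask = 0x20 means
CPU5 is the only CPU still holding up the expedited grace period, and
exp_gp_done() stays false until exp_report_qs(expmask, 5) is applied.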

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation