Message-ID: <20190918144138.24839-1-balasubramani_vivekanandan@mentor.com>
Date:   Wed, 18 Sep 2019 16:41:37 +0200
From:   Balasubramani Vivekanandan <balasubramani_vivekanandan@...tor.com>
To:     <fweisbec@...il.com>, <tglx@...utronix.de>, <mingo@...nel.org>
CC:     <balasubramani_vivekanandan@...tor.com>, <erosca@...adit-jv.com>,
        <linux-kernel@...r.kernel.org>
Subject: [PATCH V1 0/1] tick: broadcast-hrtimer: Fix a race in bc_set_next

I was investigating an RCU stall warning on the ARM64 Renesas Rcar3
platform. On analysis I found that the RCU stall warning occurred because
the rcu_preempt kthread was starved of CPU time. rcu_preempt was blocked
in schedule_timeout() and was never woken up. On further investigation I
found that local timer interrupts were not occurring on the CPU where the
rcu_preempt kthread was blocked, so rcu_preempt was not woken up after
its timeout.
I continued my analysis to find out why the timer failed on that CPU. I
found that the timer failure happens when the CPU goes through an idle
state cycle. When the CPU enters the idle state, it subscribes to the tick
broadcast clock and shuts down its local timer. On exit from the idle
state, the local timer is programmed again to fire interrupts. However, in
the error scenario the CPU fails to program the local timer on exit from
the idle state. The idle exit path goes through the code below in
__tick_broadcast_oneshot_control(), and this is where it fails to program
the timer hardware:

now = ktime_get();
if (dev->next_event <= now) {
	cpumask_set_cpu(cpu, tick_broadcast_force_mask);
	goto out;
}

The value in next_event is earlier than the current time because the
tick broadcast clock did not wake up the CPU at its subscribed timeout.
This condition arises when the CPU is later woken up by some other event.
After the CPU has been woken up this way, further timeout requests by any
task on that CPU may fail to program the timer hardware, because the
value in next_event is earlier than the current time.
I then focused on why the tick broadcast clock failed to wake up the
CPU, and noticed a race condition in the hrtimer-based tick broadcast
clock. The race results in the tick broadcast hrtimer never being
restarted. I have created a patch to fix the race condition. Please
review.

Balasubramani Vivekanandan (1):
  tick: broadcast-hrtimer: Fix a race in bc_set_next

 kernel/time/tick-broadcast-hrtimer.c | 58 ++++++++++++++++++++++------
 kernel/time/tick-broadcast.c         |  2 +
 2 files changed, 48 insertions(+), 12 deletions(-)

-- 
2.17.1
