Message-ID: <AM0PR03MB4804FA468B7A006AEEA8592ABB8E0@AM0PR03MB4804.eurprd03.prod.outlook.com>
Date: Fri, 4 Jan 2019 12:42:27 +0000
From: Tom Putzeys <tom.putzeys@...atlascopco.com>
To: "mingo@...hat.com" <mingo@...hat.com>,
"peterz@...radead.org" <peterz@...radead.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: CFS scheduler: spin_lock usage causes dead lock when
smp_apic_timer_interrupt occurs
Dear Ingo and Peter,
I would like to report a possible bug in the CFS scheduler that can cause a deadlock.
We suspect this bug caused the intermittent but persistent system freezes we have seen on our quad-core SMP systems.
We noticed the problem on 4.1.17 preempt-rt but we suspect the problematic code is not linked to the preempt-rt patch and is also present in the latest 4.20 kernel.
The problem concerns the use of spin_lock to protect cfs_b, even though the same lock is also taken from an interrupt handler:
- __run_hrtimer (in kernel/time/hrtimer.c) calls fn(timer) with IRQs enabled. This can call sched_cfs_period_timer() (in kernel/sched/fair.c), which takes the cfs_b lock.
- the hard IRQ smp_apic_timer_interrupt can then occur. It can call ttwu_queue() which grabs the spin lock for its CPU run queue and can then try to enqueue a task via the CFS scheduler.
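- this can call check_enqueue_throttle(), which can call assign_cfs_rq_runtime(), which tries to obtain the cfs_b lock. The lock is already held by the context that was interrupted on the same CPU, so the handler can never acquire it and the CPU is stuck.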
The cfs_b lock is taken with spin_lock and so was not intended for use inside a hard IRQ, but the CFS scheduler does just that when it uses an hrtimer_interrupt to wake up and enqueue work. Our initial impression is that the cfs_b lock needs to be taken with spin_lock_irqsave.
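To make the sequence above easier to follow, here is a rough userspace model of a single CPU. It is only a sketch and not kernel code: every model_* name is hypothetical, the lock and the interrupt flag are plain booleans, and the "interrupt" is an ordinary function call made while the lock is held. It assumes the interrupt arrives on the same CPU that holds the cfs_b lock, which is the case we believe we are hitting.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of ONE CPU. Nothing here is kernel code; all names are hypothetical. */
struct model_cpu {
    bool lock_held;          /* stands in for the cfs_b lock */
    bool irqs_enabled;       /* stands in for the CPU's interrupt-enable flag */
    bool irq_pending;        /* timer interrupt raised while IRQs were masked */
    bool handler_would_spin; /* handler found the lock held by the context it interrupted */
    int  handler_runs;       /* times the handler completed its critical section */
};

static void model_cpu_init(struct model_cpu *c)
{
    c->lock_held = false;
    c->irqs_enabled = true;
    c->irq_pending = false;
    c->handler_would_spin = false;
    c->handler_runs = 0;
}

/* Models the hard-IRQ path that ends in assign_cfs_rq_runtime() taking the cfs_b lock. */
static void model_timer_irq_handler(struct model_cpu *c)
{
    if (c->lock_held) {
        /* A real spin_lock() would spin here forever: the holder is the
         * interrupted context on this same CPU, which cannot run again. */
        c->handler_would_spin = true;
        return;
    }
    c->lock_held = true;
    c->handler_runs++;
    c->lock_held = false;
}

/* Deliver the interrupt now if IRQs are enabled, otherwise latch it. */
static void model_raise_timer_irq(struct model_cpu *c)
{
    if (c->irqs_enabled)
        model_timer_irq_handler(c);
    else
        c->irq_pending = true;
}

/* Restore the saved interrupt state and deliver any latched interrupt. */
static void model_irq_restore(struct model_cpu *c, bool flags)
{
    c->irqs_enabled = flags;
    if (c->irqs_enabled && c->irq_pending) {
        c->irq_pending = false;
        model_timer_irq_handler(c);
    }
}

/* Period timer using spin_lock-style locking. Returns true if the deadlock is hit. */
static bool model_period_timer_plain(struct model_cpu *c)
{
    assert(!c->lock_held);
    c->lock_held = true;        /* like spin_lock() */
    model_raise_timer_irq(c);   /* interrupt lands inside the critical section */
    c->lock_held = false;       /* like spin_unlock(); not reached on real hardware */
    return c->handler_would_spin;
}

/* Period timer using spin_lock_irqsave-style locking. Returns true if the deadlock is hit. */
static bool model_period_timer_irqsave(struct model_cpu *c)
{
    bool flags = c->irqs_enabled;

    assert(!c->lock_held);
    c->irqs_enabled = false;    /* like spin_lock_irqsave(): mask IRQs, then lock */
    c->lock_held = true;
    model_raise_timer_irq(c);   /* interrupt is latched, not delivered */
    c->lock_held = false;       /* like spin_unlock_irqrestore(): unlock, then restore */
    model_irq_restore(c, flags);
    return c->handler_would_spin;
}
```

On a freshly initialised model_cpu, model_period_timer_plain() reports the deadlock, because the handler finds the lock held by the context it interrupted. model_period_timer_irqsave() does not: the interrupt is held back until the lock has been released, and the handler then runs once. The model says nothing about other CPUs contending for the lock, or about how preempt-rt changes spin lock behaviour.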
My colleague Mike Pearce submitted a bug report on Bugzilla three weeks ago: https://bugzilla.kernel.org/show_bug.cgi?id=201993
We would appreciate any feedback.
Kind regards,
Tom