Message-ID: <52AFF20F.5070202@oracle.com>
Date: Tue, 17 Dec 2013 01:41:19 -0500
From: Sasha Levin <sasha.levin@...cle.com>
To: John Stultz <john.stultz@...aro.org>,
LKML <linux-kernel@...r.kernel.org>
CC: Thomas Gleixner <tglx@...utronix.de>,
Prarit Bhargava <prarit@...hat.com>,
Richard Cochran <richardcochran@...il.com>,
Ingo Molnar <mingo@...nel.org>, stable <stable@...r.kernel.org>
Subject: Re: [RFC][PATCH 3/5] timekeeping: Avoid possible deadlock from clock_was_set_delayed
On 12/17/2013 12:15 AM, John Stultz wrote:
> On 12/12/2013 11:13 AM, John Stultz wrote:
>> On 12/12/2013 11:05 AM, Sasha Levin wrote:
>>> On 12/12/2013 01:59 PM, John Stultz wrote:
>>>> On 12/12/2013 10:32 AM, Sasha Levin wrote:
>>>>> On 12/12/2013 11:34 AM, Sasha Levin wrote:
>>>>>> On 12/11/2013 02:11 PM, John Stultz wrote:
>>>>>>> As part of normal operations, the hrtimer subsystem frequently calls
>>>>>>> into the timekeeping code, creating a locking order of
>>>>>>> hrtimer locks -> timekeeping locks
>>>>>>>
>>>>>>> clock_was_set_delayed() was supposed to allow us to avoid deadlocks
>>>>>>> between the timekeeping and the hrtimer subsystems, so that we could
>>>>>>> notify the hrtimer subsystem the time had changed while holding
>>>>>>> the timekeeping locks. This was done by scheduling delayed work
>>>>>>> that would run later once we were out of the timekeeping code.
>>>>>>>
>>>>>>> But unfortunately the lock chains are complex enough that in
>>>>>>> scheduling delayed work, we end up eventually trying to grab
>>>>>>> an hrtimer lock.
>>>>>>>
>>>>>>> Sasha Levin noticed this in testing when the new seqlock lockdep
>>>>>>> enablement triggered the following (somewhat abbreviated) message:
>>>>>> [snip]
>>>>>>
>>>>>> This seems to work for me, I don't see the lockdep spew anymore.
>>>>>>
>>>>>> Tested-by: Sasha Levin <sasha.levin@...cle.com>
>>>>> I think I spoke too soon.
>>>>>
>>>>> It took way more time to reproduce than previously, but I got:
>>>>>
>>>>>
>>>>> -> #1 (&(&pool->lock)->rlock){-.-...}:
>>>>> [ 1195.578519] [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
>>>>> [ 1195.578519] [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
>>>>> [ 1195.578519] [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
>>>>> [ 1195.578519] [<ffffffff843b0760>] _raw_spin_lock+0x40/0x80
>>>>> [ 1195.578519] [<ffffffff81153e0e>] __queue_work+0x14e/0x3f0
>>>>> [ 1195.578519] [<ffffffff81154168>] queue_work_on+0x98/0x120
>>>>> [ 1195.578519] [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
>>>>> [ 1195.578519] [<ffffffff811c4b41>] do_adjtimex+0x111/0x160
>>>>> [ 1195.578519] [<ffffffff811360e3>] SYSC_adjtimex+0x43/0x80
>>>>> [ 1195.578519] [<ffffffff8113612e>] SyS_adjtimex+0xe/0x10
>>>>> [ 1195.578519] [<ffffffff843baed0>] tracesys+0xdd/0xe2
>>>>> [ 1195.578519]
>>>> Are you sure you have that patch applied?
>>>>
>>>> With it we shouldn't be calling clock_was_set_delayed() from
>>>> do_adjtimex().
>>> Hm, it seems that there's a conflict there that wasn't resolved
>>> properly. Does this patch depend on anything else that's not
>>> currently in -next?
>> Oh yes, sorry, I didn't cc you on the entire patch set. Apologies!
>>
>> You'll probably want to grab the two previous patches:
>> https://lkml.org/lkml/2013/12/11/479
>> https://lkml.org/lkml/2013/12/11/758
>
> Just wanted to follow up here. Did you happen to get a chance to try to
> reproduce w/ the three patch patchset?
>
> I'm hoping to submit them to Ingo tomorrow, and want to make sure I've
> got your tested-by.
Oh yeah, have been running it ever since, haven't seen the issue reproduce.
Thanks,
Sasha
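
For readers following the thread, the pattern the quoted patch description
relies on (remember that the clock changed while holding the timekeeping
lock, and only notify the hrtimer side after that lock is dropped) can be
sketched in user space as below. This is only an illustration under stated
assumptions, not the kernel patch: the pthread mutexes stand in for the
kernel locks, and every name here (timekeeping_lock, hrtimer_lock,
notify_clock_was_set, adjust_time, demo) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static pthread_mutex_t timekeeping_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t hrtimer_lock = PTHREAD_MUTEX_INITIALIZER;
static long wall_offset;   /* protected by timekeeping_lock */
static int notifications;  /* protected by hrtimer_lock */

/*
 * Takes hrtimer_lock. Since the established order is
 * hrtimer locks -> timekeeping locks, this must never be called
 * while timekeeping_lock is held.
 */
static void notify_clock_was_set(void)
{
	pthread_mutex_lock(&hrtimer_lock);
	notifications++;
	pthread_mutex_unlock(&hrtimer_lock);
}

/* Adjust the clock, deferring the notification until the lock is dropped. */
static void adjust_time(long delta)
{
	bool clock_set = false;

	pthread_mutex_lock(&timekeeping_lock);
	if (delta != 0) {
		wall_offset += delta;
		clock_set = true;	/* only record the fact under the lock */
	}
	pthread_mutex_unlock(&timekeeping_lock);

	/* Safe now: no timekeeping lock is held when we take hrtimer_lock. */
	if (clock_set)
		notify_clock_was_set();
}

/* Small usage example: one real change, one no-op. */
static void demo(void)
{
	adjust_time(5);
	adjust_time(0);
	assert(wall_offset == 5);
	assert(notifications == 1);
}
```

The point of the flag is that the decision is made under the timekeeping
lock but the call that needs the other lock happens outside it, so the
reverse ordering never appears in any lock chain.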