Message-ID: <BANLkTimDOz7M6m6Xo==DbLHP6q2pMBzd9g@mail.gmail.com>
Date: Thu, 14 Apr 2011 14:24:42 +0200
From: Thilo-Alexander Ginkel <thilo@...kel.com>
To: Arnd Bergmann <arnd@...db.de>, Tejun Heo <tj@...nel.org>,
"Rafael J. Wysocki" <rjw@...k.pl>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Soft lockup during suspend since ~2.6.36 [bisected]
On Wed, Apr 6, 2011 at 08:03, Thilo-Alexander Ginkel <thilo@...kel.com> wrote:
> On Wed, Apr 6, 2011 at 01:28, Arnd Bergmann <arnd@...db.de> wrote:
>> On Tuesday 05 April 2011, Thilo-Alexander Ginkel wrote:
>>> Thanks, that worked pretty well. A bisect with eleven builds later I
>>> have now identified the following candidate commit, which may have
>>> introduced the bug:
>>>
>>> dcd989cb73ab0f7b722d64ab6516f101d9f43f88 is the first bad commit
>>> commit dcd989cb73ab0f7b722d64ab6516f101d9f43f88
>>> Author: Tejun Heo <tj@...nel.org>
>>> Date: Tue Jun 29 10:07:14 2010 +0200
>>
>> Sorry, but looking at the patch shows that it can't possibly have introduced
>> the problem, since all the code that is modified in it is new code that
>> is not even used anywhere at that stage.
>>
>> As far as I can tell, you must have hit a false positive or a false negative
>> somewhere in the bisect.
>
> Well you're right. I hit "Reply" too early and should have paid closer
> attention to what change the bisect actually brought up.
>
> I already found a false negative (fortunately pretty close to the end
> of the bisect sequence) and also verified the preceding good commits,
> which gives me two new commits to test. I'll provide an update once
> the builds and tests are through, which may however take until early
> next week as I will be on vacation until then.
All right... I re-verified all of my bisect tests and found yet
another incorrect result. After correcting that one (and confirming
that the other results were right), git bisect came up with a commit
that makes more sense:
| e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c is the first bad commit
| commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c
| Author: Tejun Heo <tj@...nel.org>
| Date: Tue Jun 29 10:07:14 2010 +0200
|
| workqueue: implement concurrency managed dynamic worker pool
The good news is that I am able to reproduce the issue within a KVM
virtual machine, so I can test for the soft lockup (which looks
somewhat like a race condition during worker / CPU shutdown) in a
mostly automated fashion. Unfortunately, that also means that this
issue is anything but hardware-specific, i.e., it most probably
affects all SMP systems (with a probability that varies with the
number of CPUs).
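To illustrate what "mostly automated" could look like, here is a minimal
sketch of a log classifier whose result could serve as the exit status of a
`git bisect run` script. It is not the harness actually used for the tests
above. The assumptions: the VM's serial console output has been captured
as text, and the guest prints a marker string after a successful resume
(the `RESUME-OK` marker and the `classify` name are hypothetical). The exit
codes are the ones `git bisect run` documents: 0 for good, 1 for bad, 125
for "cannot be tested, skip".

```python
import re

# The kernel's soft lockup watchdog reports lines such as
# "BUG: soft lockup - CPU#1 stuck for 61s! [kworker/1:1:23]".
SOFT_LOCKUP = re.compile(r"BUG: soft lockup")

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(console_log: str, resume_marker: str = "RESUME-OK") -> int:
    """Map the console output of one suspend/resume test to a bisect verdict.

    resume_marker is a hypothetical string the guest is assumed to print
    once it has resumed successfully.
    """
    if SOFT_LOCKUP.search(console_log):
        return BAD
    if resume_marker in console_log:
        return GOOD
    # Neither a lockup nor a confirmed resume: treat the run as
    # inconclusive instead of guessing, so the commit is skipped.
    return SKIP
```

A wrapper script would boot the VM, trigger a suspend, capture the console
output and exit with `classify(output)`. Since the lockup only occurs with
some probability, a single clean run does not prove a commit good; such a
wrapper would probably need to repeat the suspend cycle several times
before reporting GOOD, which is one way to avoid the false results
mentioned earlier in this thread.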
Adding some further details about my configuration (which I replicated
in the VM):
- LVM running on top of
- dm-crypt (LUKS) running on top of
- md RAID1
If anyone is interested in getting hold of this VM for further tests,
let me know and I'll try to figure out how to get it (2*8 GB, barely
compressible due to dm-crypt) to the recipient.
Regards,
Thilo