lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <201102221343.p1MDhPri028681@demeter2.kernel.org>
Date:	Tue, 22 Feb 2011 13:43:25 GMT
From:	bugzilla-daemon@...zilla.kernel.org
To:	linux-ext4@...r.kernel.org
Subject: [Bug 29402] kernel panics while running ffsb scalability workloads
 on 2.6.38-rc1 through -rc5

https://bugzilla.kernel.org/show_bug.cgi?id=29402


Lukas Czerner <lczerner@...hat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lczerner@...hat.com




--- Comment #6 from Lukas Czerner <lczerner@...hat.com>  2011-02-22 13:43:24 ---
Hi Eric,

it might be just a shot in the dark, but would you be willing to try out this
patch ? Unfortunately I do not currently have a machine to reproduce the
problem.

I *think* that the problem is in the block layer (bio_batch_end_io() and
blkdev_issue_zeroout() to be specific) and since lazy init is using that to do
the zeroing we can not hit it with lazy init turned off.

Now, the problem I see is that when we are going to wait_for_completion() in
blkdev_issue_discard() we check if the bb.done equals issued (number of issued
bios). If it equals, we can skip the wait_for_completion() and jump out of the
function since there is nothing to wait for. However, there is a ordering
problem because bio_batch_end_io() is calling atomic_inc(&bb->done) before
complete(), hence it might seem to blkdev_issue_zeroout() that all bios has
been completed and exit. At this point when bio_batch_end_io() is going to call
complete(bb->wait) bb does not longer exist since it was declared locally in
blkdev_issue_zeroout() ==> panic while trying to acquire wait.lock!

(thread 1)                         (thread 2)
bio_batch_end_io()                 blkdev_issue_zeroout()
  if(bb) {                         ...
    if (bb->end_io)                ...
      bb->end_io(bio, err);        ...
    atomic_inc(&bb->done);         ...
    ...                            while (issued != atomic_read(&bb.done))
    ...                            (let issued == bb.done)
    ...                            (do the rest of the function)
    ...                            return ret;
    complete(bb->wait);
    ^^^^^^^^
Panic in complete() while trying to acquire spinlock.

I hope it is not complete nonsense :).Please let me know whether the patch
helps.

Thanks!
-Lukas

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ