lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20080319121721.GH155407@sgi.com>
Date:	Wed, 19 Mar 2008 23:17:21 +1100
From:	David Chinner <dgc@....com>
To:	Jan Kara <jack@...e.cz>
Cc:	David Chinner <dgc@....com>, lkml <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: BUG: drop_pagecache_sb vs kjournald lockup

On Wed, Mar 19, 2008 at 11:08:05AM +0100, Jan Kara wrote:
> On Wed 19-03-08 07:46:46, David Chinner wrote:
> > On Tue, Mar 18, 2008 at 02:43:26PM +0100, Jan Kara wrote:
> > > > 2.6.25-rc3, 4p ia64, ext3 root drive.
> > > > 
> > > > I was running an XFS stress test on one of the XFS partitions on
> > > > the machine (zero load on the root ext3 drive), when the system
> > > > locked up in kjournald with this on the console:
> > > > 
> > > > BUG: spinlock lockup on CPU#2, kjournald/2150, a000000100e022e0
> > > > 
> > >   <snip traces>
> > 
> > <snip other stuff>
> > 
> > > > Anyone know the reason why drop_pagecache_sb() uses such a brute-force
> > > > mechanism to free up clean page cache pages?
> > >   Yes, we know that drop_pagecache_sb() has locking issues but since it
> > > is intended to be used for debugging purposes only, nobody cared enough
> > > to fix it. Completely untested patch below if you dare to try ;)
> > 
> > It may be intended for debuging purposes, but it does get used in
> > production HPC environments (a lot!). I guess I've never seen this
> > lockup before because SGI customers don't use ext3, but they have
> > complained about the system "stopping" while drop_caches is executed.
> > This locking ..... strategy would explain it, though.
>   ;) But in case your customers use it in production, shouldn't you push
> for a better interface for such feature? Just a thought...

Because it's much less horrible than the thing it replaced, does
what is needed and mostly does not cause problems? And because it
usually just works I've never felt the need to look at the
implementation....

/me is off to look at Fengguang's patch

> > I'll try the patch, but I can't guarantee anything - I only saw this
> > lockup once in about 18 hours when dropping caches every 2 seconds.
>   Well, if you enable lockdep, it should warn you about possible problems
> much earlier (at least we already got several reports of lockdep warnings
> when using drop_caches).

ia64 - no lockdep :/

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ