Message-ID: <20110304125220.GA6740@infradead.org>
Date:	Fri, 4 Mar 2011 07:52:20 -0500
From:	Christoph Hellwig <hch@...radead.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Anton Altaparmakov <aia21@....ac.uk>, Jens Axboe <axboe@...nel.dk>,
	Christoph Hellwig <hch@....de>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	George Spelvin <linux@...izon.com>
Subject: Re: a major regression in recent kernels? - was: Re: Null pointer
 OOPS in sync_inodes_sb+0xa9/0x104

On Wed, Mar 02, 2011 at 10:31:15AM -0800, Linus Torvalds wrote:
> The whole "backing_dev_info" has been a total disaster. The thing is
> crap. It violates all the normal kernel memory management rules ("Thou
> shalt use reference counts and free only when it goes to zero") and
> the whole thing has been a constant source of "oh, that driver didn't
> set it, but we changed all the code to require it to be correct".
> 
> And the reason we set it to NULL when the device goes away is exactly
> that it's not ref-counted correctly, so we really _have_ to set it to
> NULL, because it's not going to be around.
> 
> (And the reverse of that is why all kernel data structures should use
> refcounts, and not some external lifetime notion)

Yes.  But the bdi is even worse than that, as it conflates things with
different lifetimes into a single object.  We have the "old school" bdi
which mostly contained various bits of tuning for the VM and read-ahead
algorithms.  This one has to stay around on block devices even with no
fs mounted, because that is what people expect.  And then, to make it
special fun, we have the writeback context entangled into it, which only
makes sense with an active filesystem (or block device node) on it.
Even more fun is that we have a pointer from the superblock, and one
from the inode, and the latter might point to lala land if this is, say,
a /dev/mem node which has a different bdi for the "old-school" MM usage.

I had various stages of prototypes for separating the two into:

 1) the old bdi.  Lifetime rules are: allocated and reference counted
    with the containing device.  That is gendisk for block devices,
    server context for remote devices, static at module init time for
    /dev/zero and similar.
 2) writeback context.  Only exists if a user is there, and thus
    refcounted by itself. For non-block-device filesystem instances it's
    trivially always allocated with the superblock, and goes away with it.
    For block-device instances we need to keep a pointer to it from
    struct block_device and properly look it up on mount, or opening of
    the block device nodes.
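
To make the split a bit more concrete, here is a rough userspace sketch of
the two lifetimes.  It is not taken from any of the prototypes: every name
in it (device_ctx, tuning_info, wb_ctx and the helpers) is made up for
illustration, plain ints stand in for whatever atomic refcount the kernel
would use, and there is no locking at all.

```c
#include <assert.h>
#include <stdlib.h>

/* All names here are hypothetical; none of them are kernel APIs. */

struct wb_ctx;

/* 1) The "old" bdi: VM / read-ahead tuning only. */
struct tuning_info {
	unsigned long ra_pages;
};

/* Stands in for the containing device (gendisk, server context, ...). */
struct device_ctx {
	int refcount;              /* a real kernel would use an atomic refcount */
	struct tuning_info tuning; /* embedded: lives and dies with the device */
	struct wb_ctx *wb;         /* lookup pointer only, NULL when there is no user */
};

/* 2) Writeback context: exists only while somebody uses the device. */
struct wb_ctx {
	int refcount;
	struct device_ctx *dev;    /* counted reference on the device */
};

static struct device_ctx *device_new(unsigned long ra_pages)
{
	struct device_ctx *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;
	dev->tuning.ra_pages = ra_pages;
	return dev;
}

static void device_get(struct device_ctx *dev)
{
	dev->refcount++;
}

/* Returns 1 if this call dropped the last reference and freed the device. */
static int device_put(struct device_ctx *dev)
{
	if (--dev->refcount > 0)
		return 0;
	free(dev);
	return 1;
}

/* Called on mount or on opening the device node: look up or allocate. */
static struct wb_ctx *wb_get(struct device_ctx *dev)
{
	struct wb_ctx *wb = dev->wb;

	if (wb) {
		wb->refcount++;
		return wb;
	}
	wb = calloc(1, sizeof(*wb));
	if (!wb)
		return NULL;
	wb->refcount = 1;
	wb->dev = dev;
	device_get(dev);           /* the context pins the device */
	dev->wb = wb;
	return wb;
}

/* Returns 1 if the last user went away and the context was freed. */
static int wb_put(struct wb_ctx *wb)
{
	struct device_ctx *dev = wb->dev;

	if (--wb->refcount > 0)
		return 0;
	dev->wb = NULL;            /* nothing stale left to point at */
	free(wb);
	device_put(dev);           /* drop the reference taken in wb_get() */
	return 1;
}
```

In this sketch a mount and an open of the block device node would both call
wb_get() and share one context; when the last of them calls wb_put() the
context is freed, while the tuning in device_ctx stays untouched until the
device itself goes away.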

I guess I need to get back to it, but I have put it off for now, as the
code had reached relative stability and I really fear touching it again.

It's for sure not .38 material, though.