linux-ext4 - Re: [RFC][PATCH] JBD: release checkpoint journal heads through try_to_release


Message-Id: <20081027142657.2120aa3f.akpm@linux-foundation.org>
Date:	Mon, 27 Oct 2008 14:26:57 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Toshiyuki Okajima <toshi.okajima@...fujitsu.com>
Cc:	linux-ext4@...r.kernel.org, sct@...hat.com,
	linux-fsdevel@...r.kernel.org
Subject: Re: [RFC][PATCH] JBD: release checkpoint journal heads through
 try_to_release_page when the memory is exhausted

(added linux-fsdevel)

On Thu, 23 Oct 2008 17:41:01 +0900
Toshiyuki Okajima <toshi.okajima@...fujitsu.com> wrote:

> Hi Andrew.
> 
> > > rather costly.  An alternative might be to implement a shrinker
> > > callback function for the journal_head slab cache.  Did you consider
> > > this?
> > Yes.
> > But an unused list and counters are required to manage the shrink targets
> > ("journal heads") if we implement a shrinker.
> > I thought that comparatively big code changes to jbd were necessary to accomplish it.
> 
> > However I will try it. 
> 
> I managed to build a shrinker callback function for the journal_head slab cache.
> The code is smaller than before, but its logic seems to be more complex
> than before.
> However, I haven't had any trouble while testing some light load operations
> on the patched kernel.
> But I think the system may hang if several journal_head shrinkers
> are executed concurrently.
> So, I will try again to build a more appropriate fix.

yeah, that's not very pretty either, is it?

> Please give me comments if you have a nicer idea.

Stepping back a bit...

The basic problem is, I believe, that some client of the blockdev
(ext3) is adding metadata to the blockdev's data structures
(buffer_heads) but we have no means by which the blockdev code can call
back into that client requesting that the metadata be released, yes?

We can fix the problem which you've identified by adding a means for
the blockdev code (def_blk_aops.releasepage()) to call back into ext3,
yes?

If so, how do we do that?

I seem to recall that there's code somewhere in the tree which does
things like taking a copy of bdev->address_space_operations and
reinstalling that, and overwriting selected fields, and then arranging
somehow for the old value to be reinstalled when the client releases
the blockdev.  That's plain nasty.

Perhaps what we could do is to add a new

	blkdev_register_releasepage(struct block_device *,
					int (*)(struct page *, gfp_t))

function and call that from within ext3 initialisation.  (This could be
a block_device_operations entry, but is there any point in doing that?)

Within blkdev_register_releasepage(), record the address of that
function in the `struct block_device' (with what locking??) and then
implement def_blk_aops.releasepage(), which calls
bdev->registered_releasepage().  Set def_blk_aops.releasepage() to point
at try_to_free_buffers() to provide the default behaviour.

Then we'd need a blkdev_unregister_releasepage() which restores the old
value.  Or, better, make blkdev_register_releasepage()
return the old value and require that clients of the blockdev (ie:
ext3) restore the old value prior to releasing the blockdev.

Or something along these lines, anyway..
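
A minimal userspace sketch of the register/restore scheme described above,
not kernel code.  The types are stand-ins, and everything except
blkdev_register_releasepage(), registered_releasepage and
try_to_free_buffers() is a hypothetical name introduced for illustration
(blkdev_releasepage() plays the role of def_blk_aops.releasepage(), and
client_releasepage() plays the role of an ext3 callback).  The block device
is passed explicitly here, and the locking question raised above is left
open.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; only what this sketch needs. */
typedef unsigned int gfp_t;

struct page {
	int has_buffers;          /* buffer_heads still attached */
	int has_client_metadata;  /* e.g. journal_heads pinning the buffers */
};

typedef int (*releasepage_fn)(struct page *, gfp_t);

struct block_device {
	releasepage_fn registered_releasepage;  /* NULL: use the default */
};

/* Model of try_to_free_buffers(): fails while client metadata is attached. */
int try_to_free_buffers(struct page *page)
{
	if (page->has_client_metadata)
		return 0;
	page->has_buffers = 0;
	return 1;
}

/* Default behaviour when no client has registered a callback. */
int default_releasepage(struct page *page, gfp_t gfp)
{
	(void)gfp;
	return try_to_free_buffers(page);
}

/*
 * Install fn and return the previous value so that the caller can
 * restore it before releasing the blockdev.  No locking in this sketch.
 */
releasepage_fn blkdev_register_releasepage(struct block_device *bdev,
					   releasepage_fn fn)
{
	releasepage_fn old;

	assert(bdev != NULL);
	old = bdev->registered_releasepage;
	bdev->registered_releasepage = fn;
	return old;
}

/* Hypothetical stand-in for def_blk_aops.releasepage(). */
int blkdev_releasepage(struct block_device *bdev, struct page *page,
		       gfp_t gfp)
{
	releasepage_fn fn = bdev->registered_releasepage;

	return fn ? fn(page, gfp) : default_releasepage(page, gfp);
}

/* Hypothetical client callback: drop our metadata, then free the buffers. */
int client_releasepage(struct page *page, gfp_t gfp)
{
	(void)gfp;
	page->has_client_metadata = 0;
	return try_to_free_buffers(page);
}
```

In this model a client would call blkdev_register_releasepage(bdev,
client_releasepage) at initialisation, keep the returned pointer, and pass
it back to blkdev_register_releasepage() before releasing the blockdev.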
