[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20071220141217.GA4745@atjola.homenet>
Date: Thu, 20 Dec 2007 15:12:17 +0100
From: Björn Steinbrink <B.Steinbrink@....de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Krzysztof Oledzki <olel@....pl>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Nick Piggin <nickpiggin@...oo.com.au>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Osterried <osterried@...se.de>, protasnb@...il.com,
bugme-daemon@...zilla.kernel.org
Subject: Re: [Bug 9182] Critical memory leak (dirty pages)
On 2007.12.19 09:44:50 -0800, Linus Torvalds wrote:
>
>
> On Sun, 16 Dec 2007, Krzysztof Oledzki wrote:
> >
> > I'll confirm this tomorrow but it seems that even switching to data=ordered
> > (AFAIK default o ext3) is indeed enough to cure this problem.
>
> Ok, do we actually have any ext3 expert following this? I have no idea
> about what the journalling code does, but I have painful memories of ext3
> doing really odd buffer-head-based IO and totally bypassing all the normal
> page dirty logic.
>
> Judging by the symptoms (sorry for not following this well, it came up
> while I was mostly away travelling), something probably *does* clear the
> dirty bit on the pages, but the dirty *accounting* is not done properly,
> so the kernel keeps thinking it has dirty pages.
>
> Now, a simple "grep" shows that ext3 does not actually do any
> ClearPageDirty() or similar on its own, although maybe I missed some other
> subtle way this can happen. And the *normal* VFS routines that do
> ClearPageDirty should all be doing the proper accounting.
>
> So I see a couple of possible cases:
>
> - actually clearing the PG_dirty bit somehow, without doing the
> accounting.
>
> This looks very unlikely. PG_dirty is always cleared by some variant of
> "*ClearPageDirty()", and that bit definition isn't used for anything
> else in the whole kernel judging by "grep" (the page allocator tests
> the bit, that's it).
OK, so I looked for PG_dirty anyway.
In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers
bail out if the page is dirty.
Then in 3e67c0987d7567ad666641164a153dca9a43b11d, Andrew fixed
truncate_complete_page, because it called cancel_dirty_page (and thus
cleared PG_dirty) after try_to_free_buffers was called via
do_invalidatepage.
Now, if I'm not mistaken, we can end up as follows.
truncate_complete_page()
cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
do_invalidatepage()
ext3_invalidatepage()
journal_invalidatepage()
journal_unmap_buffer()
__dispose_buffer()
__journal_unfile_buffer()
__journal_temp_unlink_buffer()
mark_buffer_dirty(); // PG_dirty set, incr. dirty pages
If journal_unmap_buffer then returns 0, try_to_free_buffers is not
called and neither is cancel_dirty_page, so the dirty pages accounting
is not decreased again.
As try_to_free_buffers got its ext3 hack back in
ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe
3e67c0987d7567ad666641164a153dca9a43b11d should be reverted? (Except for
the accounting fix in cancel_dirty_page, of course).
On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task
io accounting for cancelled writes happened always happened if the page
was dirty, regardless of page->mapping. This was also already true for
the old test_clear_page_dirty code, and the commit log for
8368e328dfe1c534957051333a87b3210a12743b doesn't mention that semantic
change either, so maybe the "if (account_size)" block should be moved
out of the if "(mapping && ...)" block?
Björn - not sending patches because he needs sleep and wouldn't have a
damn clue about what to write as a commit message anyway
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists