lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20071220150453.GB764@atrey.karlin.mff.cuni.cz>
Date:	Thu, 20 Dec 2007 16:04:53 +0100
From:	Jan Kara <jack@...e.cz>
To:	Bj?rn Steinbrink <B.Steinbrink@....de>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Krzysztof Oledzki <olel@....pl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Osterried <osterried@...se.de>, protasnb@...il.com,
	bugme-daemon@...zilla.kernel.org
Subject: Re: [Bug 9182] Critical memory leak (dirty pages)

> On 2007.12.19 09:44:50 -0800, Linus Torvalds wrote:
> > 
> > 
> > On Sun, 16 Dec 2007, Krzysztof Oledzki wrote:
> > > 
> > > I'll confirm this tomorrow but it seems that even switching to data=ordered
> > > (AFAIK default o ext3) is indeed enough to cure this problem.
> > 
> > Ok, do we actually have any ext3 expert following this? I have no idea 
> > about what the journalling code does, but I have painful memories of ext3 
> > doing really odd buffer-head-based IO and totally bypassing all the normal 
> > page dirty logic.
> > 
> > Judging by the symptoms (sorry for not following this well, it came up 
> > while I was mostly away travelling), something probably *does* clear the 
> > dirty bit on the pages, but the dirty *accounting* is not done properly, 
> > so the kernel keeps thinking it has dirty pages.
> > 
> > Now, a simple "grep" shows that ext3 does not actually do any 
> > ClearPageDirty() or similar on its own, although maybe I missed some other 
> > subtle way this can happen. And the *normal* VFS routines that do 
> > ClearPageDirty should all be doing the proper accounting.
> > 
> > So I see a couple of possible cases:
> > 
> >  - actually clearing the PG_dirty bit somehow, without doing the 
> >    accounting.
> > 
> >    This looks very unlikely. PG_dirty is always cleared by some variant of 
> >    "*ClearPageDirty()", and that bit definition isn't used for anything 
> >    else in the whole kernel judging by "grep" (the page allocator tests 
> >    the bit, that's it).
> 
> OK, so I looked for PG_dirty anyway.
> 
> In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers
> bail out if the page is dirty.
> 
> Then in 3e67c0987d7567ad666641164a153dca9a43b11d, Andrew fixed
> truncate_complete_page, because it called cancel_dirty_page (and thus
> cleared PG_dirty) after try_to_free_buffers was called via
> do_invalidatepage.
> 
> Now, if I'm not mistaken, we can end up as follows.
> 
> truncate_complete_page()
>   cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
>   do_invalidatepage()
>     ext3_invalidatepage()
>       journal_invalidatepage()
>         journal_unmap_buffer()
>           __dispose_buffer()
>             __journal_unfile_buffer()
>               __journal_temp_unlink_buffer()
>                 mark_buffer_dirty(); // PG_dirty set, incr. dirty pages
> 
> If journal_unmap_buffer then returns 0, try_to_free_buffers is not
> called and neither is cancel_dirty_page, so the dirty pages accounting
> is not decreased again.
  Yes, this can happen. The call to mark_buffer_dirty() is a fallout
from journal_unfile_buffer() trying to sychronise JBD private dirty bit
(jbddirty) with the standard dirty bit. We could actually clear the
jbddirty bit before calling journal_unfile_buffer() so that this doesn't
happen but since Linus changed remove_from_pagecache() to not care about
redirtying the page I guess it's not needed any more...

								Honza
-- 
Jan Kara <jack@...e.cz>
SuSE CR Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ