linux-kernel - Re: [Bug 9182] Critical memory leak (dirty pages)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0712212049001.24477@bizon.gios.gov.pl>
Date:	Fri, 21 Dec 2007 20:59:27 +0100 (CET)
From:	Krzysztof Oledzki <olel@....pl>
To:	Björn Steinbrink <B.Steinbrink@....de>
cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Osterried <osterried@...se.de>, protasnb@...il.com,
	bugme-daemon@...zilla.kernel.org
Subject: Re: [Bug 9182] Critical memory leak (dirty pages)



On Thu, 20 Dec 2007, Björn Steinbrink wrote:

> On 2007.12.20 08:25:56 -0800, Linus Torvalds wrote:
>>
>>
>> On Thu, 20 Dec 2007, Bj?rn Steinbrink wrote:
>>>
>>> OK, so I looked for PG_dirty anyway.
>>>
>>> In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers
>>> bail out if the page is dirty.
>>>
>>> Then in 3e67c0987d7567ad666641164a153dca9a43b11d, Andrew fixed
>>> truncate_complete_page, because it called cancel_dirty_page (and thus
>>> cleared PG_dirty) after try_to_free_buffers was called via
>>> do_invalidatepage.
>>>
>>> Now, if I'm not mistaken, we can end up as follows.
>>>
>>> truncate_complete_page()
>>>   cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
>>>   do_invalidatepage()
>>>     ext3_invalidatepage()
>>>       journal_invalidatepage()
>>>         journal_unmap_buffer()
>>>           __dispose_buffer()
>>>             __journal_unfile_buffer()
>>>               __journal_temp_unlink_buffer()
>>>                 mark_buffer_dirty(); // PG_dirty set, incr. dirty pages
>>
>> Good, this seems to be the exact path that actually triggers it. I got to
>> journal_unmap_buffer(), but was too lazy to actually then bother to follow
>> it all the way down - I decided that I didn't actually really even care
>> what the low-level FS layer did, I had already convinced myself that it
>> obviously must be dirtying the page some way, since that matched the
>> symptoms exactly (ie only the journaling case was impacted, and this was
>> all about the journal).
>>
>> But perhaps more importantly: regardless of what the low-level filesystem
>> did at that point, the VM accounting shouldn't care, and should be robust
>> in the face of a low-level filesystem doing strange and wonderful things.
>> But thanks for bothering to go through the whole history and figure out
>> what exactly is up.
>
> Oh well, after seeing the move of cancel_dirty_page, I just went
> backwards from __set_page_dirty using cscope + some smart guessing and
> quickly ended up at ext3_invalidatepage, so it wasn't that hard :-)
>
>>> As try_to_free_buffers got its ext3 hack back in
>>> ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe
>>> 3e67c0987d7567ad666641164a153dca9a43b11d should be reverted? (Except for
>>> the accounting fix in cancel_dirty_page, of course).
>>
>> Yes, I think we have room for cleanups now, and I agree: we ended up
>> reinstating some questionable code in the VM just because we didn't really
>> know or understand what was going on in the ext3 journal code.
>
> Hm, you attributed more to my mail than there was actually in it. I
> didn't even start to think of cleanups (because I don't know jack about
> the whole ext3/jdb stuff, so I simply cannot come up with any cleanups
> (yet?)).What I meant is that we only did a half-revert of that hackery.
>
> When try_to_free_buffers started to check for PG_dirty, the
> cancel_dirty_page call had to be called before do_invalidatepage, to
> "fix" a _huge_ leak.  But that caused the accouting breakage we're now
> seeing, because we never account for the pages that got redirtied during
> do_invalidatepage.
>
> Then the change to try_to_free_buffers got reverted, so we no longer
> need to call cancel_dirty_page before do_invalidatepage, but still we
> do. Thus the accounting bug remains. So what I meant to suggest was
> simply to actually "finish" the revert we started.
>
> Or expressed as a patch:
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index cadc156..2974903 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
> 	if (page->mapping != mapping)
> 		return;
>
> -	cancel_dirty_page(page, PAGE_CACHE_SIZE);
> -
> 	if (PagePrivate(page))
> 		do_invalidatepage(page, 0);
>
> +	cancel_dirty_page(page, PAGE_CACHE_SIZE);
> +
> 	remove_from_page_cache(page);
> 	ClearPageUptodate(page);
> 	ClearPageMappedToDisk(page);
>
> I'll be the last one to comment on whether or not that causes inaccurate
> accouting, so I'll just watch you and Jan battle that out until someone
> comes up with a post-.24 patch to provide a clean fix for the issue.
>
> Krzysztof, could you give this patch a test run?
>
> If that "fixes" the problem for now, I'll try to come up with some
> usable commit message, or if somehow wants to beat me to it, you can
> already have my
>
> Signed-off-by: Björn Steinbrink <B.Steinbrink@....de>

Checked with 2.6.24-rc5 + debug/fixup patch from Linus + above fix. After 
3h there have been no warnings about __remove_from_page_cache(). So, it 
seems that it is OK.

Tested-by: Krzysztof Piotr Oledzki <ole@....pl>

Best regards,

 				Krzysztof Olędzki