lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CALYGNiNNRwp7TfEh4tkP+9B=J9oN2n1cET0WP7u_wkCge5Y5bQ@mail.gmail.com>
Date:	Tue, 20 Jan 2015 15:35:56 +0400
From:	Konstantin Khlebnikov <koct9i@...il.com>
To:	Константин Хлебников 
	<khlebnikov@...dex-team.ru>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Tejun Heo <tj@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH RFC] page_writeback: cleanup mess around cancel_dirty_page()

On Fri, Jan 16, 2015 at 4:18 PM, Константин Хлебников
<khlebnikov@...dex-team.ru> wrote:
> 16.01.2015, 04:16, "Andrew Morton" <akpm@...ux-foundation.org>:
>> On Thu, 15 Jan 2015 18:57:31 +0300 Konstantin Khebnikov <khlebnikov@...dex-team.ru> wrote:
>>>  This patch replaces cancel_dirty_page() with helper account_page_cleared()
>>>  which only updates counters. It's called from delete_from_page_cache()
>>>  and from try_to_free_buffers() (hack for ext3). Page is locked in both cases.
>>>
>>>  Hugetlbfs has no dirty pages accounting, ClearPageDirty() is enough here.
>>>
>>>  cancel_dirty_page() in nfs_wb_page_cancel() is redundant. This is helper
>>>  for nfs_invalidate_page() and it's called only in case complete invalidation.
>>>
>>>  Open-coded kludge at the end of __delete_from_page_cache() is redundant too.
>>>
>>>  This mess was started in v2.6.20, after commit 3e67c09 ("truncate: clear page
>>>  dirtiness before running try_to_free_buffers()") reverted back in v2.6.25
>>>  by commit a2b3456 ("Fix dirty page accounting leak with ext3 data=journal").
>>>  Custom fixes were introduced between them. NFS in in v2.6.23 in commit
>>>  1b3b4a1 ("NFS: Fix a write request leak in nfs_invalidate_page()").
>>>  Kludge __delete_from_page_cache() in v2.6.24, commit 3a692790 ("Do dirty
>>>  page accounting when removing a page from the page cache").
>>>
>>>  It seems safe to leave dirty flag set on truncated page, free_pages_check()
>>>  will clear it before returning page into buddy allocator.
>>
>> account_page_cleared() is not a good name - "clearing a page" means
>> filling it with zeroes.  account_page_cleaned(), perhaps?
>
> Ok. account_page_cleaned is better.
>
>>
>> I don't think your email cc'ed all the correct people?  lustre, nfs,
>> ext3?
>
> oops
>
>>>  ...
>>>
>>>  --- a/fs/buffer.c
>>>  +++ b/fs/buffer.c
>>>  @@ -3243,8 +3243,8 @@ int try_to_free_buffers(struct page *page)
>>>            * to synchronise against __set_page_dirty_buffers and prevent the
>>>            * dirty bit from being lost.
>>>            */
>>>  - if (ret)
>>>  - cancel_dirty_page(page, PAGE_CACHE_SIZE);
>>>  + if (ret && TestClearPageDirty(page))
>>>  + account_page_cleared(page, mapping);
>>
>> OK.
>>>           spin_unlock(&mapping->private_lock);
>>>   out:
>>>           if (buffers_to_free) {
>>>
>>>  ...
>>>
>>>  --- a/fs/nfs/write.c
>>>  +++ b/fs/nfs/write.c
>>>  @@ -1811,11 +1811,6 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
>>>                    * request from the inode / page_private pointer and
>>>                    * release it */
>>>                   nfs_inode_remove_request(req);
>>>  - /*
>>>  - * In case nfs_inode_remove_request has marked the
>>>  - * page as being dirty
>>>  - */
>>>  - cancel_dirty_page(page, PAGE_CACHE_SIZE);
>>
>> hm, if you say so..
>
> That is main reason of this patch.
> I dont like these obsoleted pieces of duct tape here and there.
>
>>>                   nfs_unlock_and_release_request(req);
>>>           }
>>>
>>>  ...
>>>
>>>  --- a/mm/filemap.c
>>>  +++ b/mm/filemap.c
>>>  @@ -201,18 +201,6 @@ void __delete_from_page_cache(struct page *page, void *shadow)
>>>           if (PageSwapBacked(page))
>>>                   __dec_zone_page_state(page, NR_SHMEM);
>>>           BUG_ON(page_mapped(page));
>>>  -
>>>  - /*
>>>  - * Some filesystems seem to re-dirty the page even after
>>>  - * the VM has canceled the dirty bit (eg ext3 journaling).
>>>  - *
>>>  - * Fix it up by doing a final dirty accounting check after
>>>  - * having removed the page entirely.
>>>  - */
>>>  - if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
>>>  - dec_zone_page_state(page, NR_FILE_DIRTY);
>>>  - dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
>>>  - }
>>>   }
>>>
>>>   /**
>>>  @@ -230,6 +218,9 @@ void delete_from_page_cache(struct page *page)
>>>
>>>           BUG_ON(!PageLocked(page));
>>>
>>>  + if (PageDirty(page))
>>>  + account_page_cleared(page, mapping);
>>>  +
>>
>> OK, but we lost the important comment - transplant that?
>>
>> It's strange that we left the dirty bit set after accounting for its
>> clearing.  How does this work?  Presumably the offending fs dirtied the
>> page without accounting for it?  I have a bad feeling I wrote that code :(
>
> account_page_dirtyed() must be always called after dirtying non-truncated pages.
> Here page is truncating from mapping, dirty accounting never will see it again.
>
> This is the only place where dirty page might be truncated. All other places:
> replace_page_cache_page, invalidate_complete_page2, __remove_mapping
> (in memory reclaimer) forbid removing dirty pages.
>
> As I see PageDirty means nothing for truncated pages, it's never be written anywhere.
> We could clear dirty bit here, but probably it might appear again: set_page_dirty()
> for some reason has branch for pages without page->mapping.

Another puzzle here: how mark_buffer_dirty() is synchronized with truncate?

do_invalidatepage() called from truncate_complete_page() clears dirty
buffers and
tries to release them all but exit code is ignored. So theoretically
there still might
be pinned buffer heads and somebody might call mark_buffer_dirty() for them.

Funny but mark_buffer_dirty() gets mapping using page_mapping()!
(Is bh might be attached to anon/slab page?)
And it seems there is no protection against truncate
(but it calls __set_page_dirty which checks page->mapping for NULL).

So, if this race is possible the final account_page_cleaned() must be
placed after
clearing page->mapping. But I think we can put in outside of mapping->tree_lock
because mapping cannot run off under us: if truncated page is dirty
that might be
only call from truncate where caller holds reference to the inode.

>
>>>           freepage = mapping->a_ops->freepage;
>>>           spin_lock_irq(&mapping->tree_lock);
>>>           __delete_from_page_cache(page, NULL);
>>>  diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>>  index 4da3cd5..f371522 100644
>>>  --- a/mm/page-writeback.c
>>>  +++ b/mm/page-writeback.c
>>>  @@ -2106,6 +2106,25 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
>>>   EXPORT_SYMBOL(account_page_dirtied);
>>>
>>>   /*
>>>  + * Helper function for deaccounting dirty page without doing writeback.
>>>  + * Doing this should *normally* only ever be done when a page
>>>  + * is truncated, and is not actually mapped anywhere at all. However,
>>>  + * fs/buffer.c does this when it notices that somebody has cleaned
>>>  + * out all the buffers on a page without actually doing it through
>>>  + * the VM. Can you say "ext3 is horribly ugly"? Tought you could.
>>
>> "Thought".
>
> Ah, ok. That is copy-paste from cancel_dirty_page().
>
>>>  + */
>>>  +void account_page_cleared(struct page *page, struct address_space *mapping)
>>>  +{
>>>  + if (mapping_cap_account_dirty(mapping)) {
>>>  + dec_zone_page_state(page, NR_FILE_DIRTY);
>>>  + dec_bdi_stat(mapping->backing_dev_info,
>>>  + BDI_RECLAIMABLE);
>>>  + task_io_account_cancelled_write(PAGE_CACHE_SIZE);
>>>  + }
>>>  +}
>>>  +EXPORT_SYMBOL(account_page_cleared);
>>>
>>>  ...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ