[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090528122357.GM6920@wotan.suse.de>
Date: Thu, 28 May 2009 14:23:57 +0200
From: Nick Piggin <npiggin@...e.de>
To: Wu Fengguang <fengguang.wu@...el.com>
Cc: Andi Kleen <andi@...stfloor.org>,
"hugh@...itas.com" <hugh@...itas.com>,
"riel@...hat.com" <riel@...hat.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"chris.mason@...cle.com" <chris.mason@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3
On Thu, May 28, 2009 at 05:59:34PM +0800, Wu Fengguang wrote:
> Hi Nick,
>
> > > + /*
> > > + * remove_from_page_cache assumes (mapping && !mapped)
> > > + */
> > > + if (page_mapping(p) && !page_mapped(p)) {
> > > + remove_from_page_cache(p);
> > > + page_cache_release(p);
> > > + }
> >
> > remove_mapping would probably be a better idea. Otherwise you can
> > probably introduce pagecache removal vs page fault races which
> > will make the kernel bug.
>
> We use remove_mapping() at first, then discovered that it made strong
> assumption on page_count=2.
>
> I guess it is safe from races since we are locking the page?
Yes it probably should (although you will lose get_user_pages data, but
I guess that's the aim anyway).
But I just don't like this one file having all that required knowledge
(and few comments) of all the files in mm/. If you want to get rid
of the page and don't care what it's count or dirtyness is, then
truncate_inode_pages_range is the correct API to use.
(or you could extract out some of it so you can call it directly on
individual locked pages, if that helps).
> > > + }
> > > +
> > > + me_pagecache_clean(p);
> > > +
> > > + /*
> > > + * Did the earlier release work?
> > > + */
> > > + if (page_has_private(p) && !try_to_release_page(p, GFP_NOIO))
> > > + return FAILED;
> > > +
> > > + return RECOVERED;
> > > +}
> > > +
> > > +/*
> > > + * Clean and dirty swap cache.
> > > + */
> > > +static int me_swapcache_dirty(struct page *p)
> > > +{
> > > + ClearPageDirty(p);
> > > +
> > > + if (!isolate_lru_page(p))
> > > + page_cache_release(p);
> > > +
> > > + return DELAYED;
> > > +}
> > > +
> > > +static int me_swapcache_clean(struct page *p)
> > > +{
> > > + ClearPageUptodate(p);
> > > +
> > > + if (!isolate_lru_page(p))
> > > + page_cache_release(p);
> > > +
> > > + delete_from_swap_cache(p);
> > > +
> > > + return RECOVERED;
> > > +}
> >
> > All these handlers are quite interesting in that they need to
> > know about most of the mm. What are you trying to do in each
> > of them would be a good idea to say, and probably they should
> > rather go into their appropriate files instead of all here
> > (eg. swapcache stuff should go in mm/swap_state for example).
>
> Yup, they at least need more careful comments.
>
> Dirty swap cache page is tricky to handle. The page could live both in page
> cache and swap cache(ie. page is freshly swapped in). So it could be referenced
> concurrently by 2 types of PTEs: one normal PTE and another swap PTE. We try to
> handle them consistently by calling try_to_unmap(TTU_IGNORE_HWPOISON) to convert
> the normal PTEs to swap PTEs, and then
> - clear dirty bit to prevent IO
> - remove from LRU
> - but keep in the swap cache, so that when we return to it on
> a later page fault, we know the application is accessing
> corrupted data and shall be killed (we installed simple
> interception code in do_swap_page to catch it).
OK this is the point I was missing.
Should all be commented and put into mm/swap_state.c (or somewhere that
Hugh prefers).
> Clean swap cache pages can be directly isolated. A later page fault will bring
> in the known good data from disk.
OK, but why do you ClearPageUptodate if it is just to be deleted from
swapcache anyway?
> > You haven't waited on writeback here AFAIKS, and have you
> > *really* verified it is safe to call delete_from_swap_cache?
>
> Good catch. I'll soon submit patches for handling the under
> read/write IO pages. In this patchset they are simply ignored.
Well that's quite important ;) I would suggest you just wait_on_page_writeback.
It is simple and should work. _Unless_ you can show it is a big problem that
needs equivalently big mes to fix ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists