[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHbLzkrxFrG9ncaFMVZhnXut0VmON0MP1bM=4DqFgwqXGRtoJg@mail.gmail.com>
Date: Tue, 21 Sep 2021 12:34:44 -0700
From: Yang Shi <shy828301@...il.com>
To: Naoya Horiguchi <naoya.horiguchi@...ux.dev>
Cc: HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@....com>, Hugh Dickins <hughd@...gle.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Oscar Salvador <osalvador@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux MM <linux-mm@...ck.org>,
Linux FS-devel Mailing List <linux-fsdevel@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/4] mm: shmem: don't truncate page if memory failure happens
On Tue, Sep 21, 2021 at 2:49 AM Naoya Horiguchi
<naoya.horiguchi@...ux.dev> wrote:
>
> On Tue, Sep 14, 2021 at 11:37:17AM -0700, Yang Shi wrote:
> > The current behavior of memory failure is to truncate the page cache
> > regardless of dirty or clean. If the page is dirty the later access
> > will get the obsolete data from disk without any notification to the
> > users. This may cause silent data loss. It is even worse for shmem
> > since shmem is in-memory filesystem, truncating page cache means
> > discarding data blocks. The later read would return all zero.
> >
> > The right approach is to keep the corrupted page in page cache, any
> > later access would return error for syscalls or SIGBUS for page fault,
> > until the file is truncated, hole punched or removed. The regular
> > storage backed filesystems would be more complicated so this patch
> > is focused on shmem. This also unblock the support for soft
> > offlining shmem THP.
> >
> > Signed-off-by: Yang Shi <shy828301@...il.com>
> > ---
> > mm/memory-failure.c | 3 ++-
> > mm/shmem.c | 25 +++++++++++++++++++++++--
> > 2 files changed, 25 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 54879c339024..3e06cb9d5121 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1101,7 +1101,8 @@ static int page_action(struct page_state *ps, struct page *p,
> > result = ps->action(p, pfn);
> >
> > count = page_count(p) - 1;
> > - if (ps->action == me_swapcache_dirty && result == MF_DELAYED)
> > + if ((ps->action == me_swapcache_dirty && result == MF_DELAYED) ||
> > + (ps->action == me_pagecache_dirty && result == MF_FAILED))
>
> This new line seems to affect the cases of dirty page cache
> on other filesystems, whose result is to miss "still referenced"
> messages for some unmap failure cases (although it's not so critical).
> So checking filesystem type (for example with shmem_mapping())
> might be helpful?
>
> And I think that if we might want to have some refactoring to pass
> *ps to each ps->action() callback, then move this refcount check to
> the needed places.
> I don't think that we always need the refcount check, for example in
> MF_MSG_KERNEL and MF_MSG_UNKNOWN cases (because no one knows the
> expected values for these cases).
Yeah, seems make sense to me. How's about doing the below (totally untested):
static inline bool check_refcount(struct *page, bool dec)
{
int count = page_count(page) - 1;
if (dec || shmem_mapping(page->mapping))
count -= 1;
if (count > 0) {
pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
pfn, action_page_types[ps->type], count);
return false;
}
return true;
}
Then call this in the needed me_* functions and return right value per
the return value of it. I think me_swapcache_dirty() is the only place
need pass in true for dec parameter.
>
>
> > count--;
> > if (count > 0) {
> > pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 88742953532c..ec33f4f7173d 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -2456,6 +2456,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> > struct inode *inode = mapping->host;
> > struct shmem_inode_info *info = SHMEM_I(inode);
> > pgoff_t index = pos >> PAGE_SHIFT;
> > + int ret = 0;
> >
> > /* i_rwsem is held by caller */
> > if (unlikely(info->seals & (F_SEAL_GROW |
> > @@ -2466,7 +2467,19 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> > return -EPERM;
> > }
> >
> > - return shmem_getpage(inode, index, pagep, SGP_WRITE);
> > + ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
> > +
> > + if (!ret) {
>
> Maybe this "!ret" check is not necessary because *pagep is set
> non-NULL only when ret is 0. It could save one indent level.
Yes, sure.
>
> > + if (*pagep) {
> > + if (PageHWPoison(*pagep)) {
> > + unlock_page(*pagep);
> > + put_page(*pagep);
> > + ret = -EIO;
> > + }
> > + }
> > + }
> > +
> > + return ret;
> > }
> >
> > static int
> > @@ -2555,6 +2568,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > unlock_page(page);
> > }
> >
> > + if (page && PageHWPoison(page)) {
> > + error = -EIO;
> > + break;
> > + }
> > +
> > /*
> > * We must evaluate after, since reads (unlike writes)
> > * are called without i_rwsem protection against truncate
> > @@ -3782,7 +3800,6 @@ const struct address_space_operations shmem_aops = {
> > #ifdef CONFIG_MIGRATION
> > .migratepage = migrate_page,
> > #endif
> > - .error_remove_page = generic_error_remove_page,
>
> This change makes truncate_error_page() calls invalidate_inode_page(),
> and in my testing it fails with "Failed to invalidate" message.
> So as a result memory_failure() finally returns with -EBUSY. I'm not
> sure it's expected because this patchset changes to keep error pages
> in page cache as a proper error handling.
> Maybe you can avoid this by defining .error_remove_page in shmem_aops
> which simply returns 0.
Yes, the "Failed to invalidate" message seems confusing. I agree a
shmem specific callback is better.
>
> > };
> > EXPORT_SYMBOL(shmem_aops);
> >
> > @@ -4193,6 +4210,10 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
> > page = ERR_PTR(error);
> > else
> > unlock_page(page);
> > +
> > + if (PageHWPoison(page))
> > + page = NULL;
> > +
> > return page;
>
> One more comment ...
>
> - I guess that you add PageHWPoison() checks after some call sites
> of shmem_getpage() and shmem_getpage_gfp(), but seems not cover all.
> For example, mcontinue_atomic_pte() in mm/userfaultfd.c can properly
> handle PageHWPoison?
No, I didn't touch anything outside shmem.c. I could add this in the
next version.
BTW, I just found another problem for the change in
shmem_read_mapping_page_gfp(), it should return ERR_PTR(-EIO) instead
of NULL since the callers may not handle NULL. Will fix in the next
version too.
>
> I'm trying to test more detail, but in my current understanding,
> this patch looks promising to me. Thank you for your effort.
Thank a lot for taking time do the review.
>
> Thanks,
> Naoya Horiguchi
Powered by blists - more mailing lists