Message-ID: <CAHbLzkrRdT1gZm-FBmZU8WKqsLYfC6Q2cF8iGDWqOV6==xfsnA@mail.gmail.com>
Date: Fri, 15 Oct 2021 14:48:14 -0700
From: Yang Shi <shy828301@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@....com>, Hugh Dickins <hughd@...gle.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Peter Xu <peterx@...hat.com>,
Oscar Salvador <osalvador@...e.de>,
Linux MM <linux-mm@...ck.org>,
Linux FS-devel Mailing List <linux-fsdevel@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page
cache (shmem/tmpfs)
On Fri, Oct 15, 2021 at 1:28 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Thu, 14 Oct 2021 12:16:09 -0700 Yang Shi <shy828301@...il.com> wrote:
>
> > When discussing the patch that splits page cache THP in order to offline the
> > poisoned page, Naoya mentioned there is a bigger problem [1] that prevents this
> > from working, since the page cache page will be truncated if uncorrectable
> > errors happen. Looking into this more deeply, it turns out this approach
> > (truncating the poisoned page) may incur silent data loss for all non-readonly
> > filesystems if the page is dirty. It may be worse for in-memory filesystems,
> > e.g. shmem/tmpfs, since the data blocks are actually gone.
> >
> > To solve this problem we could keep the poisoned dirty page in the page cache,
> > then notify the users on any later access, e.g. page fault, read/write, etc.
> > Clean pages could still be truncated as is, since they can be reread from disk
> > later on.
> >
> > The consequence is that filesystems may find a poisoned page and manipulate it
> > as a healthy page, since filesystems do not actually check whether a page is
> > poisoned in any of the relevant paths except page fault. In general, we need
> > to make the filesystems aware of poisoned pages before we can keep the
> > poisoned page in the page cache to solve the data loss problem.
>
> Is the "RFC" still accurate, or might it be an accidental leftover?
Yeah, I think it can be removed.
>
> I grabbed this series as-is for some testing, but I do think it would
> be better if it was delivered as two separate series - one series for
> the -stable material and one series for the 5.16-rc1 material.
Yeah, patches 1/6 and 2/6 should go to -stable, and the remaining
patches are for 5.16-rc1. Thanks for taking them.
>