linux-kernel - Re: [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)

Message-Id: <20211015132800.357d891d0b3ad34adb9c7383@linux-foundation.org>
Date:   Fri, 15 Oct 2021 13:28:00 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Yang Shi <shy828301@...il.com>
Cc:     naoya.horiguchi@....com, hughd@...gle.com,
        kirill.shutemov@...ux.intel.com, willy@...radead.org,
        peterx@...hat.com, osalvador@...e.de, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned
 page cache (shmem/tmpfs)

On Thu, 14 Oct 2021 12:16:09 -0700 Yang Shi <shy828301@...il.com> wrote:

> When discussing the patch that splits page cache THP in order to offline the
> poisoned page, Naoya mentioned there is a bigger problem [1] that prevents this
> from working since the page cache page will be truncated if uncorrectable
> errors happen.  Looking into this more deeply, it turns out this approach
> (truncating the poisoned page) may incur silent data loss for all non-readonly
> filesystems if the page is dirty.  It may be worse for in-memory filesystems,
> e.g. shmem/tmpfs, since the data blocks are actually gone.
> 
> To solve this problem we could keep the poisoned dirty page in page cache then
> notify the users on any later access, e.g. page fault, read/write, etc.  The
> clean page could be truncated as is since it can be reread from disk later on.
> 
> The consequence is that filesystems may find a poisoned page and manipulate it
> as a healthy page, since none of the filesystems actually check whether the
> page is poisoned in any of the relevant paths except page fault.  In general,
> we need to make the filesystems aware of poisoned pages before we can keep the
> poisoned page in page cache in order to solve the data loss problem.

Is the "RFC" still accurate, or might it be an accidental leftover?

I grabbed this series as-is for some testing, but I do think it would
be better if it was delivered as two separate series - one series for
the -stable material and one series for the 5.16-rc1 material.