linux-kernel - Re: [PATCH] iomap: Handle I/O errors gracefully in page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200604230519.GW19604@bombadil.infradead.org>
Date:   Thu, 4 Jun 2020 16:05:19 -0700
From:   Matthew Wilcox <willy@...radead.org>
To:     Dave Chinner <david@...morbit.com>
Cc:     Christoph Hellwig <hch@...radead.org>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iomap: Handle I/O errors gracefully in page_mkwrite

On Fri, Jun 05, 2020 at 08:57:26AM +1000, Dave Chinner wrote:
> On Thu, Jun 04, 2020 at 01:23:40PM -0700, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@...radead.org>
> > 
> > Test generic/019 often results in:
> > 
> > WARNING: at fs/iomap/buffered-io.c:1069 iomap_page_mkwrite_actor+0x57/0x70
> > 
> > Since this can happen due to a storage error, we should not WARN for it.
> > Just return -EIO, which will be converted to a SIGBUS for the hapless
> > task attempting to write to the page that we can't read.
> 
> Why didn't the "read" part of the fault which had the EIO error fail
> the page fault? i.e. why are we waiting until deep inside the write
> fault path to error out on a failed page read?

I have a hypothesis that I don't know how to verify.

First the task does a load from the page and we put a read-only PTE in
the page tables.  Then it writes to the page using write().  The page
gets written back, but hits an error in iomap_writepage_map()
which calls ClearPageUptodate().  Then the task with it mapped attempts
to store to it.

I haven't dug through what generic/019 does, so I don't know how plausible
this is.