linux-kernel - Re: [PATCH] ext4: use private version of page_zero_new

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <Z5f-x278Z3wTIugL@dread.disaster.area>
Date: Tue, 28 Jan 2025 08:46:47 +1100
From: Dave Chinner <david@...morbit.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Mateusz Guzik <mjguzik@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Theodore Ts'o <tytso@....edu>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>,
	Linux Kernel Developers List <linux-kernel@...r.kernel.org>,
	akpm@...ux-foundation.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] ext4: use private version of page_zero_new_buffers() for
 data=journal mode

On Mon, Jan 27, 2025 at 12:52:51PM -0800, Dave Hansen wrote:
> On 1/26/25 14:45, Mateusz Guzik wrote:
> >>
> >> So if you don't get around to it, and _if_ I remember this when the
> >> merge window is open, I might do it in my local tree, but then it will
> >> end up being too late for this merge window.
> >>
> > The to-be-unreverted change was written by Dave (cc'ed).
> > 
> > I had a brief chat with him on irc, he said he is going to submit an
> > updated patch.
> 
> I poked at it a bit today. There's obviously been the page=>folio churn
> and also iov_iter_fault_in_readable() got renamed and got some slightly
> new semantics.
....

> Anyway, here's a patch that compiles, boots and doesn't immediately fall
> over on ext4 in case anyone else wants to poke at it. I'll do a real
> changelog, SoB, etc.... and send it out for real tomorrow if it holds up.

> 
> index 4f476411a9a2d..98b37e4c6d43c 100644
> 
> ---
> 
>  b/mm/filemap.c |   25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff -puN mm/filemap.c~generic_perform_write-1 mm/filemap.c
> --- a/mm/filemap.c~generic_perform_write-1	2025-01-27 09:53:13.219120969 -0800
> +++ b/mm/filemap.c	2025-01-27 12:28:40.333920434 -0800
> @@ -4027,17 +4027,6 @@ retry:
>  		bytes = min(chunk - offset, bytes);
>  		balance_dirty_pages_ratelimited(mapping);
>  
> -		/*
> -		 * Bring in the user page that we will copy from _first_.
> -		 * Otherwise there's a nasty deadlock on copying from the
> -		 * same page as we're writing to, without it being marked
> -		 * up-to-date.
> -		 */
> -		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
> -			status = -EFAULT;
> -			break;
> -		}
> -
>  		if (fatal_signal_pending(current)) {
>  			status = -EINTR;
>  			break;
> @@ -4055,6 +4044,11 @@ retry:
>  		if (mapping_writably_mapped(mapping))
>  			flush_dcache_folio(folio);
>  
> +		/*
> +		 * This needs to be atomic because actually handling page
> +		 * faults on 'i' can deadlock if the copy targets a
> +		 * userspace mapping of 'folio'.
> +		 */
>  		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
>  		flush_dcache_folio(folio);
>  
> @@ -4080,6 +4074,15 @@ retry:
>  				bytes = copied;
>  				goto retry;
>  			}
> +			/*
> +			 * 'folio' is now unlocked and faults on it can be
> +			 * handled. Ensure forward progress by trying to
> +			 * fault it in now.
> +			 */
> +                        if (fault_in_iov_iter_readable(i, bytes) == bytes) {
> +                                status = -EFAULT;
> +                                break;
> +                        }
>  		} else {
>  			pos += status;
>  			written += status;

Shouldn't all the other places that have exactly the same
fault_in_iov_iter_readable()/copy_folio_from_iter_atomic() logic
and comments (e.g.  iomap_write_iter()) be changed to do this the
same way as this new code in generic_perform_write()?

-Dave.
-- 
Dave Chinner
david@...morbit.com