linux-kernel - Re: [PATCH] ext4: use private version of page_zero_new

Message-ID: <yxyuijjfd6yknryji2q64j3keq2ygw6ca6fs5jwyolklzvo45s@4u63qqqyosy2>
Date: Sun, 26 Jan 2025 18:01:55 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Theodore Ts'o <tytso@....edu>
Cc: Ext4 Developers List <linux-ext4@...r.kernel.org>, 
	Linux Kernel Developers List <linux-kernel@...r.kernel.org>, dave.hansen@...el.com, torvalds@...ux-foundation.org, 
	akpm@...ux-foundation.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] ext4: use private version of page_zero_new_buffers() for
 data=journal mode

On Fri, Oct 09, 2015 at 12:01:09AM -0400, Theodore Ts'o wrote:
> If there is an error while copying data from userspace into the page
> cache during a write(2) system call, in data=journal mode, in
> ext4_journalled_write_end() we were using page_zero_new_buffers() from
> fs/buffer.c.  Unfortunately, this sets the buffer dirty flag, which is
> no good if journalling is enabled.  This is a long-standing bug that
> goes back for years and years in ext3, but a combination of (a)
> data=journal not being very common, (b) in many cases it only results
> in a warning message, and (c) it only very rarely causes the kernel to
> hang, means that we only really noticed this as a problem when commit
> 998ef75ddb caused this failure to happen frequently enough to cause
> generic/208 to fail when run in data=journal mode.
> 
> The fix is to have our own version of this function that doesn't call
> mark_buffer_dirty(), since we will end up calling
> ext4_handle_dirty_metadata() on the buffer head(s) in question very
> shortly afterwards in ext4_journalled_write_end().
> 
> Thanks to Dave Hansen and Linus Torvalds for helping to identify the
> root cause of the problem.
> 
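
To make the difference concrete, here is a userspace model of the zeroing loop. It is not kernel code. It is loosely modelled on page_zero_new_buffers() in fs/buffer.c. The names `struct bh_model`, `struct page_model` and `zero_new_buffers_model()` are hypothetical, and so are the 4096-byte page and 1024-byte block sizes.

For each buffer still flagged new, the model zeroes the part of [from, to) that overlaps it and clears the flag. It marks the buffer dirty only when `mark_dirty` is true, which stands in for the generic helper. Passing false models the private data=journal variant described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE  4096u  /* assumed page size */
#define MODEL_BLOCK_SIZE 1024u  /* assumed filesystem block size */
#define MODEL_NR_BUFS    (MODEL_PAGE_SIZE / MODEL_BLOCK_SIZE)

/* Hypothetical stand-in for struct buffer_head: only the state bits. */
struct bh_model {
	bool is_new;   /* freshly allocated block, contents undefined */
	bool uptodate;
	bool dirty;    /* would trigger ordinary, non-journalled writeback */
};

struct page_model {
	unsigned char data[MODEL_PAGE_SIZE];
	bool uptodate;
	struct bh_model bh[MODEL_NR_BUFS];
};

/*
 * Zero the part of [from, to) overlapping each "new" buffer and clear
 * the new flag. mark_dirty == true models the generic helper;
 * mark_dirty == false models the data=journal variant, which leaves
 * dirtying to the journal (ext4_handle_dirty_metadata()).
 */
void zero_new_buffers_model(struct page_model *p, unsigned from,
			    unsigned to, bool mark_dirty)
{
	assert(from <= to && to <= MODEL_PAGE_SIZE);
	for (unsigned i = 0; i < MODEL_NR_BUFS; i++) {
		unsigned block_start = i * MODEL_BLOCK_SIZE;
		unsigned block_end = block_start + MODEL_BLOCK_SIZE;
		struct bh_model *bh = &p->bh[i];

		/* Skip buffers that are not new or do not overlap the range. */
		if (!bh->is_new || block_end <= from || block_start >= to)
			continue;
		if (!p->uptodate) {
			unsigned start = from > block_start ? from : block_start;
			unsigned end = to < block_end ? to : block_end;

			memset(p->data + start, 0, end - start);
			bh->uptodate = true;
		}
		bh->is_new = false;
		if (mark_dirty)
			bh->dirty = true; /* the step that is wrong for data=journal */
	}
}
```

With `mark_dirty` false, the buffers come back clean. In this model, the only thing left to dirty them is the later journalled path.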

Hello there, a blast from the past.

I see this has landed as commit b90197b655185a11640cce3a0a0bc5d8291b8ad2.

I came here after looking at a pwrite test from will-it-scale and noticing
that pre-faulting eats CPU (over 5% on my Sapphire Rapids) due to SMAP trips.

It used to be that pre-faulting was avoided specifically for that
reason, but that change got temporarily reverted due to bugs in ext4. To
quote Linus (see 00a3d660cbac05af34cca149cb80fb611e916935):

>    The commit itself does not appear to be buggy per se, but it is exposing
>    a bug in ext4 (and Ted thinks ext3 too, but we solved that by getting
>    rid of it).  It's too late in the release cycle to really worry about
>    this, even if Dave Hansen has a patch that may actually fix the
>    underlying ext4 problem.  We can (and should) revisit this for the next
>    release.
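
A rough way to see where the cycles go is the toy model below. It counts user-access windows for two strategies. Each window stands in for the stac/clac pair that SMAP requires around a user-memory access.

It assumes, as I understand the reverted commit, that the alternative to pre-faulting is to try the atomic copy first and fault the page in only when that copy comes up short. `struct model`, `fault_in()`, `copy_atomic()` and `NR_CHUNKS` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CHUNKS 8 /* hypothetical: one chunk per user page */

struct model {
	bool resident[NR_CHUNKS]; /* is the user page for chunk i mapped? */
	unsigned long windows;    /* user-access windows (stac/clac pairs) */
	unsigned long faults;     /* page faults taken */
};

/* Touch chunk i so it becomes resident; costs one user-access window. */
void fault_in(struct model *m, size_t i)
{
	assert(i < NR_CHUNKS);
	m->windows++;
	if (!m->resident[i]) {
		m->faults++;
		m->resident[i] = true;
	}
}

/* Copy chunk i without taking faults; fails if the page is not resident. */
bool copy_atomic(struct model *m, size_t i)
{
	assert(i < NR_CHUNKS);
	m->windows++;
	return m->resident[i];
}

/* Pre-fault every chunk, then copy it: always two windows per chunk. */
void write_prefault(struct model *m)
{
	for (size_t i = 0; i < NR_CHUNKS; i++) {
		fault_in(m, i);
		(void)copy_atomic(m, i); /* cannot fail right after fault_in() */
	}
}

/* Copy first; fault in and retry only after a short copy. */
void write_lazy(struct model *m)
{
	for (size_t i = 0; i < NR_CHUNKS; i++) {
		if (!copy_atomic(m, i)) {
			fault_in(m, i);
			(void)copy_atomic(m, i);
		}
	}
}
```

With a fully resident buffer, which is the likely case in a tight benchmark loop, write_lazy() opens 8 windows against 16 for write_prefault(). With a cold buffer, the lazy path opens 24, because each chunk needs a failed copy, a fault-in and a retry. In this model, skipping the pre-fault only pays off when the buffer is usually already mapped.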

Given that your patch has landed, I take it this is expected to be fixed now?

It sounds like nobody bothered to revert the revert. Not the end of the
world, but it leaves a few % on the table for (hopefully) no reason. Of
course testing will be needed, but that's what -next is for.

thanks,