lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250711045225.GA245049@mit.edu>
Date: Fri, 11 Jul 2025 00:52:25 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Jan Kara <jack@...e.cz>
Cc: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>,
        Jan Kara <jack@...e.com>, Tao Ma <boyu.mt@...bao.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Eric Biggers <ebiggers@...gle.com>, linux-ext4@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel-dev@...lia.com,
        syzbot+0c89d865531d053abb2d@...kaller.appspotmail.com,
        stable@...r.kernel.org
Subject: Re: [PATCH v2] ext4: inline: convert when mmap is called, not when
 page is written

On Mon, May 26, 2025 at 06:20:11PM +0200, Jan Kara wrote:
> 
> So I would *love* to do this and was thinking about this as well. But the
> trouble is that this causes lock inversion as well because ->mmap callback
> is called with mmap_lock held and so we cannot acquire inode_lock here
> either.
> 
> Recent changes which switch from ->mmap to ->mmap_prepare callback are
> actually going in a suitable direction but we'd need a rather larger
> rewrite to get from under mmap_lock and I'm not sure that's justified.

This came up in discussions at this week's ext4 video conference, and
afterwards, I started thinking.  I suspect the best solution is one
where we avoid using write_begin/write_end calls entirely.  Instead,
we allocate a single block using ext4_mb_new_block() (see what
ext4_new_meta_blocks does for an example); then we write the contents
into the newly allocated blocks; and if the allocate and the write are
successful, only then do we start a jbd2 handle, to zero the
i_blocks[] array and then set logical block 0 to point at the newly
allocated data block, and then close the handle.

Splitting the block allocation from the "update the logical->physical
mapping", which is currently done in ext4_map_blocks(), into two
separate operations has been one of the things I had been wanting to
do for a while, since it would allow us to replace the current
dioread_nolock buffered write pqth, where we call ext4_map_blocks() to
allocate the blocks but mark the blocks as uninitialized, and then we
write the data blocks, and then we do a second ext4_map_blocks() call
to clear the uninitialized flag.

So it's a bit more involved since it requires refactoring parts of
ext4_map_blocks() to create a new function, ext4_set_blocks() which
updates the extent tree or indirect block map using block(s) that were
allocated separately.  But this would be useful for things besides
just the inline file conversion operation, so it's worth the effort.

Cheers,

						- Ted

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ