linux-ext4 - Re: packed_meta_blocks=1 incompatible with resize2fs?

Message-ID: <20230628154441.GA383202@mit.edu>
Date:   Wed, 28 Jun 2023 11:44:41 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     Roberto Ragusa <mail@...ertoragusa.it>
Cc:     "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: packed_meta_blocks=1 incompatible with resize2fs?

On Wed, Jun 28, 2023 at 04:35:50PM +0200, Roberto Ragusa wrote:
> On 6/28/23 02:03, Theodore Ts'o wrote:
> 
> > Unfortunately, (a) there is no place where the fact that the file
> > system was created with this mkfs option is recorded in the
> > superblock, and (b) once the file system starts getting used, the
> > blocks where the metadata would need to be allocated at the start of
> > the disk will get used for directory and data blocks.
> 
> Isn't resize2fs already capable of migrating directory and data blocks
> away? According to the comments at the beginning of resize2fs.c, I mean.

Yes, but (a) that can only be done off-line (while the file system is
unmounted), and (b) migrating directory and data blocks is quite slow
and inefficient, and it doesn't necessarily leave the data files laid
out optimally (it doesn't do as much as it could to minimize file
fragmentation during the migration process).  It was intended for
moving a very small number of blocks, and while it could be improved,
that would require additional software engineering investment.

> 1. reserve the bitmaps and inode table space since the beginning (with mke2fs
> option resize, for example)
> 3. do not add new inodes when expanding (impossible by design, right?)

This would require file system format changes, with matching changes
to the kernel, the kernel on-line resizing code, e2fsck, and resize2fs
for off-line
resizing.  And while we've considered doing (3) for other reasons,
that's not sufficient for this use case, because when we add new block
groups, we have to add block and inode allocation bitmaps, the inode
table, and the block group descriptor blocks.  It's not just the inode
table.

> 2. push things out of the way when the expansion is done
> 
> I could attempt to code something to do 2., but I would either have to
> study resize2fs code, which is not trivial, or write something from scratch,
> based only on the layout docs, which would be also complex and not easily
> mergeable in resize2fs.
> 
> 4. have an offline way (custom tool, or detecting conflicting files and
> temporarily removing them, ...) to free the needed blocks
> 
> At the moment the best option I have is to continue doing what I've been
> doing for years already: use dumpe2fs and debugfs to discover which bg
> contain metadata+journal and selectively use "pvmove" to migrate
> those extents (PE) to the fast PV. Automatable, but still messy.
> Discovering "packed_meta_blocks" turned out not to be as great a finding as
> I was hoping, if you then can't resize.

Honestly, I suspect that automating the code to determine which PEs
hold the block group descriptors, inode table blocks, and allocation
bitmap blocks, and should therefore be migrated to the fast PV, is
probably the simplest thing to do.  You should be able to do this
using just dumpe2fs; the journal is generally not going to move during
a migration.

						- Ted
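
A minimal sketch of the dumpe2fs-based approach suggested above, under
stated assumptions.  It assumes the per-group lines of dumpe2fs output
look like the SAMPLE text below ("Group descriptors at", "Block bitmap
at", "Inode bitmap at", "Inode table at"), that the file system starts
at the beginning of the LV, and that extents are a fixed size.  The
function names, the sample text, and the 4 MiB extent size are all
illustrative and not taken from any existing tool.  The result is a list
of logical extent numbers within the LV; translating those to physical
extents on a particular PV depends on the LV's segment layout and is
not attempted here.

```python
import re

# A block number, optionally followed by "-<last block>".
_RANGE = r"(\d+)(?:-(\d+))?"

# Per-group metadata lines, in the (assumed) dumpe2fs output format.
# Backup group descriptors are matched as well as the primary ones.
_PATTERNS = [
    re.compile(r"Group descriptors at " + _RANGE),
    re.compile(r"Block bitmap at " + _RANGE),
    re.compile(r"Inode bitmap at " + _RANGE),
    re.compile(r"Inode table at " + _RANGE),
]

# Hypothetical excerpt, used only to exercise the parser.
SAMPLE = """\
Group 0: (Blocks 0-32767)
  Primary superblock at 0, Group descriptors at 1-1
  Block bitmap at 1025 (+1025)
  Inode bitmap at 1041 (+1041)
  Inode table at 1057-1568 (+1057)
Group 1: (Blocks 32768-65535)
  Backup superblock at 32768, Group descriptors at 32769-32769
  Block bitmap at 1026 (bg #0 + 1026)
  Inode bitmap at 1042 (bg #0 + 1042)
  Inode table at 1569-2080 (bg #0 + 1569)
"""


def metadata_block_ranges(dumpe2fs_text):
    """Return sorted (first, last) file system block ranges holding metadata."""
    ranges = []
    for line in dumpe2fs_text.splitlines():
        for pattern in _PATTERNS:
            match = pattern.search(line)
            if match:
                first = int(match.group(1))
                # A single block is printed without the "-<last>" part.
                last = int(match.group(2)) if match.group(2) else first
                ranges.append((first, last))
    return sorted(ranges)


def blocks_to_extents(ranges, block_size, extent_size):
    """Return the sorted logical extent numbers touched by the block ranges."""
    extents = set()
    for first, last in ranges:
        lo = first * block_size // extent_size
        # Byte offset of the last byte of the last block in the range.
        hi = (last * block_size + block_size - 1) // extent_size
        extents.update(range(lo, hi + 1))
    return sorted(extents)


if __name__ == "__main__":
    found = metadata_block_ranges(SAMPLE)
    # 4 KiB blocks and 4 MiB extents are assumptions for this example.
    print(blocks_to_extents(found, 4096, 4 * 1024 * 1024))
```

In practice the text would come from running dumpe2fs on the device
instead of SAMPLE, and the block size would be read from the dumpe2fs
header rather than hard-coded.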