linux-ext4 - Re: allowing ext4 file systems that wrapped inode count to continue working

Message-ID: <989746e8-07b8-2318-344c-96ed4cd5f2ed@uls.co.za>
Date:   Mon, 30 Jul 2018 17:56:02 +0200
From:   Jaco Kroon <jaco@....co.za>
To:     "Theodore Y. Ts'o" <tytso@....edu>
Cc:     Andreas Dilger <adilger@...ger.ca>, Jan Kara <jack@...e.cz>,
        linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: allowing ext4 file systems that wrapped inode count to continue
 working

Again, thanks Ted!

On 30/07/2018 17:06, Theodore Y. Ts'o wrote:
> On Mon, Jul 30, 2018 at 10:56:01AM +0200, Jaco Kroon wrote:
>> Is there any way to mark those blocks that's being freed to not be
>> re-used?  I was contemplating setting them as badblocks using fsck so
>> that I can online the filesystem in cycles so that I can get backups to
>> function overnight, when they are done in the morning, offline and
>> perform the next cycle? 
> So you can use debugfs's setb, but then you can't use the allocation
> bitmap to check to see whether you have accounted for all of the
> groups.
>
> If you are willing to modify and recompile the kernel, you could just
> make a simple hack to ext4_mb_good_group() in fs/ext4/mballoc.c, and
> add something like this at the very beginning of the function:
>
> 	/* replace XXXX with the block group you are trying to evacuate */
> 	if (group == XXXX)
> 		return 0;
>
> This will cause ext4 to not allocate blocks in that block group.
Perfect.  We'll consider this if after the current pass there are still
blocks allocated there.
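
For reference, a minimal sketch of the arithmetic for picking the group number for that check, and the first block of that group for the icheck hack below. The helper names are hypothetical (not kernel or e2fsprogs functions), and it assumes a non-bigalloc filesystem; the real "First block" and "Blocks per group" values should come from dumpe2fs -h (typically 0 and 32768 with 4KiB blocks).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; assumes no bigalloc. */

/* First block number belonging to block group `group`. */
static uint64_t group_first_block(uint64_t first_data_block,
                                  uint32_t blocks_per_group,
                                  uint64_t group)
{
    assert(blocks_per_group > 0);
    return first_data_block + group * blocks_per_group;
}

/* Block group that contains block number `block`. */
static uint64_t block_to_group(uint64_t first_data_block,
                               uint32_t blocks_per_group,
                               uint64_t block)
{
    assert(blocks_per_group > 0 && block >= first_data_block);
    return (block - first_data_block) / blocks_per_group;
}
```

So with 32768 blocks per group and a first data block of 0, group 3 starts at block 98304, and block 98303 still belongs to group 2.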
>
> Similarly, instead of just specifying all of the blocks to the icheck
> command, you could modify and recompile debugfs, and do something like
> this at the beginning of icheck_proc():
>
> 	/* replace YYYY with the first block in the block group
> 	   you are trying to evacuate */
> 	if (*block_nr >= YYYY) {
> 		printf("I: %lu\n", bw->inode);
> 		return 0;
> 	} 
>
> This is super hacky since it wouldn't dedup the list of inodes, but you
> can just save the output to a file, and then do something like this:
>
>    grep "^I: " < debugfs.out | sed -e 's/I: //' | sort -u > /tmp/list-of-inos
That's a hack I'm willing to pull thanks.  One of those "why didn't I
think of that".  Will definitely do this on the next round.
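
If the post-processing ever needs to live in C rather than in a shell pipeline, the same filtering can be sketched roughly as below. The function name and the fixed-capacity output array are my own assumptions for illustration, and unlike sort -u it orders the inode numbers numerically rather than lexically.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Numeric comparison for qsort() on unsigned long values. */
static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: scan `text` line by line, keep only lines that
 * start with "I: <number>", and store the sorted, de-duplicated inode
 * numbers in `out`. Entries beyond `max` are silently dropped.
 * Returns the number of unique inodes stored.
 */
static size_t unique_inodes(const char *text, unsigned long *out, size_t max)
{
    size_t n = 0, w = 0;
    const char *p = text;

    assert(text != NULL && out != NULL);

    while (p && *p) {
        unsigned long ino;

        /* The literal "I:" must match at the very start of the line. */
        if (n < max && sscanf(p, "I: %lu", &ino) == 1)
            out[n++] = ino;

        /* Advance to the character after the next newline, if any. */
        p = strchr(p, '\n');
        if (p)
            p++;
    }

    qsort(out, n, sizeof(*out), cmp_ulong);

    /* Compact runs of equal values in place. */
    for (size_t r = 0; r < n; r++)
        if (w == 0 || out[r] != out[w - 1])
            out[w++] = out[r];

    return w;
}
```

Any other debugfs output lines (headers, the regular block/inode table) are ignored because they do not begin with "I: ".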
> Finally, a much simpler thing to do instead of copying it to the file
> system you are trying to work on, is to simply copy the file somewhere
> *else*.  You only need to copy the files that have blocks in the last
> block group, and that's very likely less than a gig or two, so you can
> probably find enough swing space on another scratch disk (even if you
> have to use a USB attached HDD) as the destination.  Then you don't
> need to do the hack described above to prevent allocations to that
> last block group.
>
>
Got that one covered.  We created a new 4TB FS (which will be able to
grow to 512TB given the currently known limitations; the host can accommodate
300TB without external expansion but I'd rather split the workload by
then) to which we're migrating, so we're using that as a temporary
scratch area too.  The first file I encountered was around 500GB though
... only had a few blocks inside the last group.  As is the one that
just came up - hopefully that'll clear the group though.  Once I've
managed to chop off that last group we'll proceed with a move & shrink &
grow cycle until the current filesystem is gone.

Kind Regards,
Jaco