Message-ID: <252d91e2-282e-4af4-b99b-3b8147d98bc3@gmail.com>
Date: Sat, 10 Aug 2024 10:29:38 +0200
From: Anders Blomdell <anders.blomdell@...il.com>
To: Dave Chinner <david@...morbit.com>
Cc: linux-xfs@...r.kernel.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Chandan Babu R <chandan.babu@...cle.com>, "Darrick J. Wong"
<djwong@...nel.org>, Christoph Hellwig <hch@....de>
Subject: Re: XFS mount timeout in linux-6.9.11
On 2024-08-10 00:55, Dave Chinner wrote:
> On Fri, Aug 09, 2024 at 07:08:41PM +0200, Anders Blomdell wrote:
>> With a filesystem that contains a very large number of hardlinks
>> the time to mount the filesystem skyrockets to around 15 minutes
>> on 6.9.11-200.fc40.x86_64 as compared to around 1 second on
>> 6.8.10-300.fc40.x86_64,
>
> That sounds like the filesystem is not being cleanly unmounted on
> 6.9.11-200.fc40.x86_64 and so is having to run log recovery on the
> next mount and so is recovering lots of hardlink operations that
> weren't written back at unmount.
>
> Hence this smells like an unmount or OS shutdown process issue, not
> a mount issue. e.g. if something in the shutdown scripts hangs,
> systemd may time out the shutdown and power off/reboot the machine
> without completing the full shutdown process. The result of this is
> the filesystem has to perform recovery on the next mount and so you
> see a long mount time because of some other unrelated issue.
>
> What is the dmesg output for the mount operations? That will tell us
> if journal recovery is the difference for certain. Have you also
> checked to see what is happening in the shutdown/unmount process
> before the long mount times occur?
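The kernel log alone can show whether a slow mount ran journal recovery. The helper below is hypothetical and does not come from this thread. It assumes two things about the XFS message wording: a mount of a cleanly unmounted filesystem ends with "Ending clean mount", as in the logs quoted here, and a mount that replays the log ends with "Ending recovery" instead.

```shell
#!/bin/sh
# Hypothetical helper: for each XFS mount found in dmesg output, print the
# device and whether the mount was "clean" or needed log "recovered".
# Assumes the "Ending clean mount" / "Ending recovery" message wording.
xfs_mount_outcomes() {
    awk '
        # Extract the device name from "XFS (dm-7): ...".
        function dev(line) {
            sub(/.*XFS [(]/, "", line)
            sub(/[)].*/, "", line)
            return line
        }
        /XFS [(][^)]*[)]: Ending clean mount/ { print dev($0), "clean" }
        /XFS [(][^)]*[)]: Ending recovery/    { print dev($0), "recovered" }
    '
}

# Usage: dmesg | xfs_mount_outcomes
```

Run over the dmesg output quoted below, it reports "dm-7 clean" for all four mounts on both kernels. That would suggest the slow mounts are not replaying the journal.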
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
mount /dev/vg1/test /test
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
umount /test
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
mount /dev/vg1/test /test
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg

[55581.470484] 6.8.0-rc4-00129-g14dd46cf31f4 09:17:20
[55581.492733] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56048.292804] XFS (dm-7): Ending clean mount
[56516.433008] 6.8.0-rc4-00129-g14dd46cf31f4 09:32:55
[56516.434695] XFS (dm-7): Unmounting Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56516.925145] 6.8.0-rc4-00129-g14dd46cf31f4 09:32:56
[56517.039873] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56986.017144] XFS (dm-7): Ending clean mount
[57454.876371] 6.8.0-rc4-00129-g14dd46cf31f4 09:48:34

And rebooting to the kernel before the offending commit:
[ 60.177951] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:00
[ 61.009283] SGI XFS with ACLs, security attributes, realtime, scrub, quota, no debug enabled
[ 61.017422] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[ 61.351100] XFS (dm-7): Ending clean mount
[ 61.366359] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
[ 61.367673] XFS (dm-7): Unmounting Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[ 61.444552] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
[ 61.459358] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[ 61.513938] XFS (dm-7): Ending clean mount
[ 61.524056] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
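The reproducer writes a marker line to /dev/kmsg before and after each command. The time spent in each step is therefore the difference between consecutive marker timestamps. Below is a minimal sketch. It assumes dmesg's default "[seconds.microseconds]" prefix, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the seconds elapsed between consecutive dmesg
# lines that contain MARKER (e.g. the "$(uname -r) HH:MM:SS" lines written
# to /dev/kmsg by the reproducer). Assumes the "[seconds.micros]" prefix.
elapsed_between_markers() {
    awk -v marker="$1" '
        index($0, marker) > 0 {
            # The timestamp is the text between the first "[" and "]";
            # leading blanks ("[   60.17...]") are ignored by the conversion.
            t = substr($0, index($0, "[") + 1)
            t = substr(t, 1, index(t, "]") - 1) + 0
            if (seen) printf "%.3f\n", t - prev
            prev = t
            seen = 1
        }
    '
}

# Usage: dmesg | elapsed_between_markers "$(uname -r)"
```

On the 14dd46cf31f4 log above, this prints 934.963, 0.492 and 937.951 seconds for the mount, umount and mount steps. On 8541a7d9da2d it prints 1.188, 0.078 and 0.080. On the slow kernel, "Ending clean mount" is logged roughly halfway through each mount. Measuring from "Mounting" to "Ending clean mount" would therefore undercount the delay.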
>
>> this of course makes booting drop
>> into emergency mode if the filesystem is in /etc/fstab. A git bisect
>> nails the offending commit as 14dd46cf31f4aaffcf26b00de9af39d01ec8d547.
>
> Commit 14dd46cf31f4 ("xfs: split xfs_inobt_init_cursor") doesn't
> seem like a candidate for any sort of change of behaviour. It's just
> a refactoring patch that doesn't change any behaviour at all.
> Are you sure the reproducer you used for the bisect is reliable?
Yes.
>> The filesystem is a collection of daily snapshots of a live filesystem
>> collected over a number of years, organized as a store of unique files
>> that are reflinked to inodes that contain the actual {owner,group,permission,
>> mtime}, and these inodes are hardlinked into the daily snapshot trees.
>
> So it's reflinks and hardlinks. Recovering a reflink takes a lot
> more CPU time and journal traffic than recovering a hardlink, so
> that will also be a contributing factor.
>
>> The numbers for the filesystem are:
>>
>> Total file size: 3.6e+12 bytes
>
> 3.6TB, not a large data set by any measurement.
>
>> Unique files: 12.4e+06
>
> 12M files, not a lot.
>
>> Reflink inodes: 18.6e+06
>
> 18M inodes with shared extents, not a huge number, either.
>
>> Hardlinks: 15.7e+09
>
> Ok, 15.7 billion hardlinks is a *lot*.
:-)
>
> And by a lot, I mean that's the largest number of hardlinks in an
> XFS filesystem I've personally ever heard about in 20 years.
Glad to be of service.
>
> As a warning: hope like hell you never have a disaster with that
> storage and need to run xfs_repair on that filesystem. If you don't
> have many, many TBs of RAM, just checking the hardlinks resolve
> correctly could take billions of IOs...
I hope so as well :-), but it is not a critical system (used for testing
and statistics, will take about a month to rebuild though :-/).
>
> -Dave.