Message-ID: <ZraeRdPmGXpbRM7V@dread.disaster.area>
Date: Sat, 10 Aug 2024 08:55:01 +1000
From: Dave Chinner <david@...morbit.com>
To: Anders Blomdell <anders.blomdell@...il.com>
Cc: linux-xfs@...r.kernel.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Chandan Babu R <chandan.babu@...cle.com>,
"Darrick J. Wong" <djwong@...nel.org>,
Christoph Hellwig <hch@....de>
Subject: Re: XFS mount timeout in linux-6.9.11
On Fri, Aug 09, 2024 at 07:08:41PM +0200, Anders Blomdell wrote:
> With a filesystem that contains a very large amount of hardlinks
> the time to mount the filesystem skyrockets to around 15 minutes
> on 6.9.11-200.fc40.x86_64 as compared to around 1 second on
> 6.8.10-300.fc40.x86_64,
That sounds like the filesystem is not being cleanly unmounted on
6.9.11-200.fc40.x86_64, so it has to run log recovery on the next
mount, replaying lots of hardlink operations that weren't written
back at unmount.
Hence this smells like an unmount or OS shutdown process issue, not
a mount issue. e.g. if something in the shutdown scripts hangs,
systemd may time out the shutdown and power off/reboot the machine
without completing the full shutdown process. The result is that
the filesystem has to perform recovery on the next mount, and so you
see a long mount time because of some other, unrelated issue.
What is the dmesg output for the mount operations? That will tell us
if journal recovery is the difference for certain. Have you also
checked to see what is happening in the shutdown/unmount process
before the long mount times occur?
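
(If it helps, here's a minimal sketch of the check I mean, in Python
rather than a shell one-liner: pull the XFS lines out of dmesg and
look for recovery messages around the mount. The message strings in
the pattern are what I'd expect the kernel to print, but treat them
as assumptions and match against whatever your dmesg actually shows;
a plain 'dmesg | grep XFS' gets you the same information.)

# Minimal sketch: list the XFS mount/recovery messages from dmesg.
# Assumes the usual kernel strings ("Mounting V5 Filesystem",
# "Starting recovery", "Ending recovery", "Ending clean mount");
# adjust the pattern if your kernel logs something different.
import re
import subprocess

def xfs_mount_messages():
    dmesg = subprocess.run(["dmesg"], capture_output=True,
                           text=True).stdout
    pat = re.compile(r"XFS \([^)]+\): "
                     r"(Mounting|Starting recovery|Ending recovery|"
                     r"Ending clean mount)")
    return [line for line in dmesg.splitlines() if pat.search(line)]

for line in xfs_mount_messages():
    print(line)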
> this of course makes booting drop
> into emergency mode if the filesystem is in /etc/fstab. A git bisect
> nails the offending commit as 14dd46cf31f4aaffcf26b00de9af39d01ec8d547.
Commit 14dd46cf31f4 ("xfs: split xfs_inobt_init_cursor") doesn't
seem like a candidate for any sort of change of behaviour. It's just
a refactoring patch that doesn't change any behaviour at all. Are
you sure the reproducer you used for the bisect is reliable?
> The filesystem is a collection of daily snapshots of a live filesystem
> collected over a number of years, organized as a storage of unique files,
> that are reflinked to inodes that contain the actual {owner,group,permission,
> mtime}, and these inodes are hardlinked into the daily snapshot trees.
So it's reflinks and hardlinks. Recovering a reflink takes a lot
more CPU time and journal traffic than recovering a hardlink, so
that will also be a contributing factor.
> The numbers for the filesystem are:
>
> Total file size: 3.6e+12 bytes
3.6TB, not a large data set by any measurement.
> Unique files: 12.4e+06
12M files, not a lot.
> Reflink inodes: 18.6e+06
18M inodes with shared extents, not a huge number, either.
> Hardlinks: 15.7e+09
Ok, 15.7 billion hardlinks is a *lot*.
And by a lot, I mean that's the largest number of hardlinks in an
XFS filesystem I've personally ever heard about in 20 years.
As a warning: hope like hell you never have a disaster with that
storage and need to run xfs_repair on that filesystem. If you don't
have many, many TBs of RAM, just checking the hardlinks resolve
correctly could take billions of IOs...
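
For a sense of scale, a quick back-of-envelope with the numbers you
quoted (illustrative arithmetic only, nothing here is measured):

# Back-of-envelope from the numbers quoted above.
total_bytes    = 3.6e12   # total file data
unique_files   = 12.4e6
reflink_inodes = 18.6e6
hardlinks      = 15.7e9   # directory entries pointing at those inodes

print(f"data size: {total_bytes / 1e12:.1f} TB")
print(f"inodes (unique + reflinked): {unique_files + reflink_inodes:,.0f}")
print(f"avg links per reflink inode: {hardlinks / reflink_inodes:,.0f}")
print(f"directory entries to cross-check: {hardlinks:,.0f}")

That's roughly 850 links per reflinked inode, and 15.7 billion
directory entries that repair would have to walk to verify the link
counts.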
-Dave.
--
Dave Chinner
david@...morbit.com