Message-ID: <Z4WiOUZizUok2VAs@dread.disaster.area>
Date: Tue, 14 Jan 2025 10:31:05 +1100
From: Dave Chinner <david@...morbit.com>
To: "Artem S. Tashkinov" <aros@....com>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: Spooling large metadata updates / Proposal for a new API/feature
in the Linux Kernel (VFS/Filesystems):
On Sat, Jan 11, 2025 at 09:17:49AM +0000, Artem S. Tashkinov wrote:
> Hello,
>
> I had this idea on 2021-11-07, then I thought it was wrong/stupid, now
> I've asked AI and it said it was actually not bad,
Which shows just how poor "AI" is at analysing complex systems.
> so I'm bringing it
> forward now:
>
> Imagine the following scenarios:
>
> * You need to delete tens of thousands of files.
> * You need to change the permissions, ownership, or security context
> (chmod, chown, chcon) for tens of thousands of files.
> * You need to update timestamps for tens of thousands of files.
>
> All these operations are currently relatively slow because they are
> executed sequentially, generating significant I/O overhead.
Yes, they are executed sequentially by the -application- not the
filesystem. i.e. the performance limiter is application concurrency,
not the filesystem.
> What if these operations could be spooled and performed as a single
> transaction?
You get nothing extra - they'd still be executed "sequentially"
within the transaction. Operational concurrency is required to speed
up these operations, and we have io_uring and/or threads for that...
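For example, here's a rough, untested sketch (assuming liburing and a
kernel new enough to support IORING_OP_UNLINKAT, i.e. 5.11+) of
batching unlinks through io_uring so the kernel's io-wq workers run
them concurrently, rather than the application issuing one blocking
unlink() syscall at a time:

#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>

#define QD	64	/* queue depth: unlinks in flight at once */

int main(int argc, char **argv)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	int i, inflight = 0;

	if (io_uring_queue_init(QD, &ring, 0) < 0)
		return 1;

	/* one unlink request per command line argument */
	for (i = 1; i < argc; i++) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

		if (!sqe) {
			/* SQ ring full: submit and reap one completion */
			io_uring_submit(&ring);
			io_uring_wait_cqe(&ring, &cqe);
			io_uring_cqe_seen(&ring, cqe);
			inflight--;
			sqe = io_uring_get_sqe(&ring);
		}
		io_uring_prep_unlinkat(sqe, AT_FDCWD, argv[i], 0);
		inflight++;
	}

	io_uring_submit(&ring);
	while (inflight--) {
		io_uring_wait_cqe(&ring, &cqe);
		if (cqe->res < 0)
			fprintf(stderr, "unlink failed: %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}

The same pattern applies to the other metadata ops io_uring supports
(renameat, mkdirat, etc.); for the ones it doesn't, a thread pool
gives the application the same kind of concurrency.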
> By bundling metadata updates into one atomic operation,
> such tasks could become near-instant or significantly faster.
No. The filesystem still has to do exactly the same amount of work
to modify thousands of files. Transactional atomicity has nothing to
do with the cost of modification of an otherwise unrelated set of
filesystem objects...
The overhead of 'rm -rf' could be optimised, but the filesystem
would still have to do the directory traversal first to find all the
inodes that have to be unlinked/rmdir'd and process them before the
parent directory is freed. i.e. we can make it *look fast*, but it
still has largely the same cost in terms of IO, CPU and memory
overhead.
And, of course, operations like sync() would have to block on an
offloaded 'rm -rf' operation. That is likely to cause people
more unexpected issues than userspace implementing a concurrent 'rm
-rf' based on sub-dir concurrency....
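For illustration, a minimal, untested sketch of that kind of sub-dir
concurrency (one worker thread per top-level subdirectory, each
walking and removing its own subtree with nftw(); error handling and
corner cases elided):

#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>

static int rm_one(const char *path, const struct stat *sb, int type,
		  struct FTW *ftwbuf)
{
	/* FTW_DEPTH visits children before their parent directory */
	remove(path);	/* ignore individual failures in this sketch */
	return 0;	/* keep walking */
}

static void *rm_tree(void *arg)
{
	/* depth-first removal of one subtree */
	nftw(arg, rm_one, 32, FTW_DEPTH | FTW_PHYS);
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tid[argc];
	int i;

	/* one thread per top-level subdirectory given on the command line */
	for (i = 1; i < argc; i++)
		pthread_create(&tid[i], NULL, rm_tree, argv[i]);
	for (i = 1; i < argc; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

How much that actually buys you depends on how evenly the work is
spread across the subdirectories and on the concurrency the
underlying filesystem can sustain.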
> This would
> also reduce the number of writes, leading to less wear and tear on
> storage devices.
Not for a typical journalling filesystem. Such filesystems aggregate
and journal delta changes efficiently, then do batched writeback of
the modified metadata, eliding as many writes as possible.
> Does this idea make sense? If it already exists, or if there’s a reason
> it wouldn’t work, please let me know.
Filesystems can already do operations concurrently. As long as
concurrency for directory traversal based operations is done on a
per-directory level, they scale out to the inherent concurrency the
filesystem can provide.
In the case of XFS, we can scale out to around 600,000-700,000 file
creates/s, about 1 million chmod/chown/chcon/futimes modifications
per second and about 500,000 unlinks per second. With some VFS
scalability mods, we can get up around the 1 million file creates/s
mark....
IOWs, if applications are having problems with sequential filesystem
operations being slow, the problem is generally application-level
algorithms and concurrency, not the filesystem implementations or
syscall interfaces.
-Dave.
--
Dave Chinner
david@...morbit.com