linux-kernel - Re: Wislist for Linux from the mold linker's POV

Message-ID: <CACKH++br0qCHhxsy1kuyK29OB_bgME3FUXA_XepRL=7FYXOvQA@mail.gmail.com>
Date: Fri, 29 Nov 2024 09:44:30 +0900
From: Rui Ueyama <rui314@...il.com>
To: Florian Weimer <fw@...eb.enyo.de>
Cc: LKML <linux-kernel@...r.kernel.org>
Subject: Re: Wislist for Linux from the mold linker's POV

On Fri, Nov 29, 2024 at 2:41 AM Florian Weimer <fw@...eb.enyo.de> wrote:
>
> * Rui Ueyama:
>
> > - exit(2) takes a few hundred milliseconds for a large process
> >
> > I believe this is because mold mmaps all input files and an output
> > file, and clearing/flushing memory-mapped data is fairly expensive. I
> > wondered if this could be improved. If it is unavoidable, could the
> > cleanup process be made asynchronous so that exit(2) takes effect
> > immediately?
>
> It's definitely a two-edged sword.  For example, when running parallel
> make (or Ninja), it's essential that process exit is only signaled
> after all process-related resources have been released.  Otherwise,
> it's possible to see spurious failures because make respawns processes
> so quickly that some resource limit is exceeded.  This is already a
> problem today, and more lazy resource deallocation on exit would make
> it more prevalent.
>
> The situation is already bad enough that many developers have resorted
> to retry loops around fork/clone/pthread_create if an EAGAIN error is
> encountered, assuming that it's related to this.

I think you are right. Making exit(2) asynchronous may cause that issue.
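
For illustration, the kind of retry loop mentioned in the quoted text
might look roughly like the sketch below. The function name, attempt
count, and backoff values are hypothetical, and the `fork` parameter
exists only so the loop can be exercised without actually hitting a
resource limit.

```python
import errno
import os
import time


def fork_with_retry(max_attempts=5, initial_delay=0.01, fork=os.fork):
    """Call fork(), retrying with exponential backoff while it fails with EAGAIN."""
    delay = initial_delay
    for attempt in range(max_attempts):
        try:
            return fork()
        except OSError as e:
            # Only EAGAIN is treated as transient; give up on the last attempt.
            if e.errno != errno.EAGAIN or attempt == max_attempts - 1:
                raise
            time.sleep(delay)
            delay *= 2
```

Such a loop only hides the symptom: it assumes the EAGAIN came from
resources of an already-exited process that have not been released yet.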

Could we solve the problem simply by making exit(2) significantly
faster than it is now? That was our way of thinking when we created the
mold linker. I don't know much about what exit(2) actually does in the
kernel, but there may be room for improvement, given that it currently
takes a few hundred milliseconds for us when linking a large program.
I wish it were an order of magnitude or two faster.
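
One rough way to observe this cost from user space is sketched below.
It is not how the numbers above were measured; the function name is
hypothetical, and the result includes scheduling noise. A child dirties
every page of a shared file mapping, signals the parent just before
calling _exit, and the parent times how long waitpid takes to return.

```python
import mmap
import os
import tempfile
import time


def measure_exit_seconds(size_bytes):
    """Time from a child being about to _exit until the parent's waitpid returns."""
    r, w = os.pipe()
    with tempfile.TemporaryFile() as f:
        f.truncate(size_bytes)
        pid = os.fork()
        if pid == 0:
            # Child: map the file, dirty every page, then exit immediately.
            try:
                os.close(r)
                m = mmap.mmap(f.fileno(), size_bytes)  # shared read-write mapping
                for off in range(0, size_bytes, mmap.PAGESIZE):
                    m[off] = 1
                os.write(w, b"x")  # tell the parent we are about to exit
            finally:
                os._exit(0)
        # Parent: start the clock once the child reports it is about to exit.
        os.close(w)
        os.read(r, 1)
        start = time.perf_counter()
        os.waitpid(pid, 0)
        elapsed = time.perf_counter() - start
        os.close(r)
        return elapsed
```

Running it with increasing sizes (for example 64 MiB, 1 GiB, 4 GiB)
would show whether the teardown time grows with the mapped size on a
given machine.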

>   Bug 154011 - Task exit is signaled before task resource
>   deallocation, leading to bogus EAGAIN errors
>   <https://bugzilla.kernel.org/show_bug.cgi?id=154011>
>
> > - Writing to a fresh file is slower than writing to an existing file
> >
> > mold can link a 4 GiB LLVM/clang executable in ~1.8 seconds on my
> > machine if the linker reuses an existing file and overwrites it.
> > However, the speed decreases to ~2.8 seconds if the output file does
> > not exist and mold needs to create a fresh file. I tried using
> > fallocate(2) to preallocate disk blocks, but it didn't help. While 4
> > GiB is not small, should creating a file really take almost a second?
>
> Which file system is that?

ext4 on a PCIe Gen 5 SSD, but the file system probably doesn't matter
much, because we observed similar results even on tmpfs (~1.75 s vs.
~2.45 s when linking clang).
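
A small sketch of the comparison, assuming the output is written
through a shared mapping as described earlier in the thread. The
function names and the fill pattern are hypothetical, and this is not
mold's actual code. The first call creates the file and the second call
overwrites it in place.

```python
import mmap
import os
import time


def write_output(path, size_bytes):
    """Create or reuse `path`, size it, and fill it via mmap; return elapsed seconds."""
    start = time.perf_counter()
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.ftruncate(fd, size_bytes)
        with mmap.mmap(fd, size_bytes) as m:
            chunk = b"\xcc" * (1 << 20)
            for off in range(0, size_bytes, len(chunk)):
                n = min(len(chunk), size_bytes - off)
                m[off:off + n] = chunk[:n]
    finally:
        os.close(fd)
    return time.perf_counter() - start


def compare_fresh_vs_existing(directory, size_bytes):
    """Return (fresh, reused) timings for the same output path."""
    path = os.path.join(directory, "out.bin")
    fresh = write_output(path, size_bytes)   # file does not exist yet
    reused = write_output(path, size_bytes)  # same file, overwritten in place
    return fresh, reused
```

Pointing `directory` at an ext4 mount and then at a tmpfs mount would
show whether the gap between the two timings depends on the file
system.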

> > - Lack of a safe system-wide semaphore
>
> Other toolchain components use the make jobserver protocol for that.

The make jobserver protocol is designed for single-threaded processes
and doesn't fit a multi-threaded program like mold well. But yes, we
probably need a better user-space coordination mechanism that works
for both single-threaded and multi-threaded programs.
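
For context, the core of a jobserver-style scheme can be modeled as a
pipe preloaded with one-byte tokens: a process reads a byte to claim an
extra job slot and writes it back when done. The sketch below is a
simplified model, not GNU make's implementation. The class and method
names are hypothetical, and the non-blocking batch read is an
assumption made for brevity.

```python
import os


class TokenPool:
    """Pipe-based token pool in the style of the make jobserver (simplified)."""

    def __init__(self, extra_slots):
        self.r, self.w = os.pipe()
        os.set_blocking(self.r, False)
        # Preload one token per extra job slot.
        os.write(self.w, b"+" * extra_slots)

    def try_acquire(self, want):
        """Take up to `want` tokens without blocking; return the bytes taken."""
        try:
            return os.read(self.r, want)
        except BlockingIOError:
            return b""  # pool is currently empty

    def release(self, tokens):
        """Return previously acquired tokens to the pool."""
        os.write(self.w, tokens)
```

A multi-threaded program would have to claim one token per additional
thread and hand every one of them back. If it dies while holding
tokens, nothing returns them to the pipe.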