linux-kernel - Re: [PATCH v2 0/8] dce, riscv: Unused syscall trimming with PUSHSECTION and conditional KEEP()


Message-ID: <0BF8B2E83B6154B6+f17f32b4-f6ff-4184-917d-4b27fb916eae@tinylab.org>
Date: Mon, 3 Nov 2025 18:21:59 -0800
From: Yuan Tan <tanyuan@...ylab.org>
To: Arnd Bergmann <arnd@...db.de>, Masahiro Yamada <masahiroy@...nel.org>,
 Nathan Chancellor <nathan@...nel.org>, Palmer Dabbelt <palmer@...belt.com>,
 linux-kbuild@...r.kernel.org, linux-riscv@...ts.infradead.org
Cc: Linux-Arch <linux-arch@...r.kernel.org>, linux-kernel@...r.kernel.org,
 i@...kray.me, Zhangjin Wu <falcon@...ylab.org>, ronbogo@...look.com,
 z1652074432@...il.com, lx24@....ynu.edu.cn
Subject: Re: [PATCH v2 0/8] dce, riscv: Unused syscall trimming with
 PUSHSECTION and conditional KEEP()


On 10/15/2025 12:47 AM, Arnd Bergmann wrote:
> On Wed, Oct 15, 2025, at 08:16, Yuan Tan wrote:
>> Hi all,
>>
>> This series aims to introduce syscall trimming support based on dead code
>> and data elimination (DCE). This can reduce the final image size, which is
>> particularly useful for embedded devices, while also reducing the attack
>> surface. It might further benefit specialized scenarios such as unikernels
>> or LTO builds, and could potentially help shrink the instruction cache
>> footprint.
>>
>> Besides that, this series also introduces a new PUSHSECTION macro. This
>> wrapper allows sections created by .pushsection to have a proper reference
>> relationship with their callers, so that --gc-sections can safely work
>> without requiring unconditional KEEP() entries in linker scripts.
>>
>> Since the new syscalltbl.sh infrastructure has been merged, I think it’s a
>> good time to push this patchset forward.
>>
>> Patch 1–3 introduce the infrastructure for TRIM_UNUSED_SYSCALLS, mainly
>> allowing syscalltbl.sh to decide which syscalls to keep according to
>> USED_SYSCALLS.
>> Patch 4 enables TRIM_UNUSED_SYSCALLS for the RISC-V architecture. With
>> syscalltbl.sh now available, this feature should be applicable to all
>> architectures that support LD_DEAD_CODE_DATA_ELIMINATION and use
>> syscalltbl.sh, but let’s focus on RISC-V first.
>> Patch 5–8 address the dependency inversion problem caused by sections
>> created with .pushsection that are forcibly retained by KEEP() in linker
>> scripts.
> Thanks a lot for your work on this. I think it is indeed valuable to
> be able to optimize kernels with a smaller subset of system calls for
> known workloads, and have as much dead code elimination as possible.
>
> However, I continue to think that the added scripting with a known
> set of syscall names is fundamentally the wrong approach to get to
> this list: This adds complexity to the build process in one of
> the areas that is already too complicated, and it duplicates what
> we can already do with Kconfig for a subset of the system calls.
>
> I think the way we should configure the set of syscalls instead is
> to add more Kconfig symbols guarded by CONFIG_EXPERT that turn
> classes of syscalls on or off. You have obviously done the research
> to come up with a list of used/unused entry points for one or more
> workloads. Can you share those lists?
>
>       Arnd


Hi Arnd,

Sorry for the late reply; this patchset really wore me out, and I have only
just recovered. Thank you very much for your feedback!

Regarding your suggestion to use Kconfig to control which system calls are
included or excluded, perhaps we could take inspiration from systemd's
classification approach. For example, systemd groups syscalls into categories
like[1]:

@aio @basic-io @chown @clock @cpu-emulation @debug @file-system

and so on.

However, if we go down this route, we would need to continuously maintain and
update these categories whenever Linux introduces new system calls. I'm not
sure whether that would be an ideal long-term approach.
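
As a rough illustration of the maintenance concern, the sketch below models
category-based selection in Python. The group contents are a small
illustrative subset (the authoritative lists live in systemd's
seccomp-util.c), and `new_syscall`, `ungrouped` and `groups_needed` are
hypothetical names that exist only for this example. It shows that a newly
added syscall belongs to no category until someone classifies it by hand.

```python
# Illustrative subset only; see systemd's seccomp-util.c for the real groups.
SYSCALL_GROUPS = {
    "@aio": {"io_setup", "io_submit", "io_getevents", "io_cancel", "io_destroy"},
    "@chown": {"chown", "fchown", "fchownat", "lchown"},
}


def ungrouped(all_syscalls, groups):
    """Return the syscalls that no group covers, sorted by name."""
    covered = set().union(*groups.values())
    return sorted(set(all_syscalls) - covered)


def groups_needed(used, groups):
    """Return the names of groups containing at least one used syscall."""
    used = set(used)
    return sorted(name for name, members in groups.items() if members & used)


# Hypothetical kernel syscall list; "new_syscall" stands in for a fresh entry point.
kernel_syscalls = ["io_setup", "io_submit", "chown", "fchownat", "new_syscall"]

print(ungrouped(kernel_syscalls, SYSCALL_GROUPS))    # ['new_syscall']
print(groups_needed(["fchownat"], SYSCALL_GROUPS))   # ['@chown']
```

Note that enabling a whole group to get one syscall also pulls in its
siblings, so category-level selection is coarser than a per-syscall list.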

For reference, here is the list of syscalls required to run Lighttpd:

execve set_tid_address mount write brk mmap munmap getuid getgid getpid
clock_gettime getcwd fcntl fstat read dup3 socket setsockopt bind listen
rt_sigaction rt_sigprocmask newfstatat prlimit64 epoll_create1 epoll_ctl pipe2
epoll_pwait accept4 getsockopt recvfrom shutdown writev getdents64 openat close

We've tested it successfully on QEMU + initramfs, and I can share the
deployment script if anyone would like to reproduce the setup.
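
To make the idea concrete, here is a minimal Python sketch of how a list like
the one above could drive table trimming. It is not the syscalltbl.sh
implementation from the series. The sample rows only mimic the
"number abi name entry" layout of a syscall.tbl file, `trim_table` is a
hypothetical helper, and redirecting unused entries to sys_ni_syscall is an
assumption about how trimming might be expressed.

```python
# Syscalls reported above as sufficient for the Lighttpd workload.
USED_SYSCALLS = """
execve set_tid_address mount write brk mmap munmap getuid getgid getpid
clock_gettime getcwd fcntl fstat read dup3 socket setsockopt bind listen
rt_sigaction rt_sigprocmask newfstatat prlimit64 epoll_create1 epoll_ctl pipe2
epoll_pwait accept4 getsockopt recvfrom shutdown writev getdents64 openat close
""".split()

# Sample rows in "number abi name entry" form; a stand-in for a real syscall.tbl.
SAMPLE_TABLE = """\
# nr abi name entry
56 common openat sys_openat
57 common close sys_close
63 common read sys_read
64 common write sys_write
90 common capget sys_capget
"""


def trim_table(table_text, used):
    """Point every syscall that is not in `used` at sys_ni_syscall."""
    used = set(used)
    out = []
    for line in table_text.splitlines():
        fields = line.split()
        # Pass comments and malformed rows through unchanged.
        if len(fields) < 4 or fields[0].startswith("#"):
            out.append(line)
            continue
        nr, abi, name, entry = fields[:4]
        if name not in used:
            entry = "sys_ni_syscall"  # assumed placeholder for trimmed entries
        out.append(" ".join([nr, abi, name, entry]))
    return "\n".join(out)


trimmed = trim_table(SAMPLE_TABLE, USED_SYSCALLS)
print(trimmed)
```

With the handler no longer referenced from the table, a --gc-sections link
would be free to discard it, provided nothing else keeps it alive.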

Also, I noticed that there haven't been any comments so far on the later
patches introducing the PUSHSECTION macro. I'm a bit concerned about how
people perceive this part.

[1] https://github.com/systemd/systemd/blob/main/src/shared/seccomp-util.c