Message-Id: <73010511-a804-4cf4-a5c1-1d08e3f324c5@app.fastmail.com>
Date: Fri, 07 Nov 2025 14:33:12 +0100
From: "Arnd Bergmann" <arnd@...db.de>
To: "Yuan Tan" <tanyuan@...ylab.org>,
"Masahiro Yamada" <masahiroy@...nel.org>,
"Nathan Chancellor" <nathan@...nel.org>,
"Palmer Dabbelt" <palmer@...belt.com>, linux-kbuild@...r.kernel.org,
linux-riscv@...ts.infradead.org
Cc: Linux-Arch <linux-arch@...r.kernel.org>, linux-kernel@...r.kernel.org,
i@...kray.me, "Zhangjin Wu" <falcon@...ylab.org>, ronbogo@...look.com,
z1652074432@...il.com, lx24@....ynu.edu.cn
Subject: Re: [PATCH v2 0/8] dce, riscv: Unused syscall trimming with PUSHSECTION and
conditional KEEP()
On Tue, Nov 4, 2025, at 03:21, Yuan Tan wrote:
> Sorry for the late reply - this patchset really wore me out, and I only just
> recovered. Thank you very much for your feedback!
Sorry to hear this has been stressful for you. It's an unfortunate
aspect of the way we work that sometimes
> On 10/15/2025 12:47 AM, Arnd Bergmann wrote:
>> On Wed, Oct 15, 2025, at 08:16, Yuan Tan wrote:
>> Thanks a lot for your work on this. I think it is indeed valuable to
>> be able to optimize kernels with a smaller subset of system calls for
>> known workloads, and have as much dead code elimination as possible.
>>
>> However, I continue to think that the added scripting with a known
>> set of syscall names is fundamentally the wrong approach to get to
>> this list: This adds complexity to the build process in one of
>> the areas that is already too complicated, and it duplicates what
>> we can already do with Kconfig for a subset of the system calls.
>>
>> I think the way we should configure the set of syscalls instead is
>> to add more Kconfig symbols guarded by CONFIG_EXPERT that turn
>> classes of syscalls on or off. You have obviously done the research
>> to come up with a list of used/unused entry points for one or more
>> workloads. Can you share those lists?
>
> Regarding your suggestion to use Kconfig to control which system calls are
> included or excluded, perhaps we could take inspiration from systemd's
> classification approach. For example, systemd groups syscalls into categories
> like[1]:
>
> @aio @basic-io @chown @clock @cpu-emulation @debug @file-system
>
> and so on.
I think many of the categories already align with the structure of
the kernel source code, so maintaining them comes naturally out of
the build system.
More importantly, turning off parts of the kernel on a per-file
basis tends to work better for eliminating the entire block
of code because only removing the syscall entry still leaves
references to functions and global data structures from initcalls
and exported functions.
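To illustrate the point about initcalls and exports, here is a toy
model (Python, not kernel code) that treats dead code elimination as
reachability from a set of roots. Every symbol name in it is
hypothetical and the graph is made up; it is only a sketch of why
dropping the syscall entry alone keeps most of the code alive.

```python
from collections import deque

# Hypothetical reference graph: symbol -> symbols it references.
# None of these names exist in the kernel; they only model one source file.
REFS = {
    "sys_foo": {"foo_do_work"},      # syscall entry point
    "foo_init": {"foo_table"},       # registered as an initcall
    "foo_export": {"foo_do_work"},   # exported function
    "foo_do_work": {"foo_table"},
    "foo_table": set(),              # global data structure
}


def live_symbols(roots):
    """Return every symbol reachable from the given roots (breadth-first)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        sym = queue.popleft()
        for ref in REFS.get(sym, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


all_roots = {"sys_foo", "foo_init", "foo_export"}

# Only the syscall entry is removed: initcall and export remain roots.
without_syscall = live_symbols(all_roots - {"sys_foo"})

# The whole file is compiled out: none of its roots exist any more.
without_file = live_symbols(set())

print(sorted(without_syscall))
print(sorted(without_file))
```

In this model, removing only `sys_foo` still leaves the helper and the
table reachable, while leaving out the file removes everything.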
> However, if we go down this route, we would need to continuously maintain and
> update these categories whenever Linux introduces new system calls. I'm not
> sure whether that would be an ideal long-term approach.
If we can (at least roughly) align the categories between the kernel and the
systemd classification, that would make it easier to maintain
the systemd ones.
> For reference, here is the list of syscalls required to run Lighttpd.
>
> execve set_tid_address mount write brk mmap munmap getuid getgid getpid
> clock_gettime getcwd fcntl fstat read dup3 socket setsockopt bind listen
> rt_sigaction rt_sigprocmask newfstatat prlimit64 epoll_create1 epoll_ctl pipe2
> epoll_pwait accept4 getsockopt recvfrom shutdown writev getdents64 openat close
>
> We've tested it successfully on QEMU + initramfs, and I can share the
> deployment script if anyone would like to reproduce the setup.
Thanks for the list! Is this a workload you are interested in actually
optimizing for deployment, or just something you used as a simple test
environment?
I see three types of syscalls in your list above:
1. essential ones that are basically always needed
2. socket interfaces (already optional)
3. epoll (already optional)
The first two sets are clearly going to have more syscalls in
them that are usually used in combination with the others:
If we provide read, write and writev, we should also provide readv,
and if we provide socket/bind/listen/recvfrom, we also likely want
accept/connect/sendto and probably recvmsg/sendmsg.
Starting with your set of syscalls and those closely related
ones, as well as the set of syscalls that already have a
Kconfig option, we should be able to find the set of syscalls
that are unconditionally enabled but could be optional.
If you have the chance, could you compile that list?
I might also have a list, but probably not in the next week.
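As a rough sketch of how that list could be computed, the following
Python snippet does the set arithmetic described above. The workload
and the related syscalls are taken from this mail; the mapping from
Kconfig symbols to syscalls and the extra sample entries are
assumptions for illustration only. A real run would take the full set
from the architecture's syscall table instead.

```python
# Syscalls from the Lighttpd list quoted above.
WORKLOAD = set("""
execve set_tid_address mount write brk mmap munmap getuid getgid getpid
clock_gettime getcwd fcntl fstat read dup3 socket setsockopt bind listen
rt_sigaction rt_sigprocmask newfstatat prlimit64 epoll_create1 epoll_ctl pipe2
epoll_pwait accept4 getsockopt recvfrom shutdown writev getdents64 openat close
""".split())

# Closely related syscalls that should be provided alongside the workload.
RELATED = {"readv", "accept", "connect", "sendto", "recvmsg", "sendmsg"}

# Assumption: an illustrative, incomplete mapping of existing Kconfig
# symbols to the syscalls they already make optional.
ALREADY_OPTIONAL = {
    "CONFIG_NET": {
        "socket", "setsockopt", "getsockopt", "bind", "listen", "accept",
        "accept4", "connect", "sendto", "recvfrom", "sendmsg", "recvmsg",
        "shutdown",
    },
    "CONFIG_EPOLL": {"epoll_create1", "epoll_ctl", "epoll_pwait"},
    "CONFIG_SYSVIPC": {"semget", "shmget", "msgget"},
    "CONFIG_FUTEX": {"futex"},
}

# Assumption: a hypothetical stand-in for "everything else in the table".
EXTRA_SAMPLE = {"chroot", "sync", "getpriority"}

ALL_SYSCALLS = (
    WORKLOAD | RELATED | EXTRA_SAMPLE | set().union(*ALREADY_OPTIONAL.values())
)


def optional_candidates(all_syscalls, workload, related, already_optional):
    """Syscalls that are always built in today but not needed by the workload."""
    guarded = set().union(*already_optional.values())
    return all_syscalls - workload - related - guarded


candidates = optional_candidates(ALL_SYSCALLS, WORKLOAD, RELATED, ALREADY_OPTIONAL)
print(sorted(candidates))
```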
The next step after that, I think, is to measure the impact
of turning off those remaining ones in a configuration that
has the existing symbols (e.g. sysvipc, futex, compat_32bit_time,
...) disabled already.
Side note: I'm a bit surprised to see fstat() in the list, since riscv
should only really support newfstat().
> Also, I noticed that there haven't been any comments so far on the later
> patches introducing the PUSHSECTION macro. I'm a bit concerned about how
> people perceive this part.
I don't have a strong opinion on this part.
Arnd