lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <921F22AF0D7D10F0+3a743e26-ae83-40e8-b266-ccffe478d2c7@tinylab.org>
Date: Tue, 2 Dec 2025 22:02:31 -0800
From: Yuan Tan <tanyuan@...ylab.org>
To: Arnd Bergmann <arnd@...db.de>, Masahiro Yamada <masahiroy@...nel.org>,
 Nathan Chancellor <nathan@...nel.org>, Palmer Dabbelt <palmer@...belt.com>,
 linux-kbuild@...r.kernel.org, linux-riscv@...ts.infradead.org
Cc: Linux-Arch <linux-arch@...r.kernel.org>, linux-kernel@...r.kernel.org,
 i@...kray.me, Zhangjin Wu <falcon@...ylab.org>, ronbogo@...look.com,
 z1652074432@...il.com, lx24@....ynu.edu.cn
Subject: Re: [PATCH v2 0/8] dce, riscv: Unused syscall trimming with
 PUSHSECTION and conditional KEEP()


On 11/7/2025 5:33 AM, Arnd Bergmann wrote:
> On Tue, Nov 4, 2025, at 03:21, Yuan Tan wrote:
>
>>> Sorry for the late reply — this patchset really wore me out, and I only just
>>> recovered.  Thank you very much for your feedback!
> Sorry to hear this has been stressful for you. It's an unfortunate
> aspect of the way we work that sometimes 
>
>> On 10/15/2025 12:47 AM, Arnd Bergmann wrote:
>>> On Wed, Oct 15, 2025, at 08:16, Yuan Tan wrote:
>>> Thanks a lot for your work on this. I think it is indeed valuable to
>>> be able to optimize kernels with a smaller subset of system calls for
>>> known workloads, and have as much dead code elimination as possible.
>>>
>>> However, I continue to think that the added scripting with a known
>>> set of syscall names is fundamentally the wrong approach to get to
>>> this list: This adds complexity to the build process in one of
>>> the areas that is already too complicated, and it duplicates what
>>> we can already do with Kconfig for a subset of the system calls.
>>>
>>> I think the way we should configure the set of syscalls instead is
>>> to add more Kconfig symbols guarded by CONFIG_EXPERT that turn
>>> classes of syscalls on or off. You have obviously done the research
>>> to come up with a list of used/unused entry points for one or more
>>> workloads. Can you share those lists?
>> Regarding your suggestion to use Kconfig to control which system calls are
>> included or excluded, perhaps we could take inspiration from systemd's
>> classification approach. For example, systemd groups syscalls into categories
>> like[1]:
>>
>> @aio @basic-io @chown @clock @cpu-emulation @debug @file-system
>>
>> and so on.
> I think many of the categories already naturally align with the
> structure of the kernel source code, so maintaining them naturally comes
> out of the build system.
>
> More importantly, turning off parts of the kernel on a per-file
> basis tends to work better for eliminating the entire block
> of code because only removing the syscall entry still leaves
> references to functions and global data structures from initcalls
> and exported functions.
>
>> However, if we go down this route, we would need to continuously maintain and
>> update these categories whenever Linux introduces new system calls. I' m not
>> sure whether that would be an ideal long-term approach.
> If we can (at least roughly) align the categories between the kernel and the
> systemd classification, that would at least make it easier to maintain
> the systemd ones.
>
>> For reference, here is the list of syscalls required to run Lighttpd.
>>
>> execve set_tid_address mount write brk mmap munmap getuid getgid getpid
>> clock_gettime getcwd fcntl fstat read dup3 socket setsockopt bind listen
>> rt_sigaction rt_sigprocmask newfstatat prlimit64 epoll_create1 epoll_ctl pipe2
>> epoll_pwait accept4 getsockopt recvfrom shutdown writev getdents64 openat close
>>
>> We've tested it successfully on QEMU + initramfs, and I can share the
>> deployment script if anyone would like to reproduce the setup.
> Thanks for the list! Is this a workload you are interested in actually
> optimizing for deployment, or just something you used as a simple test
> environment?
>
> I see three types of syscalls in your list above:
>
> 1. essential ones that are basically always needed
> 2. socket interfaces (already optional)
> 3. epoll (already optional)
>
> The first two sets are clearly going to have more syscalls in
> them that are usually used in combination with the others:
> If we provide read, write and writev, we should also provide readv,
> and if we provide socket/bind/listen/recvfrom, we also likely want
> accept/connect/sendto and probably recvmsg/sendmsg.
>
> Starting with your set of syscalls and those closely related
> ones, as well as the set of syscalls that already have a
> Kconfig option, we should be able to find the set of syscalls
> that are unconditionally enabled but could be optional.
> If you have the chance, could you compile that list?
> I might also have a list, but probably not in the next week.
>
> The next step after that I think is to measure the impact
> of turning off those remaining ones in a configuration that
> has the existing symbols (e.g. sysvipc, futex, compat_32bit_time,
> ...) disabled already.
>
> Side note: I'm a  bit surprised to see fstat() in the list, since riscv
> should only really support newfstat().


The syscall list comes from a simple test environment rather than a
workload I intend to optimize for deployment.

The list I posted was generated using strace on RISC-V QEMU. I was
looking at the ABI names, not the actual kernel syscall names.  One
question here: for syscall trimming, should we discuss everything in
terms of syscall ABI names or the actual kernel syscall function names?
I would like to confirm your preference before I continue with the
updated list.

For now, I'll continue the discussion in terms of syscall ABI names.

Following your suggestion, I started by taking the syscall list required
for the Lighttpd workload and expanded it into the corresponding
functional groups.

Here is a very preliminary draft of the syscall grouping, based on the
systemd classification.

https://pastebin.com/raw/Yx92bb3m

Then, I wrote a small script that classifies each syscall from lighttpd
into its category and then enumerates all syscalls belonging to those
categories.

It addresses two of the items you asked for

- Identifying the syscall families related to my minimal Lighttpd
  workload

- Enumerating which syscalls appear in those categories and could
  potentially become optional

```
Categories present in lighttpd_syscalls.txt:
  @basic-io: 5 / 16
  @clock: 1 / 8
  @default: 9 / 30
  @file-system: 6 / 47
  @io-event: 3 / 7
  @ipc: 1 / 23
  @mount: 1 / 13
  @network-io: 8 / 18
  @signal: 2 / 14
Total unique categories: 9
Total categories defined: 30

Categories not present in lighttpd_syscalls.txt:
  @aio: 0 / 9
  @chown: 0 / 2
  @debug: 0 / 5
  @keyring: 0 / 3
  @memlock: 0 / 5
  @module: 0 / 3
  @pkey: 0 / 3
  @privileged: 0 / 15
  @process: 0 / 24
  @reboot: 0 / 3
  @resources: 0 / 14
  @sandbox: 0 / 4
  @setuid: 0 / 12
  @swap: 0 / 2
  @sync: 0 / 6
  @system-service: 0 / 24
  @timer: 0 / 11
  arch-specific: 0 / 1
  memory-isolation: 0 / 1
  memory-protection: 0 / 1
  security-lsm: 0 / 3

All syscalls in the appearing categories:
accept, accept4, adjtimex, bind, brk, cachestat, chdir, chroot, clock_adjtime, clock_getres, clock_gettime, clock_nanosleep, clock_settime, close, close_range, connect, copy_file_range, dup, dup3, epoll_create1, epoll_ctl, epoll_pwait, epoll_pwait2, eventfd2, execve, exit, exit_group, faccessat, faccessat2, fallocate, fchdir, fchmod, fchmodat, fchmodat2, fcntl, fgetxattr, flistxattr, fremovexattr, fsconfig, fsetxattr, fsmount, fsopen, fspick, fstat, fstatfs, ftruncate, futex, futex_requeue, futex_wait, futex_waitv, futex_wake, get_robust_list, getcwd, getdents64, getegid, geteuid, getgid, getpeername, getpid, getppid, getrandom, getsockname, getsockopt, gettid, gettimeofday, getuid, getxattr, inotify_add_watch, inotify_init1, inotify_rm_watch, ioctl, kill, lgetxattr, linkat, listen, listmount, listxattr, llistxattr, lremovexattr, lseek, lsetxattr, membarrier, memfd_create, mkdirat, mknodat, mmap, mount, mount_setattr, move_mount, mprotect, mq_getsetattr, mq_notify, mq_open,
mq_timedreceive, mq_timedsend, mq_unlink, mremap, msgctl, msgget, msgrcv, msgsnd, munmap, newfstatat, open_tree, openat, openat2, pidfd_send_signal, pipe2, pivot_root, ppoll, pread64, preadv, preadv2, prlimit64, process_madvise, process_vm_readv, process_vm_writev, pselect6, pwrite64, pwritev, pwritev2, read, readahead, readlinkat, readv, recvfrom, recvmmsg, recvmsg, removexattr, renameat2, restart_syscall, riscv_flush_icache, riscv_hwprobe, rseq, rt_sigaction, rt_sigpending, rt_sigprocmask, rt_sigqueueinfo, rt_sigreturn, rt_sigsuspend, rt_sigtimedwait, rt_tgsigqueueinfo, semctl, semget, semop, semtimedop, sendmmsg, sendmsg, sendto, set_robust_list, set_tid_address, setsockopt, settimeofday, setxattr, shmat, shmctl, shmdt, shmget, shutdown, sigaltstack, signalfd4, socket, socketpair, statfs, statmount, statx, symlinkat, tgkill, tkill, truncate, umask, umount2, unlinkat, utimensat, write, writev
Total syscalls in these categories: 176
```


This produces a list of 176 syscalls across 9 categories that are
relevant to the workload. The output also shows which categories do not
appear in this workload (21 categories with 0 syscalls used).

If the categorization works this way, it's actually quite surprising
that even such a simple workload would pull in as many as 176 syscalls.
I'm not sure yet what the actual trimming impact will look like after
building, but I will test that next.

>> Also, I noticed that there haven't been any comments so far on the later
>> patches introducing the PUSHSECTION macro.  I' m a bit concerned about how
>> people perceive this part.
> I don't have a strong opinion on this part.
>
>      Arnd


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ