Message-ID: <aCs65ccRQtJBnZ_5@arm.com>
Date: Mon, 19 May 2025 15:06:29 +0100
From: Yury Khrustalev <yury.khrustalev@....com>
To: <linux-kernel@...r.kernel.org>
CC: Christian Brauner <brauner@...nel.org>, Arnd Bergmann <arnd@...db.de>,
	Mark Brown <broonie@...nel.org>, Mark Rutland <mark.rutland@....com>,
	<linux-api@...r.kernel.org>
Subject: Extending clone_args for clone3()

Hi,

I'm working on an RFC patch for Glibc to make use of the newly added
shadow_stack_token field in struct clone_args in [1] on arm64 targets.

I encountered the following problem. Glibc might be built with a newer
version of struct clone_args than the currently running kernel supports.
In this case, we may attempt to use a non-zero value in the new field
in args (and pass a size bigger than the kernel expects), and the
kernel will reject the syscall with an E2BIG error.
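
To illustrate the rejection, here is a minimal userspace model of the
size check as I understand it. It assumes the kernel follows the
documented copy_struct_from_user() behaviour, where trailing bytes it
does not know about must be zero. It is a sketch, not kernel code, and
model_check_args() is a made-up name:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code: decide whether a
   struct of USIZE bytes is acceptable when only the first KSIZE bytes
   are understood.  */
static int
model_check_args (const unsigned char *args, size_t usize, size_t ksize)
{
  /* Every byte beyond what is understood must be zero.  If USIZE is
     not larger than KSIZE, the loop body never runs.  */
  for (size_t i = ksize; i < usize; i++)
    if (args[i] != 0)
      return -E2BIG;
  return 0;
}
```

Under this model, a larger struct whose extra bytes are all zero is
still accepted; only a non-zero value in the unknown tail triggers
E2BIG.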

This seems to be due to a fail-early approach. The unexpected non-
zero values beyond what's supported by the kernel may indicate that
userspace expects something to happen (and may even have allocated
some resources). So it's better to indicate a problem rather than
silently ignore this and have userspace encounter an error later.

However, it makes it difficult to use extended "versions" of
the clone3 syscall. AFAIK, there is no way to ask the kernel about
the supported size of struct clone_args except for making syscalls
with decreasing values of size until we stop getting E2BIG.
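
The probing described above could be sketched roughly as follows. The
names try_size_fn, probe_clone_args_size() and fake_kernel_88() are
hypothetical and exist only for illustration; the callback stands in
for an actual clone3 attempt, which a real probe could not make so
freely because a successful call creates a process:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: attempt a clone3-style call with the given
   struct size and return 0 on success or a negative errno value.  */
typedef int (*try_size_fn) (size_t size);

/* Try candidate sizes in the order given (largest first) and return
   the first one that is not rejected with E2BIG, or 0 if every
   candidate is rejected.  Any other result also ends the search,
   because it does not mean the struct was too big.  */
static size_t
probe_clone_args_size (try_size_fn try_size, const size_t *sizes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (try_size (sizes[i]) != -E2BIG)
      return sizes[i];
  return 0;
}

/* Stand-in for an older kernel that understands only the first 88
   bytes of the struct (illustration only, no syscall is made).  */
static int
fake_kernel_88 (size_t size)
{
  return size > 88 ? -E2BIG : 0;
}
```

With candidates { 96, 88, 80, 64 } and fake_kernel_88 as the callback,
the probe settles on 88. The values 64, 80 and 88 correspond to
CLONE_ARGS_SIZE_VER0, VER1 and VER2 in the uapi headers; 96 as the
size including shadow_stack_token is my assumption.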

This seems fragile and may call for writing cumbersome code. In essence,
we will have to have clone30(), clone31(), clone32()... wrappers, which
probably defeats the purpose of adding clone3:

  if (clone32_supported && clone32(...) == -1 && errno == E2BIG)
    {
      clone32_supported = false;
      /* ... */
    }
  else if (clone31_supported && clone31(...) == -1 && errno == E2BIG)
    {
      clone31_supported = false;
      /* ... */
    }
 ...

Is there a neat way to work around this? What was the idea for extending
clone_args in practice?

I suppose we can't rely on the kernel version because support for extended
clone_args can be backported. In any case, we'd have to do a syscall
for this (it would probably be great to have the kernel version in auxv).

I appreciate any advice here.

Thanks,
Yury

[1]: https://lore.kernel.org/all/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@kernel.org/