linux-kernel - Chromium sandbox on LoongArch and statx -- seccomp deep argument inspection again?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <599df4a3-47a4-49be-9c81-8e21ea1f988a@xen0n.name>
Date: Wed, 21 Feb 2024 14:09:59 +0800
From: WANG Xuerui <kernel@...0n.name>
To: linux-api@...r.kernel.org
Cc: Arnd Bergmann <arnd@...db.de>, Christian Brauner <brauner@...nel.org>,
 Kees Cook <keescook@...omium.org>, Huacai Chen <chenhuacai@...nel.org>,
 Xuefeng Li <lixuefeng@...ngson.cn>, Jianmin Lv <lvjianmin@...ngson.cn>,
 Xiaotian Wu <wuxiaotian@...ngson.cn>, WANG Rui <wangrui@...ngson.cn>,
 Miao Wang <shankerwangmiao@...il.com>, Icenowy Zheng <uwu@...nowy.me>,
 "loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>,
 linux-arch <linux-arch@...r.kernel.org>,
 Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Chromium sandbox on LoongArch and statx -- seccomp deep argument
 inspection again?

Hi,

Recently, we -- community LoongArch porters -- have noticed a problem 
where the Chromium sandbox apparently wants to deny statx [^1] so it 
could properly inspect arguments after the sandboxed process later falls 
back to fstat. The reasoning behind the change was not clear in the 
patch; but we found out it's basically because there's currently not a 
"fd-only" version of statx, so that the sandbox has no way to ensure the 
path argument is empty without being able to peek into the sandboxed 
process's memory. For architectures able to do newfstatat though, the 
glibc falls back to newfstatat after getting -ENOSYS for statx, then the 
respective SIGSYS handler [^2] takes care of inspecting the path 
argument, transforming allowed newfstatat's into fstat instead which is 
allowed and has the same type of return value.

But, as loongarch is the first architecture to not have fstat nor 
newfstatat, the LoongArch glibc does not attempt falling back at all 
when it gets -ENOSYS for statx -- and you see the problem there!

Actually, back when the loongarch port was under review, people were 
aware of the same problem with sandboxing clone3 [^3], so clone was 
eventually kept. Unfortunately it seemed at that time no one had noticed 
statx, so besides restoring fstat/newfstatat to loongarch uapi (and 
postponing the problem further), it seems inevitable that we would need 
to tackle seccomp deep argument inspection; this is obviously a decision 
that shouldn't be taken lightly, so I'm posting this to restart the 
conversation to figure out a way forward together. We basically could do 
one of below:

- just restore fstat and be done with it;
- add a flag to statx so we can do the equivalent of just fstat(fd, 
&out) with statx, and ensuring an error happens if path is not empty in 
that case;
- tackle the long-standing problem of seccomp deep argument inspection (!).

Obviously, the simplest solution would be to "just restore fstat". But 
again, in my opinion this is not quite a solution but a workaround -- we 
have good reasons to keep just statx (mainly because its feature set is 
a strict superset of those of fstat/newfstatat), and we're not quite in 
a hurry to resolve this. Because one of the prerequisites for a new 
Chromium port is "inclusion in Debian stable" -- as the loong64 port 
[^4] is progressing and the next Debian release would not happen until 
2025, we still have time for a more "elegant" solution.

Alternatively, we could also introduce a new flag for statx, maybe named 
AT_STATX_NO_PATH or something like that, so that statx would ignore the 
path altogether or error on non-empty paths if called with flags 
containing AT_EMPTY_PATH | AT_STATX_NO_PATH. This way a seccomp policy 
could allow statx calls only with such flags (that are passed via 
register and accessible) and maintain the same level of safety as with 
fstat. But there is also disadvantage to this approach: the flag would 
be useful only for sandboxes, because in other cases there's no need to 
avoid reading from &path. This is also more of a workaround to avoid the 
deep argument inspection problem.

Lastly, should we decide to go the hardest way, according to a previous 
mail [^5] (about clone3) and the LPC 2019 discussion [^6] [^7], we 
probably would try the metadata-annotation-based and piece-by-piece 
approach, as it's expected to provide the most benefit and involve less 
code changes. The implementation, as I surmise, will involve modifying 
the generic syscall entrypoint for early copying of user data, and 
corresponding changes to seccomp plumbing so this information is 
properly exposed. I don't have a roadmap for non-generic-entry arches 
right now, and I also haven't started designing the new seccomp ABI for 
that. As a matter of fact, members of the LoongArch community (myself 
included) are still fresh to this area of expertise, so a bit more of 
your feedback will be appreciated.

Thanks to Miao Wang from AOSC for doing much of the investigation.

[^1]: https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[^2]: 
https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355
[^3]: 
https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal.cx/
[^4]: https://wiki.debian.org/Ports/loong64
[^5]: https://lwn.net/ml/linux-kernel/201905301122.88FD40B3@keescook/
[^6]: https://lwn.net/Articles/799557/
[^7]: 
https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf

-- 
WANG "xen0n" Xuerui

Linux/LoongArch mailing list:https://lore.kernel.org/loongarch/