[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <599df4a3-47a4-49be-9c81-8e21ea1f988a@xen0n.name>
Date: Wed, 21 Feb 2024 14:09:59 +0800
From: WANG Xuerui <kernel@...0n.name>
To: linux-api@...r.kernel.org
Cc: Arnd Bergmann <arnd@...db.de>, Christian Brauner <brauner@...nel.org>,
Kees Cook <keescook@...omium.org>, Huacai Chen <chenhuacai@...nel.org>,
Xuefeng Li <lixuefeng@...ngson.cn>, Jianmin Lv <lvjianmin@...ngson.cn>,
Xiaotian Wu <wuxiaotian@...ngson.cn>, WANG Rui <wangrui@...ngson.cn>,
Miao Wang <shankerwangmiao@...il.com>, Icenowy Zheng <uwu@...nowy.me>,
"loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>,
linux-arch <linux-arch@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Chromium sandbox on LoongArch and statx -- seccomp deep argument
inspection again?
Hi,
Recently, we -- community LoongArch porters -- have noticed a problem
where the Chromium sandbox apparently wants to deny statx [^1] so it
could properly inspect arguments after the sandboxed process later falls
back to fstat. The reasoning behind the change was not clear in the
patch; but we found out it's basically because there's currently not a
"fd-only" version of statx, so that the sandbox has no way to ensure the
path argument is empty without being able to peek into the sandboxed
process's memory. For architectures able to do newfstatat though, the
glibc falls back to newfstatat after getting -ENOSYS for statx, then the
respective SIGSYS handler [^2] takes care of inspecting the path
argument, transforming allowed newfstatat's into fstat instead which is
allowed and has the same type of return value.
But, as loongarch is the first architecture to not have fstat nor
newfstatat, the LoongArch glibc does not attempt falling back at all
when it gets -ENOSYS for statx -- and you see the problem there!
Actually, back when the loongarch port was under review, people were
aware of the same problem with sandboxing clone3 [^3], so clone was
eventually kept. Unfortunately it seemed at that time no one had noticed
statx, so besides restoring fstat/newfstatat to loongarch uapi (and
postponing the problem further), it seems inevitable that we would need
to tackle seccomp deep argument inspection; this is obviously a decision
that shouldn't be taken lightly, so I'm posting this to restart the
conversation to figure out a way forward together. We basically could do
one of below:
- just restore fstat and be done with it;
- add a flag to statx so we can do the equivalent of just fstat(fd,
&out) with statx, and ensuring an error happens if path is not empty in
that case;
- tackle the long-standing problem of seccomp deep argument inspection (!).
Obviously, the simplest solution would be to "just restore fstat". But
again, in my opinion this is not quite a solution but a workaround -- we
have good reasons to keep just statx (mainly because its feature set is
a strict superset of those of fstat/newfstatat), and we're not quite in
a hurry to resolve this. Because one of the prerequisites for a new
Chromium port is "inclusion in Debian stable" -- as the loong64 port
[^4] is progressing and the next Debian release would not happen until
2025, we still have time for a more "elegant" solution.
Alternatively, we could also introduce a new flag for statx, maybe named
AT_STATX_NO_PATH or something like that, so that statx would ignore the
path altogether or error on non-empty paths if called with flags
containing AT_EMPTY_PATH | AT_STATX_NO_PATH. This way a seccomp policy
could allow statx calls only with such flags (that are passed via
register and accessible) and maintain the same level of safety as with
fstat. But there is also disadvantage to this approach: the flag would
be useful only for sandboxes, because in other cases there's no need to
avoid reading from &path. This is also more of a workaround to avoid the
deep argument inspection problem.
Lastly, should we decide to go the hardest way, according to a previous
mail [^5] (about clone3) and the LPC 2019 discussion [^6] [^7], we
probably would try the metadata-annotation-based and piece-by-piece
approach, as it's expected to provide the most benefit and involve less
code changes. The implementation, as I surmise, will involve modifying
the generic syscall entrypoint for early copying of user data, and
corresponding changes to seccomp plumbing so this information is
properly exposed. I don't have a roadmap for non-generic-entry arches
right now, and I also haven't started designing the new seccomp ABI for
that. As a matter of fact, members of the LoongArch community (myself
included) are still fresh to this area of expertise, so a bit more of
your feedback will be appreciated.
Thanks to Miao Wang from AOSC for doing much of the investigation.
[^1]: https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[^2]:
https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355
[^3]:
https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal.cx/
[^4]: https://wiki.debian.org/Ports/loong64
[^5]: https://lwn.net/ml/linux-kernel/201905301122.88FD40B3@keescook/
[^6]: https://lwn.net/Articles/799557/
[^7]:
https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf
--
WANG "xen0n" Xuerui
Linux/LoongArch mailing list:https://lore.kernel.org/loongarch/
Powered by blists - more mailing lists