Message-ID: <87y0xf3h3e.ffs@tglx>
Date: Sat, 08 Mar 2025 12:19:17 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Dmitry Vyukov <dvyukov@...gle.com>, krisman@...labora.com,
luto@...nel.org, peterz@...radead.org, keescook@...omium.org,
gregory.price@...verge.com
Cc: Dmitry Vyukov <dvyukov@...gle.com>, Marco Elver <elver@...gle.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/3] syscall_user_dispatch: Allow allowed range wrap-around
On Mon, Feb 24 2025 at 09:45, Dmitry Vyukov wrote:
> There are two possible scenarios for syscall filtering:
> - having a trusted/allowed range of PCs, and intercepting everything else
> - or the opposite: a single untrusted/intercepted range and allowing
> everything else
> The current implementation only allows the former use case due to
> the allowed-range wrap-around check. Allow the latter use case as well
> by removing the wrap-around check.
> The latter use case is relevant for any kind of sandboxing scenario,
> or for monitoring the behavior of a single library. If a program wants to
> intercept syscalls for PC range [START, END) then it needs to call:
> prctl(..., END, -(END-START), ...);
> which sets a wrap-around range that excludes everything
> besides [START, END).
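The wrap-around trick can be checked with a small user-space model. This sketch assumes the kernel's fast path allows a syscall when ip - offset < len in unsigned arithmetic (roughly what kernel/entry/syscall_user_dispatch.c does with unsigned long); uint64_t stands in for a 64-bit address space, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed fast-path test: a syscall issued from ip is allowed
 * (not dispatched) when ip - offset < len, with unsigned wrap-around. */
bool ip_in_allowed_range(uint64_t ip, uint64_t offset, uint64_t len)
{
	return ip - offset < len;
}

/* The patch's trick (hypothetical helper): to intercept only
 * [start, end), pass offset = end and len = -(end - start). The
 * allowed range then starts at end and wraps past 2^64 back to start. */
void wrapped_range(uint64_t start, uint64_t end,
		   uint64_t *offset, uint64_t *len)
{
	assert(start < end);
	*offset = end;
	*len = -(end - start);
}
```

With START = 0x1000 and END = 0x2000, an ip of 0x1800 gives ip - offset = 2^64 - 0x800, which is not below len = 2^64 - 0x1000, so that syscall is dispatched, while an ip of 0x2000 gives 0 and is allowed.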
That's not really intuitive, and the implementation changes the prctl()
behaviour in a way that is not backwards compatible.
Can we please keep the current behaviour and add a new mode? Something
like:
# define PR_SYS_DISPATCH_OFF 0
# define PR_SYS_DISPATCH_ON 1
# define PR_SYS_DISPATCH_EXCLUSIVE_ON PR_SYS_DISPATCH_ON
# define PR_SYS_DISPATCH_INCLUSIVE_ON 2
That keeps the current mode backwards compatible and avoids the oddity of
prctl(..., END, -(END-START), ...);
i.e. the two modes are clearly and obviously distinguishable for user space:
prctl(..., PR_SYS_DISPATCH_EXCLUSIVE_ON, START, END - START, ...);
prctl(..., PR_SYS_DISPATCH_INCLUSIVE_ON, START, END - START, ...);
That makes a lot of sense, because these two modes are distinctly
different, no?
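To make the difference between the two proposed modes concrete, here is a sketch of the dispatch decision as user space would observe it. The enum names are hypothetical stand-ins for the proposed PR_SYS_DISPATCH_* values (renamed so they do not clash with <linux/prctl.h>), the selector byte is ignored, and the range [offset, offset + len) is assumed not to wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirrors of the proposed mode values. */
enum sud_mode {
	SUD_OFF = 0,
	SUD_EXCLUSIVE_ON = 1,	/* same value as the existing PR_SYS_DISPATCH_ON */
	SUD_INCLUSIVE_ON = 2,
};

/* Is ip inside [offset, offset + len)? Unsigned subtraction makes
 * ip < offset wrap to a large value, so it correctly reports "outside". */
static bool in_range(uint64_t ip, uint64_t offset, uint64_t len)
{
	return ip - offset < len;
}

/* Would a syscall issued from ip be dispatched (SIGSYS) in this mode? */
bool sud_should_dispatch(enum sud_mode mode, uint64_t ip,
			 uint64_t offset, uint64_t len)
{
	switch (mode) {
	case SUD_OFF:
		return false;
	case SUD_EXCLUSIVE_ON:
		/* Range is trusted: dispatch everything outside it. */
		return !in_range(ip, offset, len);
	case SUD_INCLUSIVE_ON:
		/* Range is untrusted: dispatch only what is inside it. */
		return in_range(ip, offset, len);
	}
	/* Unknown modes are assumed to be rejected at prctl() time. */
	assert(0 && "unknown mode");
	return false;
}
```

The same (offset, len) pair thus selects complementary sets of call sites depending only on the mode, which is the distinction the two prctl() calls make explicit.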
PR_SYS_DISPATCH_INCLUSIVE_ON will fail on older kernels, and both modes
get a sanity check: PR_SYS_DISPATCH_INCLUSIVE_ON should at least reject
a zero-length dispatcher region.
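A sketch of what the per-mode sanity checks could look like. The exclusive branch reuses the existing check as it is assumed to read in the current code (offset && offset + len <= offset, which rejects an overflowing range and, for a nonzero offset, an empty one). The function name is hypothetical, the selector pointer check is omitted, and rejecting a wrapping inclusive region is an extra assumption beyond the zero-length check asked for here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the proposed PR_SYS_DISPATCH_* values. */
enum {
	SUD_OFF = 0,
	SUD_EXCLUSIVE_ON = 1,
	SUD_INCLUSIVE_ON = 2,
};

/* Returns 0 if (mode, offset, len) is acceptable, -EINVAL otherwise. */
int sud_validate(unsigned long mode, uint64_t offset, uint64_t len)
{
	switch (mode) {
	case SUD_OFF:
		return (offset || len) ? -EINVAL : 0;
	case SUD_EXCLUSIVE_ON:
		/* Existing behaviour: the allowed range must not overflow. */
		if (offset && offset + len <= offset)
			return -EINVAL;
		return 0;
	case SUD_INCLUSIVE_ON:
		/* Required: a non-empty dispatcher region.
		 * Assumption: also reject a region that wraps. */
		if (len == 0 || offset + len < offset)
			return -EINVAL;
		return 0;
	default:
		/* Unknown mode; a kernel without the new mode ends up here for 2. */
		return -EINVAL;
	}
}
```

The default branch is what makes the new mode fail cleanly on older kernels, so user space can probe for it and fall back.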
Aside from the better user interface, this avoids the in_compat_syscall()
hack, because set_syscall_user_dispatch() then does the range inversion,
which works completely independently of compat.
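One way the in-kernel range inversion could look, as a sketch: the inclusive range [offset, offset + len) is stored as an allowed range starting at offset + len with length -len, so the existing fast-path test (assumed to be ip - offset < len) serves both modes unchanged. Since the negation happens in the kernel's own unsigned arithmetic, user space never passes a negative length, which is presumably why no compat special case is needed. The struct and function names are hypothetical, and uint64_t stands in for unsigned long on a 64-bit kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task state consulted on every syscall. */
struct sud_range {
	uint64_t offset;
	uint64_t len;
};

/* Fast-path test, identical for both modes once the range is stored. */
bool sud_allowed(const struct sud_range *r, uint64_t ip)
{
	return ip - r->offset < r->len;
}

/* Store the user's [offset, offset + len). For the inclusive mode,
 * invert it: allow everything from offset + len, wrapping around to
 * offset, i.e. everything except the user's range. Assumes the range
 * has already been validated. */
void sud_store_range(struct sud_range *r, bool inclusive,
		     uint64_t offset, uint64_t len)
{
	if (inclusive) {
		/* len == 0 would invert to "allow nothing". */
		assert(len != 0);
		r->offset = offset + len;
		r->len = -len;
	} else {
		r->offset = offset;
		r->len = len;
	}
}
```

The assert mirrors the zero-length sanity check: with len == 0 the inverted length would also be 0, so nothing would be allowed and every syscall would be dispatched, the opposite of an empty inclusive region.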
> kernel/entry/syscall_user_dispatch.c | 9 +++------
> kernel/sys.c | 6 ++++++
> 2 files changed, 9 insertions(+), 6 deletions(-)
This clearly lacks an update to
Documentation/admin-guide/syscall-user-dispatch.rst.
Thanks,
tglx