Message-ID: <CANaxB-zwjDu5PSFJebeJe5zH94HC7mThOwyPYSjE4tkQ0zwvBA@mail.gmail.com>
Date: Wed, 27 Jan 2021 00:10:30 -0800
From: Andrei Vagin <avagin@...il.com>
To: Will Deacon <will@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
keno@...iacomputing.com, dave.martin@....com
Cc: Oleg Nesterov <oleg@...hat.com>,
linux-arm-kernel@...ts.infradead.org,
LKML <linux-kernel@...r.kernel.org>,
Andrei Vagin <avagin@...gle.com>,
Howard Zhang <howard.zhang@....com>,
Anthony Steinhauser <asteinhauser@...gle.com>
Subject: Re: [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps
On Tue, Jan 19, 2021 at 2:08 PM Andrei Vagin <avagin@...il.com> wrote:
>
> Right now, ip/r12 for AArch32 and x7 for AArch64 is used to indicate
> whether or not the stop has been signalled from syscall entry or syscall
> exit. This means that:
>
> - Any writes by the tracer to this register during the stop are
> ignored/discarded.
>
> - The actual value of the register is not available during the stop,
> so the tracer cannot save it and restore it later.
>
> This series introduces NT_ARM_PRSTATUS to get all registers and makes it
> possible to change ip/r12 and x7 registers when tracee is stopped in
> syscall traps.
>
> For applications like the user-mode Linux or gVisor, it is critical to
> have access to the full set of registers at any moment. For example,
> they need to change values of all registers to emulate rt_sigreturn and
> they need to have the full set of registers to build a signal frame.
I have found the thread [1] where Keno, Will, and Dave discussed the same
problem. If I understand this right, the problem was not fixed, because there
were no users who needed it.
gVisor is a general-purpose sandbox to run untrusted workloads. It has a
platform interface that is responsible for syscall interception, context
switching, and managing process address spaces. Right now, we have kvm and
ptrace platforms. The ptrace platform runs guest code in the context of stub
processes and intercepts syscalls with the help of PTRACE_SYSEMU. All system
calls are handled by the gVisor kernel, including rt_sigreturn and execve.
Signal handling happens inside the gVisor kernel too. Each stub process can
have more than one thread, but we don't bind guest threads to stub threads and
we can run more than one guest thread in the context of one stub thread. Taking
all these facts into account, we need to have access to all registers at any
moment when a stub thread has been stopped.
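To make the requirement above concrete, here is a minimal sketch of how a
tracer reads the general-purpose registers of a stopped thread today, using
PTRACE_GETREGSET with NT_PRSTATUS. The helper name read_gp_regs is made up for
illustration and is not gVisor code. On arm64, when the stop is a syscall trap,
the x7 slot (regs[7]) returned this way carries the entry/exit marker rather
than the tracee's value, which is the limitation this series addresses. The
NT_ARM_PRSTATUS regset proposed here is deliberately not used, since it exists
only in this patch series.

```c
#include <elf.h>          /* NT_PRSTATUS */
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/user.h>     /* struct user_regs_struct */

/*
 * Hypothetical helper: read the general-purpose registers of a tracee that
 * is currently in a ptrace stop.
 * Returns 0 on success and -1 on error (errno is set by ptrace).
 */
static int read_gp_regs(pid_t tid, struct user_regs_struct *regs)
{
	struct iovec iov = {
		.iov_base = regs,
		.iov_len = sizeof(*regs),
	};

	if (ptrace(PTRACE_GETREGSET, tid,
		   (void *)(uintptr_t)NT_PRSTATUS, &iov) == -1)
		return -1;

	/* The kernel updates iov_len to the number of bytes it wrote. */
	return iov.iov_len == sizeof(*regs) ? 0 : -1;
}
```

A tracer would call this after waitpid() reports a stop for the thread.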
We were able to introduce the workaround [3] for this issue. Each time a stub
process stops on a system call, we queue a fake signal and resume the process
so that it stops again on the signal. It works, but it requires an extra
interaction with the stub process, which is expensive. My benchmarks show that
this workaround slows down syscalls in gVisor by more than 50%. BTW: the cost
of such extra stops is one of the major reasons why PTRACE_SYSEMU was
introduced instead of emulating it via two calls of PTRACE_SYSCALL.
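The workaround described above can be sketched roughly as follows. This is an
illustration of the idea only, not the code from [3]: the helper name
restop_on_signal, its parameters, and the choice of PTRACE_CONT as the resume
request are assumptions. It assumes the thread is stopped at a PTRACE_SYSEMU
syscall stop and that the caller picks a signal the tracee does not otherwise
use.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: move a tracee from a syscall stop to a
 * signal-delivery stop, where its registers are not clobbered.
 * Returns 0 on success and -1 on error.
 */
static int restop_on_signal(pid_t tgid, pid_t tid, int fake_sig)
{
	int status;

	/* Queue the fake signal for this specific thread. */
	if (syscall(SYS_tgkill, tgid, tid, fake_sig) == -1)
		return -1;

	/* Resume the thread; the pending signal stops it again. */
	if (ptrace(PTRACE_CONT, tid, 0, 0) == -1)
		return -1;

	/* This second wait is the extra round trip that costs time. */
	if (waitpid(tid, &status, __WALL) == -1)
		return -1;

	if (!WIFSTOPPED(status) || WSTOPSIG(status) != fake_sig)
		return -1;

	return 0;
}
```

After reading or writing registers at this stop, the tracer has to pass 0 as
the signal argument of the next resume request so that the fake signal is never
delivered to the guest code.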
[1] https://lore.kernel.org/lkml/CABV8kRz0mKSc=u1LeonQSLroKJLOKWOWktCoGji2nvEBc=e7=w@mail.gmail.com/#r
[2] https://github.com/google/gvisor/issues/5238
[3] https://github.com/google/gvisor/commit/a44efaab6d4b815880749a39647fb3ed9634a489
>
> Andrei Vagin (3):
> arm64/ptrace: don't clobber task registers on syscall entry/exit traps
> arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers
> selftest/arm64/ptrace: add a test for NT_ARM_PRSTATUS
>
> arch/arm64/include/asm/ptrace.h | 5 +
> arch/arm64/kernel/ptrace.c | 130 +++++++++++-----
> include/uapi/linux/elf.h | 1 +
> tools/testing/selftests/arm64/Makefile | 2 +-
> tools/testing/selftests/arm64/ptrace/Makefile | 6 +
> .../arm64/ptrace/ptrace_syscall_regs_test.c | 142 ++++++++++++++++++
> 6 files changed, 246 insertions(+), 40 deletions(-)
> create mode 100644 tools/testing/selftests/arm64/ptrace/Makefile
> create mode 100644 tools/testing/selftests/arm64/ptrace/ptrace_syscall_regs_test.c
>
> --
> 2.29.2
>