Message-ID: <67874b84.7b0a0220.3935f4.1f48@mx.google.com>
Date: Wed, 15 Jan 2025 07:45:38 +0200
From: Shmulik Ladkani <shmulik.ladkani@...il.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Eyal Birger <eyal.birger@...il.com>, Andrii Nakryiko
<andrii.nakryiko@...il.com>, Jiri Olsa <olsajiri@...il.com>, Sarai Aleksa
<cyphar@...har.com>, mhiramat@...nel.org, linux-kernel
<linux-kernel@...r.kernel.org>, linux-trace-kernel@...r.kernel.org,
BPF-dev-list <bpf@...r.kernel.org>, Song Liu <songliubraving@...com>,
Yonghong Song <yhs@...com>, John Fastabend <john.fastabend@...il.com>,
peterz@...radead.org, tglx@...utronix.de, bp@...en8.de, x86@...nel.org,
linux-api@...r.kernel.org, Andrii Nakryiko <andrii@...nel.org>, Daniel
Borkmann <daniel@...earbox.net>, Alexei Starovoitov <ast@...nel.org>,
rostedt@...dmis.org, rafi@....io
Subject: Re: Crash when attaching uretprobes to processes running in Docker
On Wed, 15 Jan 2025 01:50:13 +0100 Oleg Nesterov <oleg@...hat.com>
wrote:
> On 01/14, Eyal Birger wrote:
> >
> > Its software, that’s working fine in previous kernel versions and
> > upon upgrade starts creating crashes in other processes.
> >
> > IMHO demanding that other software (e.g docker) be upgraded in
> > order to run on a newer kernel is not what Linux formerly
> > guaranteed.
>
> Agreed.
IMO there are two problematic aspects to ff474a78cef5
("uprobe: Add uretprobe syscall to speed up return probe").
The first, as Eyal mentioned, is the kernel regression: there are
countless systems out there (IaaS and PaaS) that have both
telemetry/instrumentation/tracing software (utilizing uprobes) and
container environments (such as Docker) that enforce syscall
restrictions on their workloads.
These systems worked so far, and with kernels having ff474a78cef5 the
workloads' processes fault.
The second is the fact that ff474a78cef5 (which adds a new syscall
invocation to the uretprobe trampoline) *exposes an internal kernel
implementation* to the userspace system:
There are millions of binaries/libraries out there that *never issue*
the new syscall: they simply do not have that call in their
instructions. Take for example hello-world.
However, once hello-world is traced (with software utilizing
uprobes), hello-world *unknowingly* DOES issue the new syscall, just
because the kernel decided to implement its uretprobe trampoline using
a new syscall - a mechanism that should be completely transparent and
seamless to the user program.
This is totally unexpected, and to ask a system admin to "guess" whether
hello-world is "going to issue the syscall despite the fact that
such invocation does not exist in its own code at all" (and set seccomp
permissions accordingly) is asking for the admin to know the exact
*internal mechanisms* that the kernel uses for implementing the
trampolines.
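To make this concrete, here is a toy model in Python (not kernel code, and
not a real seccomp filter). The syscall names attributed to hello-world and
the admin's allowlist are assumptions chosen for illustration. The only thing
taken from the commit is that the uretprobe trampoline now issues a syscall
on behalf of the traced process.

```python
# Toy model only. The syscall names and the allowlist are illustrative
# assumptions, not taken from any real binary or seccomp profile.

# Syscalls present in hello-world's own instructions (assumed).
HELLO_WORLD_OWN_SYSCALLS = frozenset({"write", "exit_group"})

# Hypothetical allowlist an admin derives by auditing the binary itself.
ADMIN_ALLOWLIST = frozenset(HELLO_WORLD_OWN_SYSCALLS)


def issued_syscalls(own_syscalls, traced, trampoline_uses_syscall):
    """Return the set of syscalls the process issues at run time."""
    issued = set(own_syscalls)
    if traced and trampoline_uses_syscall:
        # The return trampoline executes in the traced process, so its
        # syscall is seen as coming from that process.
        issued.add("uretprobe")
    return issued


def blocked_syscalls(issued, allowlist):
    """Return the syscalls an allowlist-based filter would reject."""
    return set(issued) - set(allowlist)


if __name__ == "__main__":
    for traced in (False, True):
        issued = issued_syscalls(
            HELLO_WORLD_OWN_SYSCALLS, traced, trampoline_uses_syscall=True
        )
        print("traced:", traced,
              "blocked:", sorted(blocked_syscalls(issued, ADMIN_ALLOWLIST)))
```

In this model the allowlist is complete with respect to the binary's own
code, yet the traced case still produces a rejected syscall that no audit of
hello-world could have predicted.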
Just like we wouldn't add a div-by-zero fault to the trampoline, we
shouldn't add any instruction (such as a syscall) that isn't *completely
transparent* to the userspace program.
Best,
Shmulik