lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250114190521.0b69a1af64cac41106101154@kernel.org>
Date: Tue, 14 Jan 2025 19:05:21 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Jiri Olsa <olsajiri@...il.com>
Cc: oleg@...hat.com, Aleksa Sarai <cyphar@...har.com>, Eyal Birger
 <eyal.birger@...il.com>, mhiramat@...nel.org, linux-kernel
 <linux-kernel@...r.kernel.org>, linux-trace-kernel@...r.kernel.org,
 BPF-dev-list <bpf@...r.kernel.org>, Song Liu <songliubraving@...com>,
 Yonghong Song <yhs@...com>, John Fastabend <john.fastabend@...il.com>,
 peterz@...radead.org, tglx@...utronix.de, bp@...en8.de, x86@...nel.org,
 linux-api@...r.kernel.org, Andrii Nakryiko <andrii@...nel.org>, Daniel
 Borkmann <daniel@...earbox.net>, Alexei Starovoitov <ast@...nel.org>,
 Andrii Nakryiko <andrii.nakryiko@...il.com>, "rostedt@...dmis.org"
 <rostedt@...dmis.org>, rafi@....io, Shmulik Ladkani
 <shmulik.ladkani@...il.com>
Subject: Re: Crash when attaching uretprobes to processes running in Docker

On Tue, 14 Jan 2025 10:22:20 +0100
Jiri Olsa <olsajiri@...il.com> wrote:

> On Sat, Jan 11, 2025 at 07:40:15PM +0100, Jiri Olsa wrote:
> > On Sat, Jan 11, 2025 at 02:25:37AM +1100, Aleksa Sarai wrote:
> > > On 2025-01-10, Eyal Birger <eyal.birger@...il.com> wrote:
> > > > Hi,
> > > > 
> > > > When attaching uretprobes to processes running inside docker, the attached
> > > > process is segfaulted when encountering the retprobe. The offending commit
> > > > is:
> > > > 
> > > > ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
> > > > 
> > > > To my understanding, the reason is that now that uretprobe is a system call,
> > > > the default seccomp filters in docker block it as they only allow a specific
> > > > set of known syscalls.
> > > 
> > > FWIW, the default seccomp profile of Docker _should_ return -ENOSYS for
> > > uretprobe (runc has a bunch of ugly logic to try to guarantee this if
> > > Docker hasn't updated their profile to include it). Though I guess that
> > > isn't sufficient for the magic that uretprobe(2) does...
> > > 
> > > > This behavior can be reproduced by the below bash script, which works before
> > > > this commit.
> > > > 
> > > > Reported-by: Rafael Buchbinder <rafi@....io>
> > 
> > hi,
> > nice ;-) thanks for the report, the problem seems to be that uretprobe syscall
> > is blocked and uretprobe trampoline does not expect that
> > 
> > I think we could add code to the uretprobe trampoline to detect this and
> > execute standard int3 as fallback to process uretprobe, I'm checking on that
> 
> hack below seems to fix the issue, it's using rbx to signal that uretprobe
> syscall got executed, if not, trampoline does int3 and executes uretprobe
> handler in the old way
> 
> unfortunately now the uretprobe trampoline size crosses the xol slot limit so
> will need to come up with some generic/arch code solution for that, code below
> is neglecting that for now
> 
> jirka
> 
> 
> ---
>  arch/x86/kernel/uprobes.c | 24 ++++++++++++++++++++++++
>  include/linux/uprobes.h   |  1 +
>  kernel/events/uprobes.c   | 10 ++++++++--
>  3 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 5a952c5ea66b..b54863f6fa25 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -315,14 +315,25 @@ asm (
>  	".global uretprobe_trampoline_entry\n"
>  	"uretprobe_trampoline_entry:\n"
>  	"pushq %rax\n"
> +	"pushq %rbx\n"
>  	"pushq %rcx\n"
>  	"pushq %r11\n"
> +	"movq $1, %rbx\n"
>  	"movq $" __stringify(__NR_uretprobe) ", %rax\n"
>  	"syscall\n"
>  	".global uretprobe_syscall_check\n"
>  	"uretprobe_syscall_check:\n"
> +	"or %rbx,%rbx\n"
> +	"jz uretprobe_syscall_return\n"
>  	"popq %r11\n"
>  	"popq %rcx\n"
> +	"popq %rbx\n"
> +	"popq %rax\n"
> +	"int3\n"
> +	"uretprobe_syscall_return:\n"
> +	"popq %r11\n"
> +	"popq %rcx\n"
> +	"popq %rbx\n"
>  
>  	/* The uretprobe syscall replaces stored %rax value with final
>  	 * return address, so we don't restore %rax in here and just
> @@ -338,6 +349,16 @@ extern u8 uretprobe_trampoline_entry[];
>  extern u8 uretprobe_trampoline_end[];
>  extern u8 uretprobe_syscall_check[];
>  
> +#define UINSNS_PER_PAGE                 (PAGE_SIZE/UPROBE_XOL_SLOT_BYTES)
> +
> +bool arch_is_uretprobe_trampoline(unsigned long vaddr)
> +{
> +	unsigned long start = uprobe_get_trampoline_vaddr();
> +	unsigned long end = start + 2*UINSNS_PER_PAGE;
> +
> +	return vaddr >= start && vaddr < end;
> +}
> +
>  void *arch_uprobe_trampoline(unsigned long *psize)
>  {
>  	static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
> @@ -418,6 +439,9 @@ SYSCALL_DEFINE0(uretprobe)
>  	regs->r11 = regs->flags;
>  	regs->cx  = regs->ip;
>  
> +	/* zero rbx to signal trampoline that uretprobe syscall was executed */
> +	regs->bx  = 0;

Can we just return -ENOSYS as like as other syscall instead of
using rbx as a side channel?
We can carefully check the return address is not -ERRNO when set up
and reserve the -ENOSYS for this use case.

Thank you,

> +
>  	return regs->ax;
>  
>  sigill:
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index e0a4c2082245..dbde57a68a1b 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -213,6 +213,7 @@ extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>  extern void uprobe_handle_trampoline(struct pt_regs *regs);
>  extern void *arch_uprobe_trampoline(unsigned long *psize);
>  extern unsigned long uprobe_get_trampoline_vaddr(void);
> +bool arch_is_uretprobe_trampoline(unsigned long vaddr);
>  #else /* !CONFIG_UPROBES */
>  struct uprobes_state {
>  };
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index fa04b14a7d72..73df64109f38 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1703,6 +1703,11 @@ void * __weak arch_uprobe_trampoline(unsigned long *psize)
>  	return &insn;
>  }
>  
> +bool __weak arch_is_uretprobe_trampoline(unsigned long vaddr)
> +{
> +	return vaddr == uprobe_get_trampoline_vaddr();
> +}
> +
>  static struct xol_area *__create_xol_area(unsigned long vaddr)
>  {
>  	struct mm_struct *mm = current->mm;
> @@ -1725,8 +1730,9 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
>  
>  	area->vaddr = vaddr;
>  	init_waitqueue_head(&area->wq);
> -	/* Reserve the 1st slot for get_trampoline_vaddr() */
> +	/* Reserve the first two slots for get_trampoline_vaddr() */
>  	set_bit(0, area->bitmap);
> +	set_bit(1, area->bitmap);
>  	insns = arch_uprobe_trampoline(&insns_size);
>  	arch_uprobe_copy_ixol(area->page, 0, insns, insns_size);
>  
> @@ -2536,7 +2542,7 @@ static void handle_swbp(struct pt_regs *regs)
>  	int is_swbp;
>  
>  	bp_vaddr = uprobe_get_swbp_addr(regs);
> -	if (bp_vaddr == uprobe_get_trampoline_vaddr())
> +	if (arch_is_uretprobe_trampoline(bp_vaddr))
>  		return uprobe_handle_trampoline(regs);
>  
>  	rcu_read_lock_trace();
> -- 
> 2.47.1
> 


-- 
Masami Hiramatsu (Google) <mhiramat@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ