Open Source and information security mailing list archives
 
Date: Thu, 20 Jun 2024 10:58:58 +0800
From: "Liao, Chang" <liaochang1@...wei.com>
To: Jiri Olsa <olsajiri@...il.com>
CC: <rostedt@...dmis.org>, <mhiramat@...nel.org>, <oleg@...hat.com>,
	<ast@...nel.org>, <daniel@...earbox.net>, <andrii@...nel.org>,
	<nathan@...nel.org>, <peterz@...radead.org>, <mingo@...hat.com>,
	<mark.rutland@....com>, <linux-perf-users@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next] uprobes: Fix the xol slots reserved for
 uretprobe trampoline

Hi, Jiri

On 2024/6/20 0:22, Jiri Olsa wrote:
> On Wed, Jun 19, 2024 at 01:34:11AM +0000, Liao Chang wrote:
>> When the new uretprobe system call was added [1], the xol slots reserved
>> for the uretprobe trampoline might be insufficient on some architectures.
> 
> hum, uretprobe syscall is x86_64 specific, nothing was changed wrt slots
> or other architectures.. could you be more specific in what's changed?

I observed a significant performance degradation when using uprobes to trace Redis
on an arm64 machine. redis-benchmark showed a drop of around 7% with uprobes
attached to two hot functions, and a much worse result with uprobes on more hot
functions. Here is a small snapshot of the benchmark results.

No uprobe
---------
SET: 73686.54 rps
GET: 73702.83 rps

Uprobes on two hot functions
----------------------------
SET: 68441.59 rps, -7.1%
GET: 68951.25 rps, -6.4%

Uprobes on three hot functions
------------------------------
SET: 40953.39 rps, -44.4%
GET: 41609.45 rps, -43.5%
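
For reference, the percentages above are the relative change against the "No uprobe" baseline. A minimal sketch of that arithmetic follows; the function name relative_change_pct is only illustrative and not part of any benchmark tool.

```c
#include <assert.h>

/*
 * Relative change of a measured throughput against a baseline, in percent.
 * A negative result means the measured run was slower than the baseline.
 * (Illustrative helper, not part of redis-benchmark.)
 */
double relative_change_pct(double baseline_rps, double measured_rps)
{
	assert(baseline_rps > 0.0);
	return (measured_rps - baseline_rps) / baseline_rps * 100.0;
}
```

For example, relative_change_pct(73686.54, 68441.59) is about -7.1, matching the SET figure with two probed functions.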

To investigate potential improvements, I ported the uretprobe syscall and
trampoline feature to arm64. The trampoline code used on arm64 looks like this:

uretprobe_trampoline_for_arm64:
	str x8, [sp, #-8]!
	mov x8, __NR_uretprobe
	svc #0

Because arm64 uses fixed-length 4-byte instructions, the total size of the trampoline
code is 12 bytes. The xol slot size is typically 4 bytes, so the trampoline does not
fit in a single slot and more than one slot has to be reserved for it.
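
The slot arithmetic can be sketched in userspace C as below. This is only an illustration under assumptions, not the kernel code: EXAMPLE_SLOT_BYTES, slots_needed() and reserve_leading_slots() are hypothetical names, and the bitmap is a single word rather than the real xol area bitmap. The sketch rounds the slot count up, whereas the quoted patch uses plain integer division.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed slot size for this example; 4 bytes matches the arm64 case above. */
#define EXAMPLE_SLOT_BYTES 4

/* Slots needed to hold insns_size bytes: round up, and never fewer than one. */
size_t slots_needed(size_t insns_size, size_t slot_bytes)
{
	assert(slot_bytes > 0);
	size_t n = (insns_size + slot_bytes - 1) / slot_bytes;
	return n > 0 ? n : 1;
}

/* Set the lowest nr bits of a one-word bitmap, one bit per reserved slot. */
unsigned long reserve_leading_slots(unsigned long bitmap, size_t nr)
{
	assert(nr <= sizeof(bitmap) * CHAR_BIT);
	for (size_t i = 0; i < nr; i++)
		bitmap |= 1UL << i;
	return bitmap;
}
```

With a 12-byte trampoline and 4-byte slots, slots_needed(12, EXAMPLE_SLOT_BYTES) gives 3, and reserve_leading_slots(0, 3) gives 0x7, i.e. the first three slots marked as in use.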

Thanks.

> 
> thanks,
> jirka
> 
>> For example, on arm64, the trampoline consists of at least three
>> instructions. So it should mark enough bits in area->bitmap and set
>> area->slot_count for the reserved slots.
>>
>> [1] https://lore.kernel.org/all/20240611112158.40795-4-jolsa@kernel.org/
>>
>> Signed-off-by: Liao Chang <liaochang1@...wei.com>
>> ---
>>  kernel/events/uprobes.c | 11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>> index 2816e65729ac..efd2d7f56622 100644
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -1485,7 +1485,7 @@ void * __weak arch_uprobe_trampoline(unsigned long *psize)
>>  static struct xol_area *__create_xol_area(unsigned long vaddr)
>>  {
>>  	struct mm_struct *mm = current->mm;
>> -	unsigned long insns_size;
>> +	unsigned long insns_size, slot_nr;
>>  	struct xol_area *area;
>>  	void *insns;
>>  
>> @@ -1508,10 +1508,13 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
>>  
>>  	area->vaddr = vaddr;
>>  	init_waitqueue_head(&area->wq);
>> -	/* Reserve the 1st slot for get_trampoline_vaddr() */
>> -	set_bit(0, area->bitmap);
>> -	atomic_set(&area->slot_count, 1);
>>  	insns = arch_uprobe_trampoline(&insns_size);
>> +	/* Reserve enough slots for the uretprobe trampoline */
>> +	for (slot_nr = 0;
>> +	     slot_nr < max((insns_size / UPROBE_XOL_SLOT_BYTES), 1);
>> +	     slot_nr++)
>> +		set_bit(slot_nr, area->bitmap);
>> +	atomic_set(&area->slot_count, slot_nr);
>>  	arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
>>  
>>  	if (!xol_add_vma(mm, area))
>> -- 
>> 2.34.1
>>

-- 
BR
Liao, Chang
