lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fa8c0bca-a127-410e-9b13-3fa5ea7d8110@csgroup.eu>
Date: Fri, 13 Jun 2025 07:24:46 +0200
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Takashi Iwai <tiwai@...e.de>
Cc: Jaroslav Kysela <perex@...ex.cz>, Takashi Iwai <tiwai@...e.com>,
 linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
 linux-sound@...r.kernel.org, Herve Codina <herve.codina@...tlin.com>,
 Mark Brown <broonie@...nel.org>
Subject: Re: [PATCH] ALSA: pcm: Convert snd_pcm_ioctl_sync_ptr_{compat/x32} to
 user_access_begin/user_access_end()



Le 12/06/2025 à 13:02, Takashi Iwai a écrit :
> On Thu, 12 Jun 2025 12:39:39 +0200,
> Christophe Leroy wrote:
>>
>> With user access protection (Called SMAP on x86 or KUAP on powerpc)
>> each and every call to get_user() or put_user() performs heavy
>> operations to unlock and lock kernel access to userspace.
>>
>> SNDRV_PCM_IOCTL_SYNC_PTR ioctl is a hot path that needs to be
>> optimised. To do that, perform user accesses by blocks using
>> user_access_begin/user_access_end() and unsafe_get_user()/
>> unsafe_put_user() and alike.
>>
>> Before the patch the 9 calls to put_user() at the
>> end of snd_pcm_ioctl_sync_ptr_compat() imply the following set of
>> instructions about 9 times (access_ok - enable user - write - disable
>> user):
>>      0.00 :   c057f858:       3d 20 7f ff     lis     r9,32767
>>      0.29 :   c057f85c:       39 5e 00 14     addi    r10,r30,20
>>      0.77 :   c057f860:       61 29 ff fc     ori     r9,r9,65532
>>      0.32 :   c057f864:       7c 0a 48 40     cmplw   r10,r9
>>      0.36 :   c057f868:       41 a1 fb 58     bgt     c057f3c0 <snd_pcm_ioctl+0xbb0>
>>      0.30 :   c057f86c:       3d 20 dc 00     lis     r9,-9216
>>      1.95 :   c057f870:       7d 3a c3 a6     mtspr   794,r9
>>      0.33 :   c057f874:       92 8a 00 00     stw     r20,0(r10)
>>      0.27 :   c057f878:       3d 20 de 00     lis     r9,-8704
>>      0.28 :   c057f87c:       7d 3a c3 a6     mtspr   794,r9
>> ...
>>
>> A perf profile shows that in total the 9 put_user() represent 36% of
>> the time spent in snd_pcm_ioctl() and about 80 instructions.
>>
>> With this patch everything is done in 13 instructions and represent
>> only 15% of the time spent in snd_pcm_ioctl():
>>
>>      0.57 :   c057f5dc:       3d 20 dc 00     lis     r9,-9216
>>      0.98 :   c057f5e0:       7d 3a c3 a6     mtspr   794,r9
>>      0.16 :   c057f5e4:       92 7f 00 04     stw     r19,4(r31)
>>      0.63 :   c057f5e8:       93 df 00 0c     stw     r30,12(r31)
>>      0.16 :   c057f5ec:       93 9f 00 10     stw     r28,16(r31)
>>      4.95 :   c057f5f0:       92 9f 00 14     stw     r20,20(r31)
>>      0.19 :   c057f5f4:       92 5f 00 18     stw     r18,24(r31)
>>      0.49 :   c057f5f8:       92 bf 00 1c     stw     r21,28(r31)
>>      0.27 :   c057f5fc:       93 7f 00 20     stw     r27,32(r31)
>>      5.88 :   c057f600:       93 36 00 00     stw     r25,0(r22)
>>      0.11 :   c057f604:       93 17 00 00     stw     r24,0(r23)
>>      0.00 :   c057f608:       3d 20 de 00     lis     r9,-8704
>>      0.79 :   c057f60c:       7d 3a c3 a6     mtspr   794,r9
>>
>> Note that here the access_ok() in user_write_access_begin() is skipped
>> because the exact same verification has already been performed at the
>> beginning of the fonction with the call to user_read_access_begin().
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@...roup.eu>
>> ---
>> This is a lighter version of previous patch "[PATCH v2] ALSA: pcm: Convert multiple {get/put}_user to user_access_begin/user_access_end()" focussing on identified hot path.
>> Moved and nested the failure labels closer in order to increase readability
> 
> Thanks for the revised patch!
> 
> Although it's now much lighter, I still believe that we can reduce
> get_user() / put_user() calls significantly by adjusting the struct
> usage.
> 
> Could you check whether the patch below can improve?
> (Zero-ing of sstatus can be an overhead here, but there are some
> holes, and these need to be initialized before copying back...)
> 

Thanks for the proposed patch. Unfortunately it doesn't improve the 
situation. The problem with copy_from_user() and copy_to_user() is that 
they perform quite bad on small regions. And for the from_user side we 
still get two user access enable/disable instead 3 and for the to_user 
side we still get two as well (Allthough we had 7 previously). Those 4 
user access enable/disable still have a cost.

Nowadays the tendency is really to go for the unsafe_put/get_user style, 
see some significant exemples below. And as mentioned in those commits, 
behind the performance improvement it also lead to much cleaner code 
generation.
- 38ebcf5096a8 ("scm: optimize put_cmsg()")
- 9f79b78ef744 ("Convert filldir[64]() from __put_user() to 
unsafe_put_user()")
- ef0ba0553829 ("poll: fix performance regression due to out-of-line 
__put_user()")

Commit 38ebcf5096a8 is explicit about copy_to_user() being bad for small 
regions.

Here below is some comparison between the three way of doing 
snd_pcm_ioctl_sync_ptr_compat(), output is from 'perf top':

Initially I got the following. That 12%+ is the reason why I started 
investigating.

     14.20%  test_perf           [.] engine_main
==> 12.86%  [kernel]            [k] snd_pcm_ioctl
     11.91%  [kernel]            [k] finish_task_switch.isra.0
      4.15%  [kernel]            [k] snd_pcm_group_unlock_irq.part.0
      4.07%  libc.so.6           [.] __ioctl_time64
      3.58%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin_generic
      3.37%  [kernel]            [k] sys_ioctl
      2.96%  libasound.so.2.0.0  [.] snd_pcm_hw_avail_update
      2.73%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin
      2.58%  [kernel]            [k] system_call_exception
      1.93%  libasound.so.2.0.0  [.] sync_ptr1
      1.85%  libasound.so.2.0.0  [.] snd_pcm_unlock
      1.84%  libasound.so.2.0.0  [.] snd_pcm_mmap_begin
      1.83%  libasound.so.2.0.0  [.] bad_pcm_state
      1.68%  libasound.so.2.0.0  [.] snd_pcm_mmap_avail
      1.67%  libasound.so.2.0.0  [.] snd_pcm_avail_update

With _your_ patch I get the following. copy_from_user() calls 
_copy_from_user() and copy_to_user() calls _copy_to_user(). Both then 
call __copy_tofrom_user(). In total it is 16.4% so it is worse than before.

     14.47%  test_perf           [.] engine_main
     12.00%  [kernel]            [k] finish_task_switch.isra.0
==>  8.37%  [kernel]            [k] snd_pcm_ioctl
      5.44%  libc.so.6           [.] __ioctl_time64
      5.03%  [kernel]            [k] snd_pcm_group_unlock_irq.part.0
==>  4.86%  [kernel]            [k] __copy_tofrom_user
      4.62%  [kernel]            [k] sys_ioctl
      3.22%  [kernel]            [k] system_call_exception
      2.42%  libasound.so.2.0.0  [.] snd_pcm_mmap_begin
      2.31%  [kernel]            [k] fdget
      2.23%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin_generic
      2.19%  [kernel]            [k] syscall_exit_prepare
      1.92%  libasound.so.2.0.0  [.] snd_pcm_mmap_avail
      1.86%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin
      1.68%  libasound.so.2.0.0  [.] snd_pcm_hw_avail_update
==>  1.67%  [kernel]            [k] _copy_from_user
      1.66%  libasound.so.2.0.0  [.] bad_pcm_state
==>  1.53%  [kernel]            [k] _copy_to_user
      1.40%  libasound.so.2.0.0  [.] sync_ptr1

With my patch I get the following:

     17.46%  test_perf           [.] engine_main
      9.14%  [kernel]            [k] finish_task_switch.isra.0
==>  4.92%  [kernel]            [k] snd_pcm_ioctl
      3.99%  [kernel]            [k] snd_pcm_group_unlock_irq.part.0
      3.71%  libc.so.6           [.] __ioctl_time64
      3.61%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin_generic
      2.72%  libasound.so.2.0.0  [.] sync_ptr1
      2.65%  [kernel]            [k] system_call_exception
      2.46%  [kernel]            [k] sys_ioctl
      2.43%  [kernel]            [k] __rseq_handle_notify_resume
      2.34%  [kernel]            [k] do_epoll_wait
      2.30%  libasound.so.2.0.0  [.] __snd_pcm_mmap_commit
      2.14%  libasound.so.2.0.0  [.] __snd_pcm_avail
      2.04%  libasound.so.2.0.0  [.] snd_pcm_hw_avail_update
      1.89%  libasound.so.2.0.0  [.] snd_pcm_lock
      1.84%  libasound.so.2.0.0  [.] snd_pcm_mmap_avail
      1.76%  libasound.so.2.0.0  [.] __snd_pcm_avail_update
      1.61%  libasound.so.2.0.0  [.] bad_pcm_state
      1.60%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin
      1.49%  libasound.so.2.0.0  [.] query_status_data

Christophe


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ