[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fa8c0bca-a127-410e-9b13-3fa5ea7d8110@csgroup.eu>
Date: Fri, 13 Jun 2025 07:24:46 +0200
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Takashi Iwai <tiwai@...e.de>
Cc: Jaroslav Kysela <perex@...ex.cz>, Takashi Iwai <tiwai@...e.com>,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-sound@...r.kernel.org, Herve Codina <herve.codina@...tlin.com>,
Mark Brown <broonie@...nel.org>
Subject: Re: [PATCH] ALSA: pcm: Convert snd_pcm_ioctl_sync_ptr_{compat/x32} to
user_access_begin/user_access_end()
Le 12/06/2025 à 13:02, Takashi Iwai a écrit :
> On Thu, 12 Jun 2025 12:39:39 +0200,
> Christophe Leroy wrote:
>>
>> With user access protection (Called SMAP on x86 or KUAP on powerpc)
>> each and every call to get_user() or put_user() performs heavy
>> operations to unlock and lock kernel access to userspace.
>>
>> SNDRV_PCM_IOCTL_SYNC_PTR ioctl is a hot path that needs to be
>> optimised. To do that, perform user accesses by blocks using
>> user_access_begin/user_access_end() and unsafe_get_user()/
>> unsafe_put_user() and alike.
>>
>> Before the patch the 9 calls to put_user() at the
>> end of snd_pcm_ioctl_sync_ptr_compat() imply the following set of
>> instructions about 9 times (access_ok - enable user - write - disable
>> user):
>> 0.00 : c057f858: 3d 20 7f ff lis r9,32767
>> 0.29 : c057f85c: 39 5e 00 14 addi r10,r30,20
>> 0.77 : c057f860: 61 29 ff fc ori r9,r9,65532
>> 0.32 : c057f864: 7c 0a 48 40 cmplw r10,r9
>> 0.36 : c057f868: 41 a1 fb 58 bgt c057f3c0 <snd_pcm_ioctl+0xbb0>
>> 0.30 : c057f86c: 3d 20 dc 00 lis r9,-9216
>> 1.95 : c057f870: 7d 3a c3 a6 mtspr 794,r9
>> 0.33 : c057f874: 92 8a 00 00 stw r20,0(r10)
>> 0.27 : c057f878: 3d 20 de 00 lis r9,-8704
>> 0.28 : c057f87c: 7d 3a c3 a6 mtspr 794,r9
>> ...
>>
>> A perf profile shows that in total the 9 put_user() represent 36% of
>> the time spent in snd_pcm_ioctl() and about 80 instructions.
>>
>> With this patch everything is done in 13 instructions and represent
>> only 15% of the time spent in snd_pcm_ioctl():
>>
>> 0.57 : c057f5dc: 3d 20 dc 00 lis r9,-9216
>> 0.98 : c057f5e0: 7d 3a c3 a6 mtspr 794,r9
>> 0.16 : c057f5e4: 92 7f 00 04 stw r19,4(r31)
>> 0.63 : c057f5e8: 93 df 00 0c stw r30,12(r31)
>> 0.16 : c057f5ec: 93 9f 00 10 stw r28,16(r31)
>> 4.95 : c057f5f0: 92 9f 00 14 stw r20,20(r31)
>> 0.19 : c057f5f4: 92 5f 00 18 stw r18,24(r31)
>> 0.49 : c057f5f8: 92 bf 00 1c stw r21,28(r31)
>> 0.27 : c057f5fc: 93 7f 00 20 stw r27,32(r31)
>> 5.88 : c057f600: 93 36 00 00 stw r25,0(r22)
>> 0.11 : c057f604: 93 17 00 00 stw r24,0(r23)
>> 0.00 : c057f608: 3d 20 de 00 lis r9,-8704
>> 0.79 : c057f60c: 7d 3a c3 a6 mtspr 794,r9
>>
>> Note that here the access_ok() in user_write_access_begin() is skipped
>> because the exact same verification has already been performed at the
>> beginning of the fonction with the call to user_read_access_begin().
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@...roup.eu>
>> ---
>> This is a lighter version of previous patch "[PATCH v2] ALSA: pcm: Convert multiple {get/put}_user to user_access_begin/user_access_end()" focussing on identified hot path.
>> Moved and nested the failure labels closer in order to increase readability
>
> Thanks for the revised patch!
>
> Although it's now much lighter, I still believe that we can reduce
> get_user() / put_user() calls significantly by adjusting the struct
> usage.
>
> Could you check whether the patch below can improve?
> (Zero-ing of sstatus can be an overhead here, but there are some
> holes, and these need to be initialized before copying back...)
>
Thanks for the proposed patch. Unfortunately it doesn't improve the
situation. The problem with copy_from_user() and copy_to_user() is that
they perform quite bad on small regions. And for the from_user side we
still get two user access enable/disable instead 3 and for the to_user
side we still get two as well (Allthough we had 7 previously). Those 4
user access enable/disable still have a cost.
Nowadays the tendency is really to go for the unsafe_put/get_user style,
see some significant exemples below. And as mentioned in those commits,
behind the performance improvement it also lead to much cleaner code
generation.
- 38ebcf5096a8 ("scm: optimize put_cmsg()")
- 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()")
- ef0ba0553829 ("poll: fix performance regression due to out-of-line
__put_user()")
Commit 38ebcf5096a8 is explicit about copy_to_user() being bad for small
regions.
Here below is some comparison between the three way of doing
snd_pcm_ioctl_sync_ptr_compat(), output is from 'perf top':
Initially I got the following. That 12%+ is the reason why I started
investigating.
14.20% test_perf [.] engine_main
==> 12.86% [kernel] [k] snd_pcm_ioctl
11.91% [kernel] [k] finish_task_switch.isra.0
4.15% [kernel] [k] snd_pcm_group_unlock_irq.part.0
4.07% libc.so.6 [.] __ioctl_time64
3.58% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin_generic
3.37% [kernel] [k] sys_ioctl
2.96% libasound.so.2.0.0 [.] snd_pcm_hw_avail_update
2.73% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin
2.58% [kernel] [k] system_call_exception
1.93% libasound.so.2.0.0 [.] sync_ptr1
1.85% libasound.so.2.0.0 [.] snd_pcm_unlock
1.84% libasound.so.2.0.0 [.] snd_pcm_mmap_begin
1.83% libasound.so.2.0.0 [.] bad_pcm_state
1.68% libasound.so.2.0.0 [.] snd_pcm_mmap_avail
1.67% libasound.so.2.0.0 [.] snd_pcm_avail_update
With _your_ patch I get the following. copy_from_user() calls
_copy_from_user() and copy_to_user() calls _copy_to_user(). Both then
call __copy_tofrom_user(). In total it is 16.4% so it is worse than before.
14.47% test_perf [.] engine_main
12.00% [kernel] [k] finish_task_switch.isra.0
==> 8.37% [kernel] [k] snd_pcm_ioctl
5.44% libc.so.6 [.] __ioctl_time64
5.03% [kernel] [k] snd_pcm_group_unlock_irq.part.0
==> 4.86% [kernel] [k] __copy_tofrom_user
4.62% [kernel] [k] sys_ioctl
3.22% [kernel] [k] system_call_exception
2.42% libasound.so.2.0.0 [.] snd_pcm_mmap_begin
2.31% [kernel] [k] fdget
2.23% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin_generic
2.19% [kernel] [k] syscall_exit_prepare
1.92% libasound.so.2.0.0 [.] snd_pcm_mmap_avail
1.86% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin
1.68% libasound.so.2.0.0 [.] snd_pcm_hw_avail_update
==> 1.67% [kernel] [k] _copy_from_user
1.66% libasound.so.2.0.0 [.] bad_pcm_state
==> 1.53% [kernel] [k] _copy_to_user
1.40% libasound.so.2.0.0 [.] sync_ptr1
With my patch I get the following:
17.46% test_perf [.] engine_main
9.14% [kernel] [k] finish_task_switch.isra.0
==> 4.92% [kernel] [k] snd_pcm_ioctl
3.99% [kernel] [k] snd_pcm_group_unlock_irq.part.0
3.71% libc.so.6 [.] __ioctl_time64
3.61% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin_generic
2.72% libasound.so.2.0.0 [.] sync_ptr1
2.65% [kernel] [k] system_call_exception
2.46% [kernel] [k] sys_ioctl
2.43% [kernel] [k] __rseq_handle_notify_resume
2.34% [kernel] [k] do_epoll_wait
2.30% libasound.so.2.0.0 [.] __snd_pcm_mmap_commit
2.14% libasound.so.2.0.0 [.] __snd_pcm_avail
2.04% libasound.so.2.0.0 [.] snd_pcm_hw_avail_update
1.89% libasound.so.2.0.0 [.] snd_pcm_lock
1.84% libasound.so.2.0.0 [.] snd_pcm_mmap_avail
1.76% libasound.so.2.0.0 [.] __snd_pcm_avail_update
1.61% libasound.so.2.0.0 [.] bad_pcm_state
1.60% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin
1.49% libasound.so.2.0.0 [.] query_status_data
Christophe
Powered by blists - more mailing lists