[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87a56croin.wl-tiwai@suse.de>
Date: Fri, 13 Jun 2025 11:15:12 +0200
From: Takashi Iwai <tiwai@...e.de>
To: Christophe Leroy <christophe.leroy@...roup.eu>
Cc: Takashi Iwai <tiwai@...e.de>,
Jaroslav Kysela <perex@...ex.cz>,
Takashi Iwai <tiwai@...e.com>,
linux-kernel@...r.kernel.org,
linuxppc-dev@...ts.ozlabs.org,
linux-sound@...r.kernel.org,
Herve Codina <herve.codina@...tlin.com>,
Mark Brown <broonie@...nel.org>
Subject: Re: [PATCH] ALSA: pcm: Convert snd_pcm_ioctl_sync_ptr_{compat/x32} to user_access_begin/user_access_end()
On Fri, 13 Jun 2025 07:24:46 +0200,
Christophe Leroy wrote:
>
>
>
> Le 12/06/2025 à 13:02, Takashi Iwai a écrit :
> > On Thu, 12 Jun 2025 12:39:39 +0200,
> > Christophe Leroy wrote:
> >>
> >> With user access protection (Called SMAP on x86 or KUAP on powerpc)
> >> each and every call to get_user() or put_user() performs heavy
> >> operations to unlock and lock kernel access to userspace.
> >>
> >> SNDRV_PCM_IOCTL_SYNC_PTR ioctl is a hot path that needs to be
> >> optimised. To do that, perform user accesses by blocks using
> >> user_access_begin/user_access_end() and unsafe_get_user()/
> >> unsafe_put_user() and alike.
> >>
> >> Before the patch the 9 calls to put_user() at the
> >> end of snd_pcm_ioctl_sync_ptr_compat() imply the following set of
> >> instructions about 9 times (access_ok - enable user - write - disable
> >> user):
> >> 0.00 : c057f858: 3d 20 7f ff lis r9,32767
> >> 0.29 : c057f85c: 39 5e 00 14 addi r10,r30,20
> >> 0.77 : c057f860: 61 29 ff fc ori r9,r9,65532
> >> 0.32 : c057f864: 7c 0a 48 40 cmplw r10,r9
> >> 0.36 : c057f868: 41 a1 fb 58 bgt c057f3c0 <snd_pcm_ioctl+0xbb0>
> >> 0.30 : c057f86c: 3d 20 dc 00 lis r9,-9216
> >> 1.95 : c057f870: 7d 3a c3 a6 mtspr 794,r9
> >> 0.33 : c057f874: 92 8a 00 00 stw r20,0(r10)
> >> 0.27 : c057f878: 3d 20 de 00 lis r9,-8704
> >> 0.28 : c057f87c: 7d 3a c3 a6 mtspr 794,r9
> >> ...
> >>
> >> A perf profile shows that in total the 9 put_user() represent 36% of
> >> the time spent in snd_pcm_ioctl() and about 80 instructions.
> >>
> >> With this patch everything is done in 13 instructions and represent
> >> only 15% of the time spent in snd_pcm_ioctl():
> >>
> >> 0.57 : c057f5dc: 3d 20 dc 00 lis r9,-9216
> >> 0.98 : c057f5e0: 7d 3a c3 a6 mtspr 794,r9
> >> 0.16 : c057f5e4: 92 7f 00 04 stw r19,4(r31)
> >> 0.63 : c057f5e8: 93 df 00 0c stw r30,12(r31)
> >> 0.16 : c057f5ec: 93 9f 00 10 stw r28,16(r31)
> >> 4.95 : c057f5f0: 92 9f 00 14 stw r20,20(r31)
> >> 0.19 : c057f5f4: 92 5f 00 18 stw r18,24(r31)
> >> 0.49 : c057f5f8: 92 bf 00 1c stw r21,28(r31)
> >> 0.27 : c057f5fc: 93 7f 00 20 stw r27,32(r31)
> >> 5.88 : c057f600: 93 36 00 00 stw r25,0(r22)
> >> 0.11 : c057f604: 93 17 00 00 stw r24,0(r23)
> >> 0.00 : c057f608: 3d 20 de 00 lis r9,-8704
> >> 0.79 : c057f60c: 7d 3a c3 a6 mtspr 794,r9
> >>
> >> Note that here the access_ok() in user_write_access_begin() is skipped
> >> because the exact same verification has already been performed at the
> >> beginning of the fonction with the call to user_read_access_begin().
> >>
> >> Signed-off-by: Christophe Leroy <christophe.leroy@...roup.eu>
> >> ---
> >> This is a lighter version of previous patch "[PATCH v2] ALSA: pcm: Convert multiple {get/put}_user to user_access_begin/user_access_end()" focussing on identified hot path.
> >> Moved and nested the failure labels closer in order to increase readability
> >
> > Thanks for the revised patch!
> >
> > Although it's now much lighter, I still believe that we can reduce
> > get_user() / put_user() calls significantly by adjusting the struct
> > usage.
> >
> > Could you check whether the patch below can improve?
> > (Zero-ing of sstatus can be an overhead here, but there are some
> > holes, and these need to be initialized before copying back...)
> >
>
> Thanks for the proposed patch. Unfortunately it doesn't improve the
> situation. The problem with copy_from_user() and copy_to_user() is
> that they perform quite bad on small regions. And for the from_user
> side we still get two user access enable/disable instead 3 and for the
> to_user side we still get two as well (Allthough we had 7
> previously). Those 4 user access enable/disable still have a cost.
>
> Nowadays the tendency is really to go for the unsafe_put/get_user
> style, see some significant exemples below. And as mentioned in those
> commits, behind the performance improvement it also lead to much
> cleaner code generation.
> - 38ebcf5096a8 ("scm: optimize put_cmsg()")
> - 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
> unsafe_put_user()")
> - ef0ba0553829 ("poll: fix performance regression due to out-of-line
> __put_user()")
>
> Commit 38ebcf5096a8 is explicit about copy_to_user() being bad for
> small regions.
>
> Here below is some comparison between the three way of doing
> snd_pcm_ioctl_sync_ptr_compat(), output is from 'perf top':
>
> Initially I got the following. That 12%+ is the reason why I started
> investigating.
>
> 14.20% test_perf [.] engine_main
> ==> 12.86% [kernel] [k] snd_pcm_ioctl
> 11.91% [kernel] [k] finish_task_switch.isra.0
> 4.15% [kernel] [k] snd_pcm_group_unlock_irq.part.0
> 4.07% libc.so.6 [.] __ioctl_time64
> 3.58% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin_generic
> 3.37% [kernel] [k] sys_ioctl
> 2.96% libasound.so.2.0.0 [.] snd_pcm_hw_avail_update
> 2.73% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin
> 2.58% [kernel] [k] system_call_exception
> 1.93% libasound.so.2.0.0 [.] sync_ptr1
> 1.85% libasound.so.2.0.0 [.] snd_pcm_unlock
> 1.84% libasound.so.2.0.0 [.] snd_pcm_mmap_begin
> 1.83% libasound.so.2.0.0 [.] bad_pcm_state
> 1.68% libasound.so.2.0.0 [.] snd_pcm_mmap_avail
> 1.67% libasound.so.2.0.0 [.] snd_pcm_avail_update
>
> With _your_ patch I get the following. copy_from_user() calls
> _copy_from_user() and copy_to_user() calls _copy_to_user(). Both then
> call __copy_tofrom_user(). In total it is 16.4% so it is worse than
> before.
>
> 14.47% test_perf [.] engine_main
> 12.00% [kernel] [k] finish_task_switch.isra.0
> ==> 8.37% [kernel] [k] snd_pcm_ioctl
> 5.44% libc.so.6 [.] __ioctl_time64
> 5.03% [kernel] [k] snd_pcm_group_unlock_irq.part.0
> ==> 4.86% [kernel] [k] __copy_tofrom_user
> 4.62% [kernel] [k] sys_ioctl
> 3.22% [kernel] [k] system_call_exception
> 2.42% libasound.so.2.0.0 [.] snd_pcm_mmap_begin
> 2.31% [kernel] [k] fdget
> 2.23% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin_generic
> 2.19% [kernel] [k] syscall_exit_prepare
> 1.92% libasound.so.2.0.0 [.] snd_pcm_mmap_avail
> 1.86% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin
> 1.68% libasound.so.2.0.0 [.] snd_pcm_hw_avail_update
> ==> 1.67% [kernel] [k] _copy_from_user
> 1.66% libasound.so.2.0.0 [.] bad_pcm_state
> ==> 1.53% [kernel] [k] _copy_to_user
> 1.40% libasound.so.2.0.0 [.] sync_ptr1
>
> With my patch I get the following:
>
> 17.46% test_perf [.] engine_main
> 9.14% [kernel] [k] finish_task_switch.isra.0
> ==> 4.92% [kernel] [k] snd_pcm_ioctl
> 3.99% [kernel] [k] snd_pcm_group_unlock_irq.part.0
> 3.71% libc.so.6 [.] __ioctl_time64
> 3.61% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin_generic
> 2.72% libasound.so.2.0.0 [.] sync_ptr1
> 2.65% [kernel] [k] system_call_exception
> 2.46% [kernel] [k] sys_ioctl
> 2.43% [kernel] [k] __rseq_handle_notify_resume
> 2.34% [kernel] [k] do_epoll_wait
> 2.30% libasound.so.2.0.0 [.] __snd_pcm_mmap_commit
> 2.14% libasound.so.2.0.0 [.] __snd_pcm_avail
> 2.04% libasound.so.2.0.0 [.] snd_pcm_hw_avail_update
> 1.89% libasound.so.2.0.0 [.] snd_pcm_lock
> 1.84% libasound.so.2.0.0 [.] snd_pcm_mmap_avail
> 1.76% libasound.so.2.0.0 [.] __snd_pcm_avail_update
> 1.61% libasound.so.2.0.0 [.] bad_pcm_state
> 1.60% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin
> 1.49% libasound.so.2.0.0 [.] query_status_data
Thanks for the detailed analysis! Then unsafe_*_user() seems to be
the way to go. I'll check your latest patches.
Takashi
Powered by blists - more mailing lists