[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <00226633-0a5a-bcca-0a2a-9bfd754e61a5@csgroup.eu>
Date: Tue, 14 Sep 2021 16:00:56 +0200
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>, hch@...radead.org,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH RESEND v3 6/6] powerpc/signal: Use
unsafe_copy_siginfo_to_user()
Le 13/09/2021 à 21:11, Eric W. Biederman a écrit :
> Christophe Leroy <christophe.leroy@...roup.eu> writes:
>
>> Le 13/09/2021 à 18:21, Eric W. Biederman a écrit :
>>> ebiederm@...ssion.com (Eric W. Biederman) writes:
>>>
>>>> Christophe Leroy <christophe.leroy@...roup.eu> writes:
>>>>
>>>>> Use unsafe_copy_siginfo_to_user() in order to do the copy
>>>>> within the user access block.
>>>>>
>>>>> On an mpc 8321 (book3s/32) the improvment is about 5% on a process
>>>>> sending a signal to itself.
>>>
>>> If you can't make function calls from an unsafe macro there is another
>>> way to handle this that doesn't require everything to be inline.
>>>
>>> From a safety perspective it is probably even a better approach.
>>
>> Yes but that's exactly what I wanted to avoid for the native ppc32 case: this
>> double hop means useless pressure on the cache. The siginfo_t structure is 128
>> bytes large, that means 8 lines of cache on powerpc 8xx.
>>
>> But maybe it is acceptable to do that only for the compat case. Let me think
>> about it, it might be quite easy.
>
> The places get_signal is called tend to be well known. So I think we
> are safe from a capacity standpoint.
>
> I am not certain it makes a difference in capacity as there is a high
> probability that the stack was deeper recently than it is now which
> suggests the cache blocks might already be in the cache.
>
> My sense it is worth benchmarking before optimizing out the extra copy
> like that.
>
> On the extreme side there is simply building the entire sigframe on the
> stack and then just calling it copy_to_user. As the stack cache lines
> are likely to be hot, and copy_to_user is quite well optimized
> there is a real possibility that it is faster to build everything
> on the kernel stack, and then copy it to the user space stack.
>
> It is also possible that I am wrong and we may want to figure out how
> far up we can push the conversion to the 32bit siginfo format.
>
> If could move the work into collect_signal we could guarantee there
> would be no extra work. That would require adjusting the sigframe
> generation code on all of the architectures.
>
> There is a lot we can do but we need benchmarking to tell if it is
> worth it.
>
Sure, I'm benchmarking all the work I have been doing on signal code
with the following simple app that I run with 'perf stat':
#include <stdlib.h>
#include <signal.h>
void sigusr1(int sig) { }
int main(int argc, char **argv)
{
int i = 100000;
signal(SIGUSR1, sigusr1);
for (;i--;)
raise(SIGUSR1);
exit(0);
}
On an mpc8321 a 32 bits powerpc with KUAP enabled (KUAP is equivalent of
x86 SMAP)
Before changing copy_siginfo_to_user() to unsafe_copy_siginfo_to_user(),
'perf stat' reports 1983 msec (task-clock)
After my change I get 1900 msec.
With your approach I get 1930 msec, so we are loosing 36% of the benefit
of converting to the 'unsafe_' alternative.
So I think it is worth it.
Christophe
Powered by blists - more mailing lists