Message-Id: <f4a39bd1-d06a-21f1-3ef5-42b38356edb0@de.ibm.com>
Date: Wed, 13 Dec 2017 12:31:56 +0100
From: Christian Borntraeger <borntraeger@...ibm.com>
To: Yury Norov <ynorov@...iumnetworks.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
Andrew Morton <akpm@...ux-foundation.org>,
Ashish Kalra <Ashish.Kalra@...ium.com>,
Christoffer Dall <christoffer.dall@...aro.org>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
Linu Cherian <Linu.Cherian@...ium.com>,
Sunil Goutham <Sunil.Goutham@...ium.com>
Subject: Re: [PATCH] IPI performance benchmark
On 12/13/2017 12:23 PM, Yury Norov wrote:
> On Mon, Dec 11, 2017 at 05:30:25PM +0100, Christian Borntraeger wrote:
>>
>>
>> On 12/11/2017 03:55 PM, Yury Norov wrote:
>>> On Mon, Dec 11, 2017 at 03:35:02PM +0100, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 12/11/2017 03:16 PM, Yury Norov wrote:
>>>>> This benchmark sends many IPIs in different modes and measures the
>>>>> time for IPI delivery (first column) and the total time, i.e.
>>>>> including the time for the sender to see the receipt acknowledged
>>>>> (second column). A sketch of the measurement follows the scenario
>>>>> list below.
>>>>>
>>>>> The scenarios are:
>>>>> Dry-run: do everything except actually sending the IPI. Useful
>>>>> to estimate system overhead.
>>>>> Self-IPI: send an IPI to the local CPU.
>>>>> Normal IPI: send an IPI to some other CPU.
>>>>> Broadcast IPI: send a broadcast IPI to all online CPUs.
>>>>>
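>>>>> The core of the measurement looks roughly like this (an illustrative
>>>>> sketch, not the patch code itself; ipi_handler, handler_time and
>>>>> time_one_ipi are made-up names):
>>>>>
>>>>> #include <linux/ktime.h>
>>>>> #include <linux/smp.h>
>>>>>
>>>>> static ktime_t handler_time;
>>>>>
>>>>> /* Runs on the target CPU in IPI context. */
>>>>> static void ipi_handler(void *info)
>>>>> {
>>>>>         handler_time = ktime_get();
>>>>> }
>>>>>
>>>>> /* "Delivery" ends when the handler runs on the target CPU; "total"
>>>>>  * ends when smp_call_function_single(wait=1) returns to the sender. */
>>>>> static void time_one_ipi(int cpu, s64 *delivery, s64 *total)
>>>>> {
>>>>>         ktime_t t0 = ktime_get();
>>>>>
>>>>>         smp_call_function_single(cpu, ipi_handler, NULL, 1);
>>>>>         *delivery = ktime_to_ns(ktime_sub(handler_time, t0));
>>>>>         *total = ktime_to_ns(ktime_sub(ktime_get(), t0));
>>>>> }
>>>>>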
>>>>> For virtualized guests, sending and receiving IPIs causes guest exits.
>>>>> I used this test to measure the performance impact of Christoffer
>>>>> Dall's series "Optimize KVM/ARM for VHE systems" on the KVM subsystem.
>>>>>
>>>>> https://www.spinics.net/lists/kvm/msg156755.html
>>>>>
>>>>> The test machine is a ThunderX2 with 112 online CPUs. Below are the
>>>>> results, normalized to the host dry-run time. Smaller is better.
>>>>>
>>>>> Host, v4.14:
>>>>> Dry-run: 0 1
>>>>> Self-IPI: 9 18
>>>>> Normal IPI: 81 110
>>>>> Broadcast IPI: 0 2106
>>>>>
>>>>> Guest, v4.14:
>>>>> Dry-run: 0 1
>>>>> Self-IPI: 10 18
>>>>> Normal IPI: 305 525
>>>>> Broadcast IPI: 0 9729
>>>>>
>>>>> Guest, v4.14 + VHE:
>>>>> Dry-run: 0 1
>>>>> Self-IPI: 9 18
>>>>> Normal IPI: 176 343
>>>>> Broadcast IPI: 0 9885
>> [...]
>>>>> +static int __init init_bench_ipi(void)
>>>>> +{
>>>>> + ktime_t ipi, total;
>>>>> + int ret;
>>>>> +
>>>>> + ret = bench_ipi(NTIMES, DRY_RUN, &ipi, &total);
>>>>> + if (ret)
>>>>> + pr_err("Dry-run FAILED: %d\n", ret);
>>>>> + else
>>>>> + pr_err("Dry-run: %18llu, %18llu ns\n", ipi, total);
>>>>
>>>> You do not use NTIMES here to calculate the average value. Is that intended?
>>>
>>> I think it's more readable to represent all results as multiples of the
>>> dry-run time, as I did in the patch description. So on the kernel side I
>>> expose the raw data and calculate the final values after the tests finish,
>>> roughly as in the sketch below.
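>>>
>>> Roughly like this (an illustrative userspace sketch; NTIMES and the
>>> numbers below are placeholders, not real measurements):
>>>
>>> #include <stdio.h>
>>>
>>> #define NTIMES 100000ULL  /* placeholder: iterations per scenario */
>>>
>>> /* Divide the raw total (ns, as printed by the module) by NTIMES to
>>>  * get the per-IPI average, then normalize to the host dry-run
>>>  * average so the host dry-run row reads "1". */
>>> static unsigned long long normalize(unsigned long long total_ns,
>>>                                     unsigned long long host_dryrun_avg_ns)
>>> {
>>>         return (total_ns / NTIMES) / host_dryrun_avg_ns;
>>> }
>>>
>>> int main(void)
>>> {
>>>         /* hypothetical raw numbers, only to show the arithmetic */
>>>         printf("%llu\n", normalize(180000000ULL, 100ULL));
>>>         return 0;
>>> }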
>>
>> I think it is highly confusing that the output from the patch description does not
>> match the output from the real module. So can you make that match at least?
>
> I think so. That's why I noted that the results are normalized to the host
> dry-run time; besides, the normalized numbers are small and easier to read.
>
> I was advised not to publish raw data, as you'd understand. If this is
> a blocker, I can post results from a QEMU-hosted kernel.
You could just post some example data from any random x86 laptop. I think it
would be good to have the patch description output match the real output.