linux-kernel - Re: [PATCH] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables

Message-ID: <09fa2b5a-9039-0902-4f57-6a6c2a5f7c37@arm.com>
Date:   Mon, 10 Jul 2017 14:50:38 +0100
From:   Marc Zyngier <marc.zyngier@....com>
To:     Ganapatrao Kulkarni <gpkulkarni@...il.com>
Cc:     Shanker Donthineni <shankerd@...eaurora.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        Vikram Sethi <vikrams@...eaurora.org>,
        Jayachandran C <jnair@...iumnetworks.com>,
        "ganapatrao.kulkarni@...ium.com" <ganapatrao.kulkarni@...ium.com>
Subject: Re: [PATCH] irqchip: gicv3-its: Use NUMA aware memory allocation for
 ITS tables

On 10/07/17 11:21, Ganapatrao Kulkarni wrote:
> Hi Marc,
> 
> On Mon, Jul 10, 2017 at 2:53 PM, Marc Zyngier <marc.zyngier@....com> wrote:
>> On 10/07/17 10:08, Ganapatrao Kulkarni wrote:
>>> On Mon, Jul 10, 2017 at 2:36 PM, Marc Zyngier <marc.zyngier@....com> wrote:
>>>> On 10/07/17 09:48, Ganapatrao Kulkarni wrote:
>>>>> Hi Marc,
>>>>>
>>>>> On Mon, Jul 3, 2017 at 8:23 PM, Marc Zyngier <marc.zyngier@....com> wrote:
>>>>>> Hi Shanker,
>>>>>>
>>>>>> On 03/07/17 15:24, Shanker Donthineni wrote:
>>>>>>> Hi Marc,
>>>>>>>
>>>>>>> On 06/30/2017 03:51 AM, Marc Zyngier wrote:
>>>>>>>> On 30/06/17 04:01, Ganapatrao Kulkarni wrote:
>>>>>>>>> On Fri, Jun 30, 2017 at 8:04 AM, Ganapatrao Kulkarni
>>>>>>>>> <gpkulkarni@...il.com> wrote:
>>>>>>>>>> Hi Shanker,
>>>>>>>>>>
>>>>>>>>>> On Sun, Jun 25, 2017 at 9:16 PM, Shanker Donthineni
>>>>>>>>>> <shankerd@...eaurora.org> wrote:
>>>>>>>>>>> The NUMA node information is visible to ITS driver but not being used
>>>>>>>>>>> other than handling errata. This patch allocates the memory for ITS
>>>>>>>>>>> tables from the corresponding NUMA node using the appropriate NUMA
>>>>>>>>>>> aware functions.
>>>>>>>>>
>>>>>>>>> IMHO, the description would be more constructive as follows:
>>>>>>>>>
>>>>>>>>> "All ITS tables are mapped by default to NODE 0 memory.
>>>>>>>>> Add changes to allocate memory from the respective NUMA nodes of ITS devices.
>>>>>>>>> This optimizes table access and avoids unnecessary inter-node traffic."
>>>>>>>>
>>>>>>>> But more importantly, I'd like to see figures showing the actual benefit
>>>>>>>> of this per-node allocation. Given that both of you guys have access to
>>>>>>>> such platforms, please show me the numbers!
>>>>>>>>
>>>>>>>
>>>>>>> I'll share actual results showing the improvement as soon as they are
>>>>>>> available on our next chips. The current version of the Qualcomm qdf2400
>>>>>>> doesn't support multi-socket configurations, so I cannot capture results
>>>>>>> to share with you.
>>>>>>>
>>>>>>> Do you see any other issues with this patch apart from the performance
>>>>>>> improvements? I strongly believe this brings a noticeable improvement
>>>>>>> in numbers on systems with a multi-node memory/CPU configuration.
>>>>>>
>>>>>> I agree that it *could* show an improvement, but it very much depends on
>>>>>> how often the ITS misses in its caches. For this kind of patch, I want
>>>>>> to see two things:
>>>>>>
>>>>>> 1) It brings a measurable benefit on NUMA platforms
>>>>>
>>>>> We did some measurements of interrupt response time for LPIs and don't
>>>>> see any major improvement, due to caching of the tables. However, we
>>>>> have seen improvements of around 5%.
>>>>
>>>> An improvement of what exactly?
>>>
>>> interrupt response time.
>>
>> Measured how? On which HW? Using which benchmark?
> 
> This has been tested on ThunderX2.
> We instrumented the gic-v3-its driver code to create a dummy LPI device
> with a few vectors.
> The LPI is induced from the dummy device (through sysfs, by writing to the
> GITS_TRANSLATER register).
> The ISR routine (gic_handle_irq) is called to handle the induced LPI.
> A NODE 1 CPU is used to induce the LPI, and a NODE 1 CPU/collection is
> mapped in the ITT to route this LPI.
> 
> The CPU timer counter is sampled at the time the LPI is induced and in the
> ISR routine to calculate the interrupt response time.
> The results showed an improvement of 5% with this patch.

And you call that a realistic measurement of the latency? Really? Sorry,
but I cannot take you seriously here.

> Do you have any recommended benchmarks to test the same?

Run a standard benchmark such as netperf, and post the results with and
without that patch. The above is just plain ridiculous.

	M.
-- 
Jazz is not dead. It just smells funny...