lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <fcb413c6-1369-3481-11bf-4934cbb7f1d6@codeaurora.org>
Date:   Mon, 28 Nov 2016 20:13:49 -0600
From:   Shanker Donthineni <shankerd@...eaurora.org>
To:     Marc Zyngier <marc.zyngier@....com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        Vikram Sethi <vikrams@...eaurora.org>
Subject: Re: [PATCH v2] irqchip/gicv3-its: Enable cacheable attribute
 Read-allocate hints

Hi Marc,

Please comment on this patch. I'm seeing ~1000 cpu cycles interrupt 
latency improvement on Qualcomm server platforms if the ITS tables are 
not in L1/L2/L3 cache. You can find the test code at 
https://patchwork.kernel.org/patch/9416793 which I used to validate my 
code changes.

Thanks,
Shanker

On 11/08/2016 01:18 AM, Shanker Donthineni wrote:
> Read-allocation hints are not enabled for both the GIC-ITS and GICR
> tables. This forces the hardware to always read the table contents
> from an external memory (DDR) which is slow compared to cache memory.
> Most of the tables are often read by hardware. So, it's better to
> enable Read-allocate hints in addition to Write-allocate hints in
> order to improve the GICR_PEND, GICR_PROP, Collection, Device, and
> vCPU tables lookup time.
>
> Signed-off-by: Shanker Donthineni <shankerd@...eaurora.org>
> ---
> Implemented a test case to prove that enabling Read Allocation hints
> improves ITS lookup time ~15% while delivering a LPI event. Used the
> ITS command INV to analyze time spent in device, collection, prop and
> pending table lookups.
>    
> Pseudo code:
>      Create a fake ITS device.
>      Record PMU cycle counter before sending INV command.
>      Build and send ITS INT command.
>           ITS hardware triggers device table lookup.
>                ITTE table & collection table lookup.
>            ITS property table lookup.
>            ITS pending table lookup.
>            Deliver interrupt to CPU interface.
>      do_IRQ() called.
>      Measure the total CPU cycle spent to reach this point.
>
> Without ReadAllocation hints:
> /sys/kernel/debug # echo 100 > lpitest
> [   94.693968] CPU[1] niter=100 cycles=0x8dfc0 avg=0x16b7 min=0x1652
>
> With ReadAllocation hints:
> /sys/kernel/debug # echo 100 > lpitest
> [   98.617873] CPU[1] niter=100 cycles=0x7df49 avg=0x1427 min=0x1388
>
>   drivers/irqchip/irq-gic-v3-its.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c
> b/drivers/irqchip/irq-gic-v3-its.c
> index c5dee30..227a1eb 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -961,7 +961,7 @@ static bool its_parse_baser_device(struct its_node
> *its, struct its_baser *baser
>   				   u32 psz, u32 *order)
>   {
>   	u64 esz = GITS_BASER_ENTRY_SIZE(its_read_baser(its, baser));
> -	u64 val = GITS_BASER_InnerShareable | GITS_BASER_WaWb;
> +	u64 val = GITS_BASER_InnerShareable | GITS_BASER_RaWaWb;
>   	u32 ids = its->device_ids;
>   	u32 new_order = *order;
>   	bool indirect = false;
> @@ -1026,7 +1026,7 @@ static int its_alloc_tables(struct its_node *its)
>   	u64 typer = gic_read_typer(its->base + GITS_TYPER);
>   	u32 ids = GITS_TYPER_DEVBITS(typer);
>   	u64 shr = GITS_BASER_InnerShareable;
> -	u64 cache = GITS_BASER_WaWb;
> +	u64 cache = GITS_BASER_RaWaWb;
>   	u32 psz = SZ_64K;
>   	int err, i;
>   
> @@ -1123,7 +1123,7 @@ static void its_cpu_init_lpis(void)
>   	/* set PROPBASE */
>   	val = (page_to_phys(gic_rdists->prop_page) |
>   	       GICR_PROPBASER_InnerShareable |
> -	       GICR_PROPBASER_WaWb |
> +	       GICR_PROPBASER_RaWaWb |
>   	       ((LPI_NRBITS - 1) & GICR_PROPBASER_IDBITS_MASK));
>   
>   	writeq_relaxed(val, rbase + GICR_PROPBASER);
> @@ -1148,7 +1148,7 @@ static void its_cpu_init_lpis(void)
>   	/* set PENDBASE */
>   	val = (page_to_phys(pend_page) |
>   	       GICR_PENDBASER_InnerShareable |
> -	       GICR_PENDBASER_WaWb);
> +	       GICR_PENDBASER_RaWaWb);
>   
>   	writeq_relaxed(val, rbase + GICR_PENDBASER);
>   	tmp = readq_relaxed(rbase + GICR_PENDBASER);
> @@ -1712,7 +1712,7 @@ static int __init its_probe_one(struct resource
> *res,
>   		goto out_free_tables;
>   
>   	baser = (virt_to_phys(its->cmd_base)	|
> -		 GITS_CBASER_WaWb		|
> +		 GITS_CBASER_RaWaWb		|
>   		 GITS_CBASER_InnerShareable	|
>   		 (ITS_CMD_QUEUE_SZ / SZ_4K - 1)	|
>   		 GITS_CBASER_VALID);

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ