Message-ID: <2217077.1aQXS9nJph@wuerfel>
Date: Mon, 03 Nov 2014 12:56:52 +0100
From: Arnd Bergmann <arnd@...db.de>
To: Kevin Cernekee <cernekee@...il.com>
Cc: f.fainelli@...il.com, tglx@...utronix.de, jason@...edaemon.net,
ralf@...ux-mips.org, linux-sh@...r.kernel.org,
sergei.shtylyov@...entembedded.com, linux-kernel@...r.kernel.org,
devicetree@...r.kernel.org, mbizon@...ebox.fr, jogo@...nwrt.org,
linux-mips@...ux-mips.org
Subject: Re: [PATCH V3 00/14] genirq endian fixes; bcm7120/brcmstb IRQ updates
On Saturday 01 November 2014 18:03:47 Kevin Cernekee wrote:
> V2->V3:
>
> - Move updated irq_reg_{readl,writel} functions back into <linux/irq.h>
> so they can be called by irqchip drivers
>
> - Add gc->reg_{readl,writel} function pointers so that irqchip
> drivers like arch/sh/boards/mach-se/{7343,7722}/irq.c can override them
>
> - CC: linux-sh list in lieu of Paul's defunct linux-sh.org email address
>
> - Fix handling of zero L2 status in bcm7120-l2.c
>
> - Rebase on Linus' head of tree
All looks great. I have now also looked at the series and am very happy
with how it turned out.
> - Drop GENERIC_CHIP / GENERIC_CHIP_BE compile-time optimizations
>
> For the latter item, I ran a quick benchmark to see if the extra
> indirection in irq_reg_{readl,writel} had any perceptible effect on
> register access times. The MIPS BE case did show a small performance
> hit from using the read wrapper, but on ARM LE the only differences
> were attributed to the presence/absence of a barrier:
>
>
> BCM3384 (UBUS architecture, MIPS BE, IRQ_GC_BE_IO):
>
> irq_reg_readl : 207 ns
> readl : 186 ns
> __raw_readl : 186 ns
> ioread32be : 195 ns
>
> irq_reg_writel : 177 ns
> writel : 177 ns
> __raw_writel : 177 ns
> iowrite32be : 177 ns
>
>
> BCM7445 (GISB architecture, ARM LE, standard LE readl):
>
> irq_reg_readl : 519 ns
> readl : 519 ns
> __raw_readl : 482 ns
> ioread32be : 519 ns
>
> irq_reg_writel : 500 ns
> writel : 500 ns
> __raw_writel : 482 ns
> iowrite32be : 500 ns
>
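Each accessor's overhead can be read off the tables by subtracting the timing of the barrier-free __raw_ variant on the same board. The C sketch below does that arithmetic using only the numbers quoted above. The enum, array and function names are hypothetical, chosen for illustration.

```c
#include <stdio.h>

/* Boards from the quoted benchmark. */
enum platform { BCM3384, BCM7445, NUM_PLATFORMS };

/* Accessor families, in the order they appear in the tables. */
enum accessor { IRQ_REG, PLAIN, RAW, BE32, NUM_ACCESSORS };

static const char *const platform_names[NUM_PLATFORMS] = { "BCM3384", "BCM7445" };
static const char *const accessor_names[NUM_ACCESSORS] = {
	"irq_reg_{readl,writel}", "readl/writel",
	"__raw_{readl,writel}", "io{read,write}32be",
};

/* Timings in ns, copied from the tables above. */
static const int read_ns[NUM_PLATFORMS][NUM_ACCESSORS] = {
	{ 207, 186, 186, 195 },	/* BCM3384: MIPS BE, IRQ_GC_BE_IO */
	{ 519, 519, 482, 519 },	/* BCM7445: ARM LE, standard LE readl */
};
static const int write_ns[NUM_PLATFORMS][NUM_ACCESSORS] = {
	{ 177, 177, 177, 177 },	/* BCM3384 */
	{ 500, 500, 482, 500 },	/* BCM7445 */
};

/* Extra cost of an accessor over the __raw_ variant on the same board. */
int read_overhead_ns(enum platform p, enum accessor a)
{
	return read_ns[p][a] - read_ns[p][RAW];
}

int write_overhead_ns(enum platform p, enum accessor a)
{
	return write_ns[p][a] - write_ns[p][RAW];
}

/* Print one line per board/accessor pair. */
void print_overheads(void)
{
	for (int p = 0; p < NUM_PLATFORMS; p++)
		for (int a = 0; a < NUM_ACCESSORS; a++)
			printf("%s %-24s read +%d ns, write +%d ns\n",
			       platform_names[p], accessor_names[a],
			       read_overhead_ns(p, a), write_overhead_ns(p, a));
}
```

On these figures, the wrapper costs 21 ns more than readl on BCM3384 and nothing extra on BCM7445. On BCM7445, the 37 ns (read) and 18 ns (write) gap between readl/writel and __raw_ is the barrier.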
Yes, good idea to check this. 43ns is probably not significant enough to
warrant optimizing this, but if we wanted to, a driver could now
override the accessors using readl_relaxed()/writel_relaxed().
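Such an override would go through the gc->reg_{readl,writel} hooks added in V3. The userspace C sketch below models that dispatch under stated assumptions. Plain memory stands in for MMIO, and a counter stands in for the I/O barrier. Every sim_* name is hypothetical and is not a kernel API. The sketch also assumes the wrapper falls back to the ordered accessor when no hook is set. It illustrates the idea and is not the kernel's code.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Counts simulated I/O barriers issued by the "ordered" accessors. */
static unsigned int barrier_count;

/* Hypothetical stand-ins: ordered versions count a barrier, relaxed do not. */
static u32 sim_readl(void *addr)
{
	barrier_count++;
	return *(volatile u32 *)addr;
}

static void sim_writel(u32 val, void *addr)
{
	barrier_count++;
	*(volatile u32 *)addr = val;
}

static u32 sim_readl_relaxed(void *addr)
{
	return *(volatile u32 *)addr;
}

static void sim_writel_relaxed(u32 val, void *addr)
{
	*(volatile u32 *)addr = val;
}

/* Cut-down, hypothetical model of a generic chip: only the fields used here. */
struct sim_chip_generic {
	void *reg_base;
	u32 (*reg_readl)(void *addr);
	void (*reg_writel)(u32 val, void *addr);
};

/* Use the per-chip hook if set, otherwise the default ordered accessor. */
u32 sim_irq_reg_readl(struct sim_chip_generic *gc, int reg_offset)
{
	void *addr = (char *)gc->reg_base + reg_offset;

	if (gc->reg_readl)
		return gc->reg_readl(addr);
	return sim_readl(addr);
}

void sim_irq_reg_writel(struct sim_chip_generic *gc, u32 val, int reg_offset)
{
	void *addr = (char *)gc->reg_base + reg_offset;

	if (gc->reg_writel)
		gc->reg_writel(val, addr);
	else
		sim_writel(val, addr);
}

/* What a driver's init code might do to opt in to relaxed accessors. */
void sim_use_relaxed_accessors(struct sim_chip_generic *gc)
{
	gc->reg_readl = sim_readl_relaxed;
	gc->reg_writel = sim_writel_relaxed;
}

unsigned int sim_barriers_issued(void)
{
	return barrier_count;
}
```

In a real driver, readl_relaxed() and writel_relaxed() are often macros. The hooks would therefore likely point at small wrapper functions, not at the accessors themselves.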
Note that the cost of the barriers can depend a lot on the hardware
setup and on the state of the system. I believe synchronizing the
L2 cache on some Cortex-A9 machines can be particularly expensive.
Anyway, the existing code doesn't use the relaxed accessors either, so we
can leave that as a possible future optimization.
Arnd