Message-ID: <CAK8P3a1oO_moABCtNqLkM9ccVh9c=andfz+qiSucTCXcqJkYVA@mail.gmail.com>
Date: Fri, 14 May 2021 14:22:58 +0200
From: Arnd Bergmann <arnd@...nel.org>
To: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
Cc: linux-arch <linux-arch@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Vineet Gupta <vgupta@...opsys.com>,
Yoshinori Sato <ysato@...rs.sourceforge.jp>,
Rich Felker <dalias@...c.org>,
Linux-sh list <linux-sh@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 03/13] sh: remove unaligned access for sh4a
On Fri, May 14, 2021 at 12:34 PM John Paul Adrian Glaubitz
<glaubitz@...sik.fu-berlin.de> wrote:
>
> Hi Arnd!
>
> On 5/14/21 12:00 PM, Arnd Bergmann wrote:
> > Unlike every other architecture, sh4a uses an inline asm implementation
> > for get_unaligned(). I have shown that this produces better object
> > code than the asm-generic version. However, there are very few users of
> > arch/sh/ overall, and most of those seem to use sh4 rather than sh4a CPU
> > cores, so it seems not worth keeping the complexity in the architecture
> > independent code.
>
> My Renesas SH4-Boards actually run an sh4a-Kernel, not an sh4-Kernel:
>
> root@...pitz:~> uname -a
> Linux tirpitz 5.11.0-rc4-00012-g10c03c5bf422 #161 PREEMPT Mon Jan 18 21:10:17 CET 2021 sh4a GNU/Linux
> root@...pitz:~>
>
> So, if this change reduces performance on sh4a, I would rather not merge it.

It only makes a difference in very specific scenarios in which unaligned
accesses are done in a fast path, e.g. when forwarding network packets
at a high rate on a big-endian kernel (little-endian kernels wouldn't run
into this on IP headers). If you have a use case on this machine for which
you can show a performance regression, I can add a patch on top to put
the optimized sh4a get_unaligned_le32() back. Dropping this patch
altogether would make the series much more complex because most of
the associated code gets removed in the end.

As I mentioned, supporting "movua" in the compiler likely has a much
larger impact on performance, as it would also help in user space, and
it should improve the networking case on little-endian kernels by replacing
the four separate byte load/shift pairs with a movua plus a byteswap.
I'm not sure whether any gcc developers still have an active interest in
sh4a support.
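
To illustrate the two code shapes being compared, here is a minimal
user-space sketch in plain C. It is not the kernel's get_unaligned()
implementation; the helper names load_be32_bytewise(), load_be32_single(),
swab32_sketch() and host_is_little_endian() are made up for this example.
The first helper is the "four byte loads plus shifts" form. The second does
one native-endian unaligned load through memcpy() and then a byteswap on
little-endian hosts, which is the form a compiler with movua support could
in principle turn into a single unaligned load plus a swap.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: four separate byte loads combined with shifts.
 * Works for any alignment and any host endianness. */
static inline uint32_t load_be32_bytewise(const void *p)
{
    const uint8_t *b = p;

    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Hypothetical helper: plain 32-bit byteswap. */
static inline uint32_t swab32_sketch(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: runtime host endianness check. */
static inline int host_is_little_endian(void)
{
    const uint16_t one = 1;
    uint8_t first;

    memcpy(&first, &one, 1);
    return first == 1;
}

/* Hypothetical helper: one native-endian unaligned load via memcpy(),
 * followed by a byteswap only when the host is little-endian. */
static inline uint32_t load_be32_single(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));   /* safe for unaligned addresses */
    return host_is_little_endian() ? swab32_sketch(v) : v;
}
```

Both helpers return the same value for the same input; whether the second
one actually compiles to fewer instructions depends entirely on the target
and the compiler.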
Arnd