Message-ID: <20170330232147.GL7909@n2100.armlinux.org.uk>
Date: Fri, 31 Mar 2017 00:21:47 +0100
From: Russell King - ARM Linux <linux@...linux.org.uk>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Vineet Gupta <Vineet.Gupta1@...opsys.com>,
Al Viro <viro@...iv.linux.org.uk>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Richard Henderson <rth@...ddle.net>,
Will Deacon <will.deacon@....com>,
Haavard Skinnemoen <hskinnemoen@...il.com>,
Steven Miao <realmz6@...il.com>,
Jesper Nilsson <jesper.nilsson@...s.com>,
Mark Salter <msalter@...hat.com>,
Yoshinori Sato <ysato@...rs.sourceforge.jp>,
Richard Kuo <rkuo@...eaurora.org>,
Tony Luck <tony.luck@...el.com>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
James Hogan <james.hogan@...tec.com>,
Michal Simek <monstr@...str.eu>,
David Howells <dhowells@...hat.com>,
Ley Foon Tan <lftan@...era.com>,
Jonas Bonn <Jonas.Nilsson@...opsys.com>
Subject: Re: [RFC][CFT][PATCHSET v1] uaccess unification
On Thu, Mar 30, 2017 at 01:59:58PM -0700, Linus Torvalds wrote:
> On Thu, Mar 30, 2017 at 1:40 PM, Vineet Gupta
> <Vineet.Gupta1@...opsys.com> wrote:
> >
> > So it's a mixed bag really. Maybe we need some better-directed tests to really drill
> > it down.
>
> As mentioned in the discussion about ARM, I seriously doubt that the
> inlining will even be noticeable compared to other effects here.
(Sorry to switch sub-threads.)
I'm running tests on that point, concentrating on hdparm -T and perfing
that. You're right insofar as perf identifies the hotspot as the
copy_to_user() function for that workload, rather than the inlined bits;
the top hits in perf of hdparm -T are:
+ 66.52% hdparm [k] __copy_to_user_std
+ 8.49% hdparm [k] generic_file_read_iter
+ 3.82% hdparm [k] lock_acquire
+ 2.80% hdparm [k] copy_page_to_iter
+ 2.49% hdparm [k] find_get_entry
+ 1.19% hdparm [k] lock_release
Note: perf on ARM is affected by IRQ-disabled regions, so hotspots
can be misattributed.
The generic_file_read_iter() figure is definitely affected by an
IRQ-disabled region in there.
Here's the average hdparm -T transfer rates and standard deviation over
20 samples:
Unpatched: Average=320.42 MB/s sigma=0.878657
Uaccess+inline: Average=318.77 MB/s sigma=1.003332
Uaccess+noinline: Average=319.40 MB/s sigma=1.088354
This pattern, where the noinline version sits between the inlined
version and the unpatched version, shows up in all the measurements
I've done so far, and it points to inlining that code having a slight
detrimental effect. What we don't know is whether un-inlining the code
without Al's patch would see a slight boost, but I'm not about to go
there.
However, this all points towards there being a very slight advantage
to dropping the INLINE_COPY_TO_USER and INLINE_COPY_FROM_USER for
ARM, but I'd say it's really down in the noise - I'm not concerned.
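To put the three averages in proportion, here is a rough Python sketch that turns them into a relative slowdown against the unpatched kernel and into a gap measured in per-run sigmas. It uses only the figures quoted above; dividing by the larger of the two sigmas is a crude yardstick assumed here for illustration, not a proper significance test.

```python
# Rough proportion check on the hdparm -T averages quoted above (20 samples each).
# Assumption: comparing the gap against the larger per-run sigma is only a crude
# yardstick for "in the noise", not a statistical significance test.

RESULTS = {  # name -> (average MB/s, sigma)
    "Unpatched": (320.42, 0.878657),
    "Uaccess+inline": (318.77, 1.003332),
    "Uaccess+noinline": (319.40, 1.088354),
}


def slowdown_percent(baseline: float, value: float) -> float:
    """Throughput lost by `value` relative to `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


def gap_in_sigmas(baseline: tuple, variant: tuple) -> float:
    """Difference of the averages divided by the larger of the two sigmas."""
    (m0, s0), (m1, s1) = baseline, variant
    return (m0 - m1) / max(s0, s1)


if __name__ == "__main__":
    base = RESULTS["Unpatched"]
    for name in ("Uaccess+inline", "Uaccess+noinline"):
        variant = RESULTS[name]
        print(f"{name}: {slowdown_percent(base[0], variant[0]):.2f}% slower, "
              f"gap = {gap_in_sigmas(base, variant):.2f} sigma")
```

For the inline variant this works out to roughly a 0.5% loss, about 1.6 sigma; the noinline variant loses roughly 0.3%, just under 1 sigma.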
> (On ARM, hopefully the UAO bit is faster to set, but it's still
> "another instruction before and after", so even if it's not as
> expensive as clac/stac are on current x86 chips, it's an argument
> against inlining)
The UAO set/clear does show up as a hotspot within copy_page_to_iter(),
but as we can see, copy_page_to_iter() as a whole is only about 3% of
the workload. Within copy_page_to_iter(), it's the __put_user() based
loop inside fault_in_pages_writeable() which has the hotspot, due to the
repeated enable+disable sequence (more precisely, the instruction
barriers that we need).
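The cost is repeated because a fault_in_pages_writeable()-style loop issues one single-byte __put_user() per page of the destination range, so the enable/disable pair is paid per store rather than once per copy. The Python model below only loosely mirrors that loop to count the stores for a given user range; the 4 KiB page size and the "one barrier per enable and one per disable" cost are assumptions for illustration, and put_user_stores()/barriers() are hypothetical helpers, not kernel functions.

```python
# Loose model (not kernel code) of a fault_in_pages_writeable()-style loop:
# one single-byte __put_user() store per page stride over the user range,
# plus a store to the last byte if the stride skipped over the page holding it.
# Assumptions: 4 KiB pages; each store is bracketed by one uaccess enable and
# one disable, each followed by an instruction barrier.

PAGE_SIZE = 4096


def put_user_stores(uaddr: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of single-byte stores needed to touch every page of [uaddr, uaddr+size)."""
    if size == 0:
        return 0
    end = uaddr + size - 1
    stores = 0
    addr = uaddr
    while addr <= end:
        stores += 1          # models __put_user(0, addr)
        addr += page_size
    # The loop exits with addr past end; if addr is on end's page, the last
    # store (at addr - page_size) was on the previous page, so end's page is untouched.
    if addr // page_size == end // page_size:
        stores += 1          # models __put_user(0, end)
    return stores


def barriers(uaddr: int, size: int) -> int:
    """Barriers paid under the assumed enable+disable cost per store."""
    return 2 * put_user_stores(uaddr, size)


if __name__ == "__main__":
    # One page worth of data into a page-aligned buffer versus one offset by 100 bytes.
    print(put_user_stores(0, 4096), barriers(0, 4096))      # 1 store, 2 barriers
    print(put_user_stores(100, 4096), barriers(100, 4096))  # 2 stores, 4 barriers
```

An unaligned destination straddles an extra page, which doubles the number of enable/disable pairs for that chunk under these assumptions.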
Perf reports that the barriers account for 8.33% and 17.59% of the
time spent within that function, so we're actually talking about
maybe 0.25% and 0.5% of this workload spent doing the UAO thing.
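The step from per-function percentages to a share of the whole workload is a single multiplication: copy_page_to_iter()'s share of all samples (2.80% in the perf output above) times the barriers' share within that function. A small Python check of that arithmetic, using only the quoted figures:

```python
# Arithmetic behind the UAO estimate: a function's share of all samples
# multiplied by the share of that function's samples landing on the barriers.

COPY_PAGE_TO_ITER_PCT = 2.80          # % of hdparm -T samples (perf report above)
BARRIER_PCTS = (8.33, 17.59)          # % of copy_page_to_iter() samples, as quoted


def share_of_workload(function_pct: float, within_pct: float) -> float:
    """Percent of the whole workload represented by `within_pct` of a function."""
    return function_pct * within_pct / 100.0


if __name__ == "__main__":
    for pct in BARRIER_PCTS:
        print(f"{pct:.2f}% of copy_page_to_iter() -> "
              f"{share_of_workload(COPY_PAGE_TO_ITER_PCT, pct):.2f}% of the workload")
```

This gives about 0.23% and 0.49%, in line with the rough quarter- and half-percent figures above; the caveat about IRQ-disabled regions skewing perf attribution still applies.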
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.