lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170330232147.GL7909@n2100.armlinux.org.uk>
Date:   Fri, 31 Mar 2017 00:21:47 +0100
From:   Russell King - ARM Linux <linux@...linux.org.uk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Vineet Gupta <Vineet.Gupta1@...opsys.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Richard Henderson <rth@...ddle.net>,
        Will Deacon <will.deacon@....com>,
        Haavard Skinnemoen <hskinnemoen@...il.com>,
        Steven Miao <realmz6@...il.com>,
        Jesper Nilsson <jesper.nilsson@...s.com>,
        Mark Salter <msalter@...hat.com>,
        Yoshinori Sato <ysato@...rs.sourceforge.jp>,
        Richard Kuo <rkuo@...eaurora.org>,
        Tony Luck <tony.luck@...el.com>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        James Hogan <james.hogan@...tec.com>,
        Michal Simek <monstr@...str.eu>,
        David Howells <dhowells@...hat.com>,
        Ley Foon Tan <lftan@...era.com>,
        Jonas Bonn <Jonas.Nilsson@...opsys.com>
Subject: Re: [RFC][CFT][PATCHSET v1] uaccess unification

On Thu, Mar 30, 2017 at 01:59:58PM -0700, Linus Torvalds wrote:
> On Thu, Mar 30, 2017 at 1:40 PM, Vineet Gupta
> <Vineet.Gupta1@...opsys.com> wrote:
> >
> > So it's a mix bag really. Maybe we need some better directed test to really drill
> > it down.
> 
> As mentioned inn the discussion about ARM, I seriously doubt that the
> inlining will even be noticeable compared to other effects here.

(Sorry to switch sub-threads.)

I'm running tests on that point, concentrating on hdparm -T and perfing
that.  You're right in so far as perf identifies the hotspot as the
copy_to_user() function for that workload, rather than the inlined bits
- the top hits in perf of hdparm -T are:

+   66.52%  hdparm  [k] __copy_to_user_std
+    8.49%  hdparm  [k] generic_file_read_iter
+    3.82%  hdparm  [k] lock_acquire
+    2.80%  hdparm  [k] copy_page_to_iter
+    2.49%  hdparm  [k] find_get_entry
+    1.19%  hdparm  [k] lock_release

Note: perf on ARM does is affected by IRQ-disabled regions, so hotspots
can be off.

The generic_file_read_iter() one is definitely affected by an IRQ-
disabled region in there.

Here's the average hdparm -T transfer rates and standard deviation over
20 samples:

Unpatched:        Average=320.42 MB/s sigma=0.878657
Uaccess+inline:   Average=318.77 MB/s sigma=1.003332
Uaccess+noinline: Average=319.40 MB/s sigma=1.088354

This pattern - where the noinline version sits between the inlined
version and unpatched version seems to be a pattern in all the
measurements I've done so far, and it points to inlining that code
having a slight detrimental effect.  What we don't know is whether
uninlining the code without Al's patch would see a slight boost,
but I'm not about to go there.

However, this all points towards there being a very slight advantage
to dropping the INLINE_COPY_TO_USER and INLINE_COPY_FROM_USER for
ARM, but I'd say it's really down in the noise - I'm not concerned.

> (On ARM, hopefully the UAO bit is faster to set, but it's still
> "another instruction before and after", so even if it's not as
> expensive as clac/stac are on current x86 chips, it's an argument
> against inlining)

The UAO set/clear does show up as a hotspot within copy_page_to_iter(),
but as we can see, overall its about 3% of the workload.  Within
copy_page_to_iter(), it's the __put_user() based loop inside
fault_in_pages_writeable() which has the hotspot, due to the repeated
enable+disable sequence (more the instruction barriers that we need.)

Perf reports that the barriers account for 8.33 and 17.59% of the
time spent within that function, so we're actually talking about
maybe .25% and .5% of this workload spent doing the UAO thing.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ