Message-ID: <20131003023311.GA19176@kroah.com>
Date:	Wed, 2 Oct 2013 19:33:11 -0700
From:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:	Khalid Aziz <khalid.aziz@...cle.com>
Cc:	Jack Wang <jinpu.wang@...fitbricks.com>,
	Luis Henriques <luis.henriques@...onical.com>,
	linux-kernel@...r.kernel.org, stable@...r.kernel.org,
	kernel-team@...ts.ubuntu.com, Pravin B Shelar <pshelar@...ira.com>,
	Christoph Lameter <cl@...ux.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Mel Gorman <mel@....ul.ie>, Rik van Riel <riel@...hat.com>,
	Minchan Kim <minchan@...nel.org>,
	Andi Kleen <andi@...stfloor.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 092/104] mm: fix aio performance regression for database
 caused by THP

On Mon, Sep 30, 2013 at 08:00:02AM -0700, Greg Kroah-Hartman wrote:
> On Mon, Sep 30, 2013 at 07:31:35AM -0600, Khalid Aziz wrote:
> > On 09/30/2013 07:26 AM, Greg Kroah-Hartman wrote:
> > > On Mon, Sep 30, 2013 at 03:14:52PM +0200, Jack Wang wrote:
> > >> On 09/30/2013 12:11 PM, Luis Henriques wrote:
> > >>> 3.5.7.22 -stable review patch.  If anyone has any objections, please let me know.
> > >>>
> > >>> ------------------
> > >>>
> > >>> From: Khalid Aziz <khalid.aziz@...cle.com>
> > >>>
> > >>> commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
> > >>>
> > >>> I am working with a tool that simulates oracle database I/O workload.
> > >>> This tool (orion to be specific -
> > >>> <http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24>)
> > >>> allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag.  It then
> > >>> does aio into these pages from flash disks using various common block
> > >>> sizes used by database.  I am looking at performance with two of the most
> > >>> common block sizes - 1M and 64K.  aio performance with these two block
> > >>> sizes plunged after Transparent HugePages was introduced in the kernel.
> > >>> Here are performance numbers:
> > >>>
> > >>>             pre-THP      2.6.39       3.11-rc5
> > >>> 1M read     8384 MB/s    5629 MB/s    6501 MB/s
> > >>> 64K read    7867 MB/s    4576 MB/s    4251 MB/s
> > >>>
> > >>> I have narrowed the performance impact down to the overheads introduced by
> > >>> THP in __get_page_tail() and put_compound_page() routines.  perf top shows
> > >>> >40% of cycles being spent in these two routines.  Every time direct I/O
> > >>> to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
> > >>> the pages and calls put_page() when I/O completes to put the reference
> > >>> away.  THP introduced significant amount of locking overhead to get_page()
> > >>> and put_page() when dealing with compound pages because hugepages can be
> > >>> split underneath get_page() and put_page().  It added this overhead
> > >>> irrespective of whether it is dealing with hugetlbfs pages or transparent
> > >>> hugepages.  This resulted in 20%-45% drop in aio performance when using
> > >>> hugetlbfs pages.
> > >>>
> > >>> Since hugetlbfs pages can not be split, there is no reason to go through
> > >>> all the locking overhead for these pages from what I can see.  I added
> > >>> code to __get_page_tail() and put_compound_page() to bypass all the
> > >>> locking code when working with hugetlbfs pages.  This improved performance
> > >>> significantly.  Performance numbers with this patch:
> > >>>
> > >>>             pre-THP      3.11-rc5     3.11-rc5 + Patch
> > >>> 1M read     8384 MB/s    6501 MB/s    8371 MB/s
> > >>> 64K read    7867 MB/s    4251 MB/s    6510 MB/s
> > >>>
> > >>> Performance with 64K read is still lower than what it was before THP, but
> > >>> still a 53% improvement.  It does mean there is more work to be done but I
> > >>> will take a 53% improvement for now.
> > >>>
> > >>> Please take a look at the following patch and let me know if it looks
> > >>> reasonable.
> > >>>
> > >>> [akpm@...ux-foundation.org: tweak comments]
> > >>> Signed-off-by: Khalid Aziz <khalid.aziz@...cle.com>
> > >>> Cc: Pravin B Shelar <pshelar@...ira.com>
> > >>> Cc: Christoph Lameter <cl@...ux.com>
> > >>> Cc: Andrea Arcangeli <aarcange@...hat.com>
> > >>> Cc: Johannes Weiner <hannes@...xchg.org>
> > >>> Cc: Mel Gorman <mel@....ul.ie>
> > >>> Cc: Rik van Riel <riel@...hat.com>
> > >>> Cc: Minchan Kim <minchan@...nel.org>
> > >>> Cc: Andi Kleen <andi@...stfloor.org>
> > >>> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> > >>> Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> > >>> [ luis: backported to 3.5: adjusted context ]
> > >>> Signed-off-by: Luis Henriques <luis.henriques@...onical.com>
> > >> Hi Greg,
> > >>
> > >> I suppose this patch is also needed for 3.4, right?
> > >
> > > As it didn't originally apply there, I didn't apply it.
> > >
> > > If people think it should be applicable for 3.4, I'll take it.
> > >
> > > thanks,
> > >
> > > greg k-h
> > >
> > 
> > Hi Greg,
> > 
> > I did send you a backported version of this patch to apply to 3.0, 3.2 
> > and 3.4 last Monday and cc'd stable@...r.kernel.org. That patch should 
> > apply cleanly to those three kernels.
> 
> Ah, you didn't specifically say that in the patch, so I just thought you
> were reminding me to apply it to the 3.10 and 3.11 trees.  Please be
> more explicit in the future.
> 
> I'll queue it up for the next round of stable kernels after this one.

And I've lost it, I can't find it in my archives anywhere.  Sorry about
that, can you resend it please?

thanks,

greg k-h