Message-ID: <YPxjbopzwFYJw9hV@casper.infradead.org>
Date:   Sat, 24 Jul 2021 20:01:02 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Andres Freund <andres@...razel.de>
Cc:     James Bottomley <James.Bottomley@...senpartnership.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Christoph Hellwig <hch@....de>,
        Michael Larabel <Michael@...haellarabel.com>
Subject: Re: Folios give an 80% performance win

On Sat, Jul 24, 2021 at 11:45:26AM -0700, Andres Freund wrote:
> On Sat, Jul 24, 2021, at 11:23, James Bottomley wrote:
> > Well, I cut the previous question deliberately, but if you're going to
> > force me to answer, my experience with storage tells me that one test
> > being 10x different from all the others usually indicates a problem
> > with the benchmark test itself rather than a baseline improvement, so
> > I'd wait for more data.
> 
> I have a similar reaction - the large improvements are for a read/write pgbench benchmark at a scale that fits in memory. That's typically purely bound by the speed at which the WAL can be synced to disk. As far as I recall, MariaDB also uses buffered IO for its WAL (but there was recent work in that area).
> 
> Is there a reason for fdatasync() of 16MB files to have got a lot faster? Or a chance that it could be broken?
> 
> Some improvement for read-only wouldn't surprise me, particularly if the OS/PG weren't configured for explicit huge pages. Pgbench has a uniform distribution, so it's *very* TLB-miss heavy with 4k pages.
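
To put a rough number on that TLB point (the ~1536-entry second-level
TLB below is an assumption, just a ballpark for recent x86 parts; only
the 4k-vs-2MB ratio matters):

#include <stdio.h>

/*
 * Rough TLB-reach arithmetic.  The STLB entry count is a guess; the
 * point is the 512x difference in reach between 4kB and 2MB mappings.
 */
int main(void)
{
        unsigned long stlb_entries = 1536;
        unsigned long reach_4k = stlb_entries * (4UL << 10);
        unsigned long reach_2m = stlb_entries * (2UL << 20);

        printf("TLB reach, 4kB pages: %lu MB\n", reach_4k >> 20);  /* 6 MB */
        printf("TLB reach, 2MB pages: %lu GB\n", reach_2m >> 30);  /* 3 GB */
        return 0;
}

A pgbench data set of a few GB accessed uniformly blows well past the
4k reach, which is consistent with the comment above.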

It's going to depend substantially on the access pattern.  If the 16MB
file (oof, that's tiny!) is read in large chunks, or even in small but
consecutive chunks, the folio changes will allocate larger folios
(16k, 64k, 256k, ...).  Theoretically it could get up to 2MB folios and
start using PMDs, but I've never seen that in my testing.
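
Very roughly, the progression looks like this (an illustrative sketch
only, not the actual readahead code; the order step and the cap are
made up here):

#include <stdio.h>

#define PAGE_SHIFT      12      /* 4kB base pages */
#define PMD_ORDER       9       /* 2MB on x86-64 */

/*
 * Toy model of the folio size ramping up for consecutive reads: start
 * small and grow the allocation order on each further sequential round,
 * capped at PMD size.  Only meant to show the 16k -> 64k -> 256k -> ...
 * progression, not the real heuristics.
 */
int main(void)
{
        unsigned int order = 2;         /* 16kB */

        for (;;) {
                printf("order %u -> %lu kB folios\n",
                       order, (1UL << (order + PAGE_SHIFT)) >> 10);
                if (order >= PMD_ORDER)
                        break;
                order += 2;             /* grow by 4x per round (assumed) */
                if (order > PMD_ORDER)
                        order = PMD_ORDER;
        }
        return 0;
}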

fdatasync() could indeed have got much faster.  If we're writing back a
256kB folio as a unit, we're handling 64 times less metadata than writing
back 64 separate 4kB pages.  We'll track 64x fewer dirty bits.  We'll find
only 64 dirty folios per 16MB instead of 4096 dirty pages.
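
Spelled out, that's just (nothing here beyond the arithmetic above):

#include <stdio.h>

/* Dirty-tracking units for a 16MB file: 4kB pages vs 256kB folios. */
int main(void)
{
        unsigned long file_size = 16UL << 20;           /* 16MB  */
        unsigned long page_sz   = 4UL << 10;            /* 4kB   */
        unsigned long folio_sz  = 256UL << 10;          /* 256kB */

        printf("4kB pages per 16MB:    %lu\n", file_size / page_sz);   /* 4096 */
        printf("256kB folios per 16MB: %lu\n", file_size / folio_sz);  /* 64   */
        printf("reduction:             %lux\n",
               (file_size / page_sz) / (file_size / folio_sz));        /* 64x  */
        return 0;
}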

It's always possible I just broke something.  xfstests isn't
exhaustive, and seeing no regressions there doesn't mean there are no
problems.

Can you guide Michael towards parameters for pgbench that might give
an indication of performance on a more realistic workload that doesn't
entirely fit in memory?
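
For what it's worth, a back-of-the-envelope for picking a scale factor
(the ~15MB per unit of scale factor is only a rule of thumb and may be
off; the 4x-RAM target is arbitrary):

#include <stdio.h>

/*
 * Rough pgbench sizing: pick a scale factor whose data set is a few
 * times larger than RAM so the run isn't purely in-memory.  Both the
 * 15MB-per-scale-unit figure and the 4x multiplier are assumptions.
 */
int main(void)
{
        unsigned long ram_mb      = 64UL * 1024;        /* e.g. 64GB of RAM */
        unsigned long mb_per_unit = 15;                 /* rough data growth per -s unit */
        unsigned long scale       = (4 * ram_mb) / mb_per_unit;

        /* then something like: pgbench -i -s <scale>; pgbench -c N -j N -T secs */
        printf("suggested scale factor: ~%lu (~%lu MB of data)\n",
               scale, scale * mb_per_unit);
        return 0;
}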
