lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YPxg3RYDp/C7/ack@casper.infradead.org>
Date:   Sat, 24 Jul 2021 19:50:05 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     James Bottomley <James.Bottomley@...senpartnership.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Christoph Hellwig <hch@....de>,
        Andres Freund <andres@...razel.de>,
        Michael Larabel <Michael@...haellarabel.com>
Subject: Re: Folios give an 80% performance win

On Sat, Jul 24, 2021 at 11:23:25AM -0700, James Bottomley wrote:
> On Sat, 2021-07-24 at 19:14 +0100, Matthew Wilcox wrote:
> > On Sat, Jul 24, 2021 at 11:09:02AM -0700, James Bottomley wrote:
> > > On Sat, 2021-07-24 at 18:27 +0100, Matthew Wilcox wrote:
> > > > What blows me away is the 80% performance improvement for
> > > > PostgreSQL. I know they use the page cache extensively, so it's
> > > > plausibly real. I'm a bit surprised that it has such good
> > > > locality, and the size of the win far exceeds my
> > > > expectations.  We should probably dive into it and figure out
> > > > exactly what's going on.
> > > 
> > > Since none of the other tested databases showed more than a 3%
> > > improvement, this looks like an anomalous result specific to
> > > something in postgres ... although the next biggest db: mariadb
> > > wasn't part of the tests so I'm not sure that's
> > > definitive.  Perhaps the next step should be to t
> > > est mariadb?  Since they're fairly similar in domain (both full
> > > SQL) if mariadb shows this type of improvement, you can
> > > safely assume it's something in the way SQL databases handle paging
> > > and if it doesn't, it's likely fixing a postgres inefficiency.
> > 
> > I think the thing that's specific to PostgreSQL is that it's a heavy
> > user of the page cache.  My understanding is that most databases use
> > direct IO and manage their own page cache, while PostgreSQL trusts
> > the kernel to get it right.
> 
> That's testable with mariadb, at least for the innodb engine since the
> flush_method is settable. 

We're still not communicating well.  I'm not talking about writes,
I'm talking about reads.  Postgres uses the page cache for reads.
InnoDB uses O_DIRECT (afaict).  See articles like this one:
https://www.percona.com/blog/2018/02/08/fsync-performance-storage-devices/

: The first and most obvious type of IO are pages reads and writes from
: the tablespaces. The pages are most often read one at a time, as 16KB
: random read operations. Writes to the tablespaces are also typically
: 16KB random operations, but they are done in batches. After every batch,
: fsync is called on the tablespace file handle.

(the current folio patch set does not create multi-page folios for
writes, only for reads)

I downloaded the mariadb source package that's in Debian, and from
what I can glean, it does indeed set O_DIRECT on data files in Linux,
through os_file_set_nocache().

> > Regardless of whether postgres is "doing something wrong" or not,
> > do you not think that an 80% performance win would exert a certain
> > amount of pressure on distros to do the backport?
> 
> Well, I cut the previous question deliberately, but if you're going to
> force me to answer, my experience with storage tells me that one test
> being 10x different from all the others usually indicates a problem
> with the benchmark test itself rather than a baseline improvement, so
> I'd wait for more data.

... or the two benchmarks use Linux in completely different ways such
that one sees a huge benefit while the other sees none.  Which is what
you'd expect for a patchset that improves the page cache and using a
benchmark that doesn't use the page cache.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ