linux-kernel - Re: Improve lseek scalability v3


Message-Id: <201109161927.34472.andres@anarazel.de>
Date:	Fri, 16 Sep 2011 19:27:33 +0200
From:	Andres Freund <andres@...razel.de>
To:	Matthew Wilcox <matthew@....cx>
Cc:	Andi Kleen <andi@...stfloor.org>, viro@...iv.linux.org.uk,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	robertmhaas@...il.com, pgsql-hackers@...tgresql.org
Subject: Re: Improve lseek scalability v3

Hi,
On Friday 16 Sep 2011 17:36:20 Matthew Wilcox wrote:
> On Fri, Sep 16, 2011 at 04:16:49PM +0200, Andres Freund wrote:
> > I sent an email containing benchmarks from Robert Haas regarding the
> > Subject. Looking at lkml.org I can't see it right now, will recheck when
> > I am at home.
> > 
> > He replaced lseek(SEEK_END) with fstat() and got speedups up to 8.7 times
> > the lseek performance.
> > The workload was 64 clients hammering postgres with a simple readonly
> > workload (pgbench -S).
> Yay!  Data!

> > For reference see the thread in the postgres archives which also links to
> > performance data:
> > http://archives.postgresql.org/message-id/CA+TgmoawRfpan35wzvgHkSJ0+i-W=VkJpKnRxK2kTDR+HsanWA@...l.gmail.com
> So both fstat and lseek do more work than postgres wants.  lseek modifies
> the file pointer while fstat copies all kinds of unnecessary information
> into userspace.  I imagine this is the source of the slowdown seen in
> the 1-client case.
Yes, that was my theory as well.
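
To make the comparison concrete, here is a minimal sketch of the two ways
to ask for a file's size that are being discussed. It is an illustration
only, not the postgres code; the function names size_via_lseek and
size_via_fstat are made up for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: size via lseek(). As a side effect this moves the
 * file offset of the open file description to the end of the file. */
off_t size_via_lseek(int fd)
{
    return lseek(fd, 0, SEEK_END);   /* returns (off_t)-1 on error */
}

/* Hypothetical helper: size via fstat(). The file offset is left alone,
 * but the kernel fills in the whole struct stat, of which only st_size
 * is wanted here. */
off_t size_via_fstat(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return (off_t)-1;
    return st.st_size;
}
```

Both return the same number for a regular file; they differ only in the
extra work done along the way.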

> I'd like to dig into the requirement for knowing the file size a little
> better.  According to the blog entry it's used for "the query planner".
It's used for multiple things, one of which is the query planner.
The query planner needs to know how many tuples a table has to produce a
sensible plan. For that it has stats which tell it 1. how big the table is and
2. how many tuples the table has. Those statistics are only updated every now
and then though.
So it uses those old stats to check how many tuples are normally stored on a
page and then uses that to extrapolate the number of tuples from the current
number of pages (which is computed by lseek(SEEK_END) over the 1GB segments of
a table).
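
The extrapolation described above can be sketched roughly as follows. This
is a simplified illustration under stated assumptions, not the actual
planner code: the helper names are invented, and the 8192-byte block size
is the postgres default rather than something fixed.

```c
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 8 kB blocks (the postgres default; configurable at build time). */
#define BLOCK_SIZE_ASSUMED 8192

/* Hypothetical helper: current number of pages across all segment files of
 * a table, obtained with one lseek(SEEK_END) per segment.
 * Returns -1 if any lseek() fails. */
long total_pages(const int *fds, int nsegs)
{
    long pages = 0;

    for (int i = 0; i < nsegs; i++) {
        off_t end = lseek(fds[i], 0, SEEK_END);

        if (end < 0)
            return -1;
        pages += (long)(end / BLOCK_SIZE_ASSUMED);
    }
    return pages;
}

/* Hypothetical helper: scale the (possibly stale) tuple count from the
 * stats by the ratio of current pages to the page count in the stats. */
double estimate_tuples(double stat_tuples, long stat_pages, long cur_pages)
{
    if (stat_pages <= 0)
        return 0.0;   /* no density known; a real planner would need a fallback */

    double tuples_per_page = stat_tuples / (double)stat_pages;
    return tuples_per_page * (double)cur_pages;
}
```

For example, stats saying 1000 tuples in 10 pages and a current size of 25
pages give an estimate of 2500 tuples.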

I am not sure how interested you are in the relevant postgres internals?

> Does the query planner need to know the exact number of bytes in the file,
> or is it after an order-of-magnitude?  Or to-the-nearest-gigabyte?
It depends on where the information is used. For some of the uses it needs to
be exact (the assumed size is rechecked after acquiring a lock preventing
extension); at other places I guess it would be ok if the accuracy got lower
with bigger files (those files won't ever get bigger than 1GB).
But I have a hard time seeing an implementation where the approximate size
would be faster to get than just the file size?

Andres