linux-kernel - Re: Improve lseek scalability v3

Message-ID: <CAM-w4HNAJVRx5Vj87hXjL9JDjwbUoDiso_NZcfomk7wpd2zshw@mail.gmail.com>
Date:	Fri, 16 Sep 2011 23:44:59 +0100
From:	Greg Stark <stark@....edu>
To:	Benjamin LaHaise <bcrl@...ck.org>
Cc:	Andres Freund <andres@...razel.de>,
	Matthew Wilcox <matthew@....cx>,
	Andi Kleen <andi@...stfloor.org>, viro@...iv.linux.org.uk,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	robertmhaas@...il.com, pgsql-hackers@...tgresql.org
Subject: Re: Improve lseek scalability v3

On Fri, Sep 16, 2011 at 9:08 PM, Benjamin LaHaise <bcrl@...ck.org> wrote:
> For such tables, can't Postgres track the size of the file internally?  I'm
> assuming it's keeping file descriptors open on the tables it manages, in
> which case when it writes to a file to extend it, the internally stored size
> could be updated.  Not making a syscall at all would scale far better than
> even a modified lseek() will perform.
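Benjamin's suggestion is about avoiding the syscall Postgres uses to learn how long a relation file is. The rough sketch below uses Python's `os` wrappers in place of the C calls and shows that both routes discussed in this thread return the same length, from which a block count follows. The 8192-byte `BLCKSZ` is Postgres's default block size and is an assumption here, as are the helper names.

```python
import os
import tempfile

BLCKSZ = 8192  # assumption: Postgres's default block size


def size_via_lseek(fd: int) -> int:
    """File length found by seeking to the end; note this moves the offset."""
    return os.lseek(fd, 0, os.SEEK_END)


def size_via_fstat(fd: int) -> int:
    """File length read from the inode; the file offset is left alone."""
    return os.fstat(fd).st_size


def nblocks(fd: int) -> int:
    """Number of complete BLCKSZ-sized blocks currently in the file."""
    return size_via_lseek(fd) // BLCKSZ


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * (3 * BLCKSZ))
        f.flush()  # push Python's buffer down to the file descriptor
        fd = f.fileno()
        print(size_via_lseek(fd), size_via_fstat(fd), nblocks(fd))
        # prints: 24576 24576 3
```

Either call asks the kernel for the same authoritative value. The scalability question in this thread concerns how cheaply the kernel can answer when many processes ask at once.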

There's no hardwired limit on how many tables you can have in a
database, and it's not limited by the number of file descriptors, so
Postgres can't count on holding an open descriptor for every table.
It would have to keep some kind of LRU of recently opened files and
their sizes, or something like that, and there would probably still
be a lot of lseeks/fstats going on.
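A minimal sketch of the kind of LRU Greg describes follows. The `SizeCache` class and its `capacity` bound are hypothetical and are not Postgres code. Only cache hits avoid asking the kernel. Every miss still costs a stat call.

```python
import os
import tempfile
from collections import OrderedDict


class SizeCache:
    """Hypothetical bounded LRU cache mapping file paths to cached sizes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, int]" = OrderedDict()
        self.syscalls = 0  # number of times the kernel had to be asked

    def size(self, path: str) -> int:
        if path in self.entries:
            self.entries.move_to_end(path)  # hit: mark most recently used
            return self.entries[path]
        # Miss: fetch the authoritative length from the filesystem.
        self.syscalls += 1
        size = os.stat(path).st_size
        self.entries[path] = size
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = {}
        for i, name in enumerate("abc"):
            paths[name] = os.path.join(d, name)
            with open(paths[name], "wb") as f:
                f.write(b"\0" * (i + 1))
        cache = SizeCache(capacity=2)
        for name in "abacb":
            cache.size(paths[name])
        print(cache.syscalls)  # prints: 4
```

When there are many more tables than cache slots, most lookups miss and fall through to the kernel anyway. A hit can also return a stale size if another process has since extended the file.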

More generally, caching the size inside Postgres would introduce a
reliability issue. It's much safer to have a single authoritative
value -- the actual length of the file -- than to have the same value
stored in two locations and then need to worry about them getting out
of sync. For example, if a write that extends the file fails because
the filesystem has run out of space, Postgres might not know how to
update its cached size accurately.
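The failure case above can be sketched as follows. `CachedRelation` is a hypothetical handle and `BLCKSZ` = 8192 is an assumed block size. The idea is that after any failed or short extension write, the cached block count is thrown away and re-derived from the file's real length, the single authoritative value, rather than guessed.

```python
import os

BLCKSZ = 8192  # assumption: Postgres's default block size


class CachedRelation:
    """Hypothetical relation handle that caches its own length in blocks."""

    def __init__(self, fd: int) -> None:
        self.fd = fd
        self.resync()

    def resync(self) -> None:
        # Re-derive the cached value from the single authoritative source:
        # the actual length of the file.
        self.cached_nblocks = os.fstat(self.fd).st_size // BLCKSZ

    def extend(self, page: bytes) -> None:
        """Append one block at the cached end of the file."""
        if len(page) != BLCKSZ:
            raise ValueError("page must be exactly BLCKSZ bytes")
        try:
            written = os.pwrite(self.fd, page, self.cached_nblocks * BLCKSZ)
        except OSError:
            # The write failed (e.g. ENOSPC), so how far the file grew is
            # unknown: trust the filesystem rather than guess, then re-raise.
            self.resync()
            raise
        if written != BLCKSZ:
            # A short write also leaves the cached value uncertain.
            self.resync()
            raise OSError("short write while extending relation")
        self.cached_nblocks += 1
```

Note that `resync()` is itself another fstat. Keeping a second copy of the size does not remove the need to ask the kernel whenever something goes wrong.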

There's no question it could be done, but it's not clear it would
necessarily be much faster than a lock-free lseek/fstat.

On Fri, Sep 16, 2011 at 6:27 PM, Andres Freund <andres@...razel.de> wrote:
> It depends on where the information is used. For some of the uses it needs to
> be exact (the assumed size is rechecked after acquiring a lock preventing
> extension)

Fwiw this might give the wrong impression. I don't believe scans
acquire a lock preventing extension -- that is, another process can
be extending the file while the scan is proceeding. The scan only
locks out truncation (vacuum). Any blocks added by another process
are ignored by the scan because they can only contain records
invisible to that transaction. This does depend on the lseek/fstat
being done after the transaction snapshot is taken. That call could
be seen as "rechecking" the value obtained earlier by the query
planner, but they're really two independent things.
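The ordering Greg describes can be sketched roughly as below. This is a simplified illustration, not Postgres's heap-scan code. The function name `seq_scan` and the `during_scan` hook are hypothetical. Snapshot acquisition is assumed to have happened before the call, and `BLCKSZ` = 8192 is an assumed block size.

```python
import os

BLCKSZ = 8192  # assumption: Postgres's default block size


def seq_scan(fd: int, during_scan=None) -> list:
    """Return the block numbers visited by a simplified sequential scan.

    Assumed precondition: the caller has already taken its transaction
    snapshot. The length is read only now, after the snapshot, so any
    block appended later can only hold rows the snapshot cannot see.
    """
    nblocks = os.fstat(fd).st_size // BLCKSZ  # fixed for the whole scan
    visited = []
    for blkno in range(nblocks):
        if during_scan is not None and blkno == 0:
            during_scan()  # e.g. another process extending the file
        page = os.pread(fd, BLCKSZ, blkno * BLCKSZ)
        if len(page) != BLCKSZ:
            # Truncation is assumed to be locked out for the scan's duration.
            raise RuntimeError("relation shrank during scan")
        visited.append(blkno)
    return visited
```

Because the block count is read once, blocks added by a concurrent extender are never visited. That is safe only because such blocks hold rows invisible to the scan's snapshot and because truncation (vacuum) is locked out while the scan runs.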

-- 
greg