Message-ID: <20140702154439.GE1369@cmpxchg.org>
Date:	Wed, 2 Jul 2014 11:44:39 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>,
	Linux-FSDevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH 0/5] Improve sequential read throughput v4r8

On Tue, Jul 01, 2014 at 05:25:38PM -0400, Johannes Weiner wrote:
> These explanations make no sense.  If pages of a streaming writer have
> enough time in memory to not thrash with a single zone, the fair
> policy should make even MORE time in memory available to them and not
> thrash them.  The fair policy is a necessity for multi-zone aging to
> make any sense and having predictable reclaim and activation behavior.
> That's why it's obviously not meant to benefit streaming workloads,
> but it shouldn't harm them, either.  Certainly not 20%.  If streaming
> pages thrash, something is up, the solution isn't to just disable the
> second zone or otherwise work around the issue.

Hey, funny story.

I tried reproducing this with an isolated tester just to be sure,
stealing tiobench's do_read_test(), but I couldn't reproduce the
regression at all.

I compared the original fair policy commit with its parent, I compared
a current vanilla kernel to a crude #ifdef'd policy disabling, and I
compared vanilla to your patch series - every kernel yields 132MB/s.

Then I realized: 132MB/s is the disk's limit anyway - so how the hell
did I get 150MB/s peak speeds for sequential cold cache IO with
seqreadv4?

So I looked at the tiobench source code, and it turns out it's not
cold cache at all: it first does the write test, then the read test on
the same file!

The file is bigger than memory, so you would expect the last X percent
of the file to be cached after the seq write and the subsequent seq
read to push the tail out before getting to it - standard working set
bigger than memory behavior.

But without fairness, a chunk from the beginning of the file gets
stuck in the DMA32 zone and never pushed out while writing, so when
the reader comes along, it gets random parts from cache!

All patches that showed "major improvements" ruined fairness and led
to non-linear caching of the test file during the write, and the read
speedups came from the file being partially served from cache.

Sequential IO is fine.  This benchmark needs a whack over the head.
