Date:	Tue, 2 Jul 2013 20:55:14 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Rob van der Heij <rvdheij@...il.com>, Mel Gorman <mgorman@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Yannick Brosseau <yannick.brosseau@...il.com>,
	stable@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	"lttng-dev@...ts.lttng.org" <lttng-dev@...ts.lttng.org>
Subject: Re: [-stable 3.8.1 performance regression] madvise
	POSIX_FADV_DONTNEED

* Mathieu Desnoyers (mathieu.desnoyers@...icios.com) wrote:
> * Dave Chinner (david@...morbit.com) wrote:
> > On Thu, Jun 20, 2013 at 08:20:16AM -0400, Mathieu Desnoyers wrote:
> > > * Rob van der Heij (rvdheij@...il.com) wrote:
> > > > Wouldn't you batch the calls to drop the pages from cache rather than drop
> > > > one packet at a time?
> > > 
> > > By default for kernel tracing, lttng's trace packets are 1MB, so I
> > > consider the call to fadvise to be already batched by applying it to 1MB
> > > packets rather than individual pages. Even there, it seems that the
> > > extra overhead added by the lru drain on each CPU is noticeable.
> > > 
> > > Another reason for not batching this in larger chunks is to limit the
> > > impact of the tracer on the kernel page cache. LTTng limits itself to
> > > its own set of buffers, and uses the page cache for what is absolutely
> > > needed to perform I/O, but no more.
> > 
> > I think you are doing it wrong. This is a poster child case for
> > using Direct IO and completely avoiding the page cache altogether....
> 
> I just tried replacing my sync_file_range()+fadvise() calls and instead
> pass the O_DIRECT flag to open(). Unfortunately, I must be doing
> something very wrong, because I get only 1/3rd of the throughput, and
> the page cache fills up. Any idea why ?

Since O_DIRECT does not seem to provide acceptable throughput, it may be
interesting to investigate other ways to lessen the latency impact of
the fadvise DONTNEED hint.

Given it is just a hint, we should be allowed to perform page
deactivation lazily. Is there any fundamental reason to wait for worker
threads on each CPU to complete their lru drain before returning from
fadvise() to user-space ?
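
For reference, the write-then-drop pattern under discussion can be
sketched roughly as below. This is only an illustration, not the LTTng
code: the helper name write_packet_and_drop() is hypothetical, pwrite()
stands in for the splice() path the tracer actually uses, and the
particular combination of sync_file_range() flags is an assumption. The
final posix_fadvise() call is the one that ends up waiting for the
per-CPU lru drain.

```c
#define _GNU_SOURCE           /* sync_file_range() is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not LTTng code): write one trace packet at `offset`,
 * push it to disk, then hint that its pages are no longer needed.
 * Returns 0 on success or a positive errno value on failure.
 */
static int write_packet_and_drop(int fd, const void *buf, size_t len,
                                 off_t offset)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL && len > 0);

    /* pwrite() may write fewer bytes than requested, so loop. */
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        if (n == 0)
            return EIO;   /* no progress; avoid spinning forever */
        done += (size_t)n;
    }

    /* Start writeback of the range and wait for it to complete
     * (flag choice is an assumption for this sketch). */
    if (sync_file_range(fd, offset, (off_t)len,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER) < 0)
        return errno;

    /* posix_fadvise() returns the error number directly and does not
     * set errno. POSIX_FADV_DONTNEED is only a hint to the kernel. */
    return posix_fadvise(fd, offset, (off_t)len, POSIX_FADV_DONTNEED);
}
```

With 1MB packets, this sequence would run once per packet, so any
synchronous work done inside the fadvise call is paid on every packet
written.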

Thanks,

Mathieu

> 
> Here are my results:
> 
> heavy-syscall.c: 30M sigaction() syscall with bad parameters (returns
> immediately). Used as high-throughput stress-test for the tracer.
> Tracing to disk with LTTng, all kernel tracepoints activated, including
> system calls.
> 
> Tracer configuration: per-core buffers split into 4 sub-buffers of
> 262kB. splice() is used to transfer data from buffers to disk. Runs on a
> 8-core Intel machine.
> 
> Writing to a software raid-1 ext3 partition.
> ext3 mount options: rw,errors=remount-ro
> 
> * sync_file_range+fadvise 3.9.8
>   - with lru drain on fadvise
> 
> Kernel cache usage:
> Before tracing: 56272k cached
> After tracing:  56388k cached
> 
> 939M	/root/lttng-traces/auto-20130702-090430
> time ./heavy-syscall 
> real	0m21.910s
> throughput: 42MB/s
> 
> 
> * sync_file_range+fadvise 3.9.8
>   - without lru drain on fadvise: manually reverted
> 
> Kernel cache usage:
> Before tracing: 67968k cached
> After tracing:  67984k cached
> 
> 945M	/root/lttng-traces/auto-20130702-092505
> time ./heavy-syscall 
> real	0m21.872s
> throughput: 43MB/s
> 
> 
> * O_DIRECT 3.9.8
>   - O_DIRECT flag on open(), removed fadvise and sync_file_range calls
> 
> Kernel cache usage:
> Before tracing:  99480k cached
> After tracing:  360132k cached
> 
> 258M	/root/lttng-traces/auto-20130702-090603
> time ./heavy-syscall 
> real	0m19.627s
> throughput: 13MB/s
> 
> 
> * No cache hints 3.9.8
>   - only removed fadvise and sync_file_range calls
> 
> Kernel cache usage:
> Before tracing: 103556k cached
> After tracing:  363712k cached
> 
> 945M	/root/lttng-traces/auto-20130702-092505
> time ./heavy-syscall 
> real	0m19.672s
> throughput: 48MB/s
> 
> Thoughts ?
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com