Date:	Fri, 14 May 2010 14:32:42 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Pierre Tardy <tardyp@...il.com>, Ingo Molnar <mingo@...e.hu>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Tom Zanussi <tzanussi@...il.com>,
	Paul Mackerras <paulus@...ba.org>,
	linux-kernel@...r.kernel.org, arjan@...radead.org,
	ziga.mahkovec@...il.com, davem <davem@...emloft.net>,
	linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>
Subject: [RFC] Tracer Ring Buffer splice() vs page cache [was: Re: Perf and
	ftrace [was Re: PyTimechart]]

* Peter Zijlstra (peterz@...radead.org) wrote:
> On Thu, 2010-05-13 at 12:31 -0400, Mathieu Desnoyers wrote:
> > 
> > In addition, this would play well with mmap() too: we can simply add a
> > ring_buffer_get_mmap_offset() method to the backend (exported through another
> > ioctl) that would let user-space know the start of the mmap'd buffer range
> > currently owned by the reader. So we can inform user-space of the currently
> > owned page range without even changing the underlying memory map. 
> 
> I still think keeping refs to splice pages is tricky at best. Suppose
> they're spliced into the pagecache of a file, it could stay there for a
> long time under some conditions.
> 
> Also, the splice-client (say the pagecache) and the mmap will both want
> the pageframe to contain different information.

[CCing memory management specialists]

You bring up a very interesting point. Let me describe what I want to achieve,
and see what others have to say about it:

I want the ring buffer to allocate pages only at ring buffer creation (never
while tracing). There are a few reasons why I want to do that, ranging from
improved performance to limited system disturbance.
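
To make the "allocate only at creation" constraint concrete, here is a minimal
user-space sketch. It is only a model under stated assumptions: struct rb_model,
rb_create(), rb_next_write_page(), rb_destroy() and RB_PAGE_SIZE are
hypothetical names, and malloc() stands in for whatever page allocator the real
ring buffer would use. The point it illustrates is that the write path only
hands out pages that already exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RB_PAGE_SIZE 4096	/* assumed page size, for illustration only */

/* Hypothetical user-space model; this is not the kernel ring buffer API. */
struct rb_model {
	void **pages;		/* page pointers, all allocated in rb_create() */
	size_t nr_pages;
	size_t head;		/* index of the next page the writer will fill */
	size_t allocs;		/* number of page allocations performed so far */
};

/* Allocate every page up front; this is the only function that allocates. */
static struct rb_model *rb_create(size_t nr_pages)
{
	struct rb_model *rb;
	size_t i;

	if (nr_pages == 0)
		return NULL;
	rb = calloc(1, sizeof(*rb));
	if (!rb)
		return NULL;
	rb->pages = calloc(nr_pages, sizeof(*rb->pages));
	if (!rb->pages) {
		free(rb);
		return NULL;
	}
	for (i = 0; i < nr_pages; i++) {
		rb->pages[i] = malloc(RB_PAGE_SIZE);
		if (!rb->pages[i]) {
			/* Undo the partial allocation before failing. */
			while (i--)
				free(rb->pages[i]);
			free(rb->pages);
			free(rb);
			return NULL;
		}
		rb->allocs++;
	}
	rb->nr_pages = nr_pages;
	return rb;
}

/* Tracing path: hand out the next preallocated page, wrapping around. */
static void *rb_next_write_page(struct rb_model *rb)
{
	void *page;

	assert(rb && rb->nr_pages > 0);
	page = rb->pages[rb->head];
	rb->head = (rb->head + 1) % rb->nr_pages;
	return page;
}

static void rb_destroy(struct rb_model *rb)
{
	size_t i;

	for (i = 0; i < rb->nr_pages; i++)
		free(rb->pages[i]);
	free(rb->pages);
	free(rb);
}
```

In this model, rb->allocs stays constant once rb_create() returns, no matter how
many times rb_next_write_page() is called, which is the property described
above.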

Now let's suppose we have the synchronization mechanism (detailed in the original
thread, but not relevant to this part of the discussion) that lets us give the
pages to the ring buffer "reader", which sends them to splice() so it can use
them as write buffers. Let's also suppose that the ring buffer reader blocks
until the pages are written to the disk (synchronous write). In my scheme, the
reader still has pointers to these pages.

The point you raise here is that when the ring buffer "reader" is woken up,
these pages could still be in the page cache. So when the reader gives these
pages back to the ring buffer (so they can be used for writing again), the page
cache may still hold a reference to them. The pages in the page cache and the
version on disk could then get out of sync, which could lead to trace file
corruption (in the worst case).

So I have three questions here:

1 - Could we enforce removal of these pages from the page cache by calling
    "page_cache_release()" before giving these pages back to the ring buffer?

2 - Or is there a page flag we could specify when we allocate them to ask for
    these pages never to be put in the page cache? (They should still be usable
    as write buffers.)

3 - Is there something more we need to do to grab a reference on the pages
    before passing them to splice(), so that when we call page_cache_release()
    they don't get reclaimed?
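
The reference-counting concern behind question 3 can be shown with a toy
user-space model. Everything here is an assumption made for illustration:
struct model_page, model_get_page(), model_put_page() and
model_page_reusable() are hypothetical names, not kernel interfaces, and the
model says nothing about how the page cache actually behaves. It only shows the
general pattern: take an extra reference before handing a page to another
consumer, and treat the page as reusable for writing only once the ring buffer
holds the sole remaining reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with a reference count. */
struct model_page {
	int refcount;
	bool freed;	/* set instead of really freeing, so it can be inspected */
};

/* Take one additional reference on a live page. */
static void model_get_page(struct model_page *p)
{
	assert(!p->freed);
	p->refcount++;
}

/* Drop one reference; the page is "freed" when the last one goes away. */
static void model_put_page(struct model_page *p)
{
	assert(!p->freed && p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* The ring buffer may rewrite a page only if it holds the sole reference. */
static bool model_page_reusable(const struct model_page *p)
{
	return !p->freed && p->refcount == 1;
}
```

With a page that starts at refcount 1 (the ring buffer's own reference), calling
model_get_page() before handing it to the consumer means the consumer's later
model_put_page() brings the count back to 1 rather than 0, so the page is not
freed, and model_page_reusable() stays false for as long as the consumer still
holds its reference.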

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com