lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 29 Jul 2011 21:12:17 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Vaibhav Nagarnaik <vnagarnaik@...gle.com>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...hat.com>,
	Michael Rubin <mrubin@...gle.com>,
	David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic

On Fri, 2011-07-29 at 16:30 -0700, Vaibhav Nagarnaik wrote:
> On Fri, Jul 29, 2011 at 2:23 PM, Steven Rostedt <rostedt@...dmis.org> wrote:

> There should only be IRQs and NMIs that preempt this operation since
> the removal operation of a cpu ring buffer is scheduled on keventd of
> the same CPU. But you're right there is a race between reading the
> to_remove pointer and cmpxchg() operation.

Bah, this is what I get for reviewing patches and doing other work at
the same time. I saw the work/completion set up, but it didn't register
to me that this was calling schedule_work_on(cpu..). 

But that said, I'm not sure I really like that. This still seems a bit
too complex.

> 
> While we are trying to remove the head page, the writer could move to
> the head page. Additionally, we will be adding complexity to manage data
> from all the removed pages for read_page.
> 
> I discussed with David and here are some ways we thought to address
> this:
> 1. After the cmpxchg(), if we see that the tail page has moved to
>    to_remove page, then revert the cmpxchg() operation and try with the
>    next page. This might add some more complexity and doesn't work with
>    an interrupt storm coming in.

Egad no. That will just make things more complex, and harder to verify
is correct.

> 2. Disable/enable IRQs while removing pages. This won't stop traced NMIs
>    though and we are now affecting the system behavior.
> 3. David didn't like this, but we could increment
>    cpu_buffer->record_disabled to prevent writer from moving any pages
>    for the duration of this process. If we combine this with disabling
>    preemption, we would be losing traces from an IRQ/NMI context, but we
>    would be safe from races while this operation is going on.
> 
> The reason we want to remove the pages after tail is to give priority to
> empty pages first before touching any data pages. Also according to your
> suggestion, I am not sure how to manage the data pages once they are
> removed, since they cannot be freed and the reader might not be present
> which will make the pages stay resident, a form of memory leak.

They will be freed when they are eventually read. Right, if there's no
reader, then they will not be freed, but that isn't really a true memory
leak. It is basically just like we didn't remove the pages, but I do not
consider this a memory leak. The pages are just waiting to be reclaimed,
and will be freed on any reset of the ring buffer.

Anyway, the choices are:

* Remove from the HEAD and use the existing algorithm that we've been
using since 2008. This requires a bit of accounting on the reader side,
but nothing too complex.

Pros: Should not have any major race conditions. Requires no
schedule_work_on() calls. Uses existing algorithm

Cons: Can keep pages around if no reader is present, and ring buffer is
not reset.

* Read from tail. Modify the already complex but tried and true lockless
algorithm.

Pros: Removes empty pages first.

Cons: Adds a lot more complexity to a complex system that has been
working since 2008.


The above makes me lean towards just taking from HEAD.

If you are worried about leaked pages, we could even have a debugfs file
that lets us monitor the pages that are pending read, and have the user
(or application) be able to flush them if they see the ring buffer is
full anyway.

-- Steve



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ