linux-kernel - Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAL26m8K8_55r_KZ9hc1ZPxvFTnnRzmjB0RH62OUB=Be0FjpcjQ@mail.gmail.com>
Date:	Fri, 29 Jul 2011 16:30:20 -0700
From:	Vaibhav Nagarnaik <vnagarnaik@...gle.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...hat.com>,
	Michael Rubin <mrubin@...gle.com>,
	David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic

On Fri, Jul 29, 2011 at 2:23 PM, Steven Rostedt <rostedt@...dmis.org> wrote:
> On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:
>
>> +
>> +             } else if (((unsigned long)to_remove & ~RB_PAGE_HEAD) ==
>> +                                     (unsigned long)to_remove) {
>> +
>> +                     /* not a head page, just update the next pointer */
>> +                     ret = cmpxchg(&tail_page->next, to_remove, next_page);
>
> This is not, it wont work.
>
> You can *only* remove the HEAD from the ring buffer without causing
> issues.
>
> As you probably know, the trick is done with the list pointers. We or
> the pointer with 1 for head, and the writer will or it with 2 when it
> updates the page.
>
> This only works if we have a 1 or 2 here. Now if we try to do what you
> suggest, by starting with a 0, and ending with 0, we may fail. Between
> the  to_remove = tail_page->next and the cmpxchg(), the writer could
> easily move to the tail page, and you would never know it.
>
> Now we just removed the tail page with no idea that the write is on it.
> The writer could have also moved on to the next page, and we just
> removed the most recently recorded data.
>
> The only way to really make this work is to always get it from the HEAD
> page. If there's data there, we could just store it separately, so that
> the read_page can read from it first. We will still need to be careful
> with the writer on the page. But I think this is doable.
>
> That is, read the pages from head, if there's no data on it, simply
> remove the pages. If there is data, we store it off later. If the writer
> happens to be on the page, we will can check that. We could even
> continue to get pages, because we will be moving the header page with
> the cmpxchg, and the writer does that too. It will take some serious
> thought, but it is possible to do this.
>
> -- Steve

There should only be IRQs and NMIs that preempt this operation since
the removal operation of a cpu ring buffer is scheduled on keventd of
the same CPU. But you're right there is a race between reading the
to_remove pointer and cmpxchg() operation.

While we are trying to remove the head page, the writer could move to
the head page. Additionally, we will be adding complexity to manage data
from all the removed pages for read_page.

I discussed with David and here are some ways we thought to address
this:
1. After the cmpxchg(), if we see that the tail page has moved to
   to_remove page, then revert the cmpxchg() operation and try with the
   next page. This might add some more complexity and doesn't work with
   an interrupt storm coming in.
2. Disable/enable IRQs while removing pages. This won't stop traced NMIs
   though and we are now affecting the system behavior.
3. David didn't like this, but we could increment
   cpu_buffer->record_disabled to prevent writer from moving any pages
   for the duration of this process. If we combine this with disabling
   preemption, we would be losing traces from an IRQ/NMI context, but we
   would be safe from races while this operation is going on.

The reason we want to remove the pages after tail is to give priority to
empty pages first before touching any data pages. Also according to your
suggestion, I am not sure how to manage the data pages once they are
removed, since they cannot be freed and the reader might not be present
which will make the pages stay resident, a form of memory leak.

Vaibhav Nagarnaik
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/