linux-kernel - Re: [PATCH 1/2] tracing: Simplify and fix "buffered event" synchronization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d404113e-4ffe-4e9c-ab45-1b076c1f498c@suse.com>
Date:   Fri, 1 Dec 2023 15:17:35 +0100
From:   Petr Pavlu <petr.pavlu@...e.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     mhiramat@...nel.org, mathieu.desnoyers@...icios.com,
        zhengyejian1@...wei.com, linux-trace-kernel@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] tracing: Simplify and fix "buffered event"
 synchronization

On 11/29/23 15:58, Steven Rostedt wrote:
> On Wed, 29 Nov 2023 10:22:23 +0100
> Petr Pavlu <petr.pavlu@...e.com> wrote:
> 
>> Yes, I believe this should address this potential race condition.
>>
>> An alternative would be instead to update
>> trace_event_buffer_lock_reserve() as follows:
>>
>> 	if (this_cpu_inc_return(trace_buffered_event_cnt) == 1) {
>> 		smp_rmb();
> 
> This is the problem I have with your approach. That smp_rmb() is in the
> highly critical path. On some architectures, this has a significant impact
> on the overhead of this code. This path is called during code execution and
> increases the overhead of the tracing infrastructure.
> 
> If I'm given two solutions where one adds a smp_rmb() to the critical path
> and the other just slows down the non-critical path more, I'll take the
> slow down of non-critical path every time.
> 
>> 		if ((entry = __this_cpu_read(trace_buffered_event))) {
>> 			[...]
>>
>> That saves the synchronize_rcu() call but additionally modifies
>> trace_buffered_event_cnt even if trace_buffered_event is currently NULL.
>>
>> Another alternative is the approach taken by my patch which avoids more
>> RCU work and unnecessary memory modifications.
>>
>> I'd be interested if you could have a look again at what I'm proposing
>> in my patch. I think it simplifies the code while addressing these
>> problems as well. However, if you have reservations about that approach
>> then it is ok, I can fix the found problems individually as discussed.
> 
> Fix this without adding any memory barriers to the critical path, then I'll
> take another look.
> 
> FYI, this code was designed in the first place to avoid adding memory
> barriers in the critical path.

Thank you for the explanation. I ran the tests you mentioned in the
description of commit 0fc1b09ff1ff ("tracing: Use temp buffer when
filtering events") to understand this aspect a bit more. I confirm that
my proposed patch makes the tracing slower, usually by single digit
percentages. I understand this is not welcome. I also realize that the
slowdown might be even worse in different situations and on other
architectures that I checked (arm64, x86_64).

Ok, keeping the current approach, my plan for v2 is to prepare the
following patches:

* Fix for the missing increment+decrement of trace_buffered_event_cnt
  on the current CPU in trace_buffered_event_disable().

  Replace smp_call_function_many() with on_each_cpu_mask() in
  trace_buffered_event_disable(). The on_each_cpu_mask() function has
  also an advantage that it itself disables preemption so doing that can
  be then removed from trace_buffered_event_disable().

* Fix the potential race between trace_buffered_event_enable() and
  trace_event_buffer_lock_reserve() where the latter might already see
  a valid trace_buffered_event pointer but not all initialization yet.

  I think this might be actually best to address by using the same
  maintenance exclusion as is implemented in
  trace_buffered_event_disable(). It would make both maintenance
  operations consistent but for the cost of making the enable operation
  somewhat slower.

* Fix the WARN_ON_ONCE(!trace_buffered_event_ref) issued in
  trace_buffered_event_disable() when trace_buffered_event_enable()
  previously fails.

  Add a variable/flag tracking whether trace_buffered_event is currently
  allocated and use that for driving if a new allocation needs to be
  done when trace_buffered_event_enable() is called, or the buffers
  should be really freed when trace_buffered_event_disable() is invoked.

  Not sure if the mentioned alternative of leaving trace_buffered_event
  partially initialized on failure is preferred instead.

* Fix the potential race between trace_buffered_event_disable() and
  trace_event_buffer_lock_reserve() where the latter might still grab
  a pointer from trace_buffered_event that is being freed.

  Replace smp_wmb() with synchronize_rcu() in
  trace_buffered_event_disable().

Thanks,
Petr