Message-ID: <9a062196-ccbe-440e-a2f9-23eb8c5eb837@linux.ibm.com>
Date: Wed, 7 Feb 2024 13:07:36 +0100
From: Mete Durlu <meted@...ux.ibm.com>
To: Steven Rostedt <rostedt@...dmis.org>, Sven Schnelle <svens@...ux.ibm.com>
Cc: Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH] tracing: use ring_buffer_record_is_set_on() in
 tracer_tracing_is_on()

On 2/7/24 12:09, Steven Rostedt wrote:
> On Wed, 07 Feb 2024 06:50:59 +0100
> Sven Schnelle <svens@...ux.ibm.com> wrote:
> 
>> Hi Steven,
> 
>> Not sure whether that is enough, have to test. However, it's not really
>> a fix, it would just require a bit more load and the test would fail
>> again. The fundamental problem is that even after disabling tracing
>> there might be some tracing line added due to the lockless nature of
>> the ringbuffer. This might then replace some existing cmdline entry.
>> So maybe we should change the test to ignore the program name when
>> calculating the checksums.
> 
> Have you figured out what caused the cmdlines to change when tracing is
> off. It shouldn't be updated even with just 128 entries.
> 
> I'm also looking into a way to keep more of a LRU command lines around,
> but nothing concrete yet.
> 
> -- Steve

Hi,

Wouldn't the following scenario explain the behavior we are seeing?
When using event triggers, trace uses lockless ring buffer control
paths. If the cmdline update and the reading of the trace output happen
on different CPUs, the ordering can get messed up.

1. An event happens and the trace trigger tells the ring buffer to stop
    writes.
2. (on cpu#1) The test calculates a checksum on the current state of
    the trace output.
3. (on cpu#2) Not yet knowing about the trace buffer's status, the
    writer adds one last entry before actually stopping, and its pid
    collides with an existing pid in the saved cmdlines map. While the
    checksum is being calculated (on cpu#1), the new saved cmdlines
    entry waits for the spinlocks to be released and then gets added.
4. The test calculates the checksum again and finds that the trace
    output has changed: <...> is now shown for the collided pid.
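
To make the ordering above easier to reason about, here is a small
userspace toy model of it. This is only a sketch under assumptions: it
is not the kernel's saved cmdlines code, the class and function names
(SavedCmdlines, render, checksum, simulate) are made up for
illustration, the two CPUs are replaced by a fixed sequential ordering,
and the cache is shrunk to two slots so that a single late write is
enough to evict an entry.

```python
import hashlib


class SavedCmdlines:
    """Toy fixed-size pid -> comm cache (hypothetical, not kernel code).

    When a new pid is saved and the cache is full, the next slot in
    round-robin order is reused and its previous pid is forgotten.
    """

    def __init__(self, slots):
        self.slots = [None] * slots  # each slot holds (pid, comm) or None
        self.next_idx = 0            # slot to reuse for the next new pid
        self.pid_to_slot = {}

    def save(self, pid, comm):
        if pid in self.pid_to_slot:
            # Known pid: refresh its comm in place, nothing is evicted.
            self.slots[self.pid_to_slot[pid]] = (pid, comm)
            return
        old = self.slots[self.next_idx]
        if old is not None:
            # Collision: the pid that owned this slot loses its comm.
            del self.pid_to_slot[old[0]]
        self.slots[self.next_idx] = (pid, comm)
        self.pid_to_slot[pid] = self.next_idx
        self.next_idx = (self.next_idx + 1) % len(self.slots)

    def lookup(self, pid):
        idx = self.pid_to_slot.get(pid)
        return self.slots[idx][1] if idx is not None else "<...>"


def render(events, cmdlines):
    """Render events the way a reader of the trace output would see them."""
    return "\n".join(
        f"{cmdlines.lookup(pid)}-{pid}: {msg}" for pid, msg in events
    )


def checksum(text):
    return hashlib.sha256(text.encode()).hexdigest()


def simulate():
    cmdlines = SavedCmdlines(slots=2)
    events = []
    for pid, comm in [(100, "bash"), (200, "sleep")]:
        cmdlines.save(pid, comm)
        events.append((pid, "sched_switch"))

    # Step 1: the trigger asks the ring buffer to stop (not modelled).
    # Step 2: the test reads the trace output for its first checksum.
    first = render(events, cmdlines)

    # Step 3: a writer that has not yet seen the stop adds one last
    # entry; saving its comm reuses the slot that belonged to pid 100.
    cmdlines.save(300, "cat")
    events.append((300, "sched_switch"))

    # Step 4: the test reads the trace output again.
    second = render(events, cmdlines)
    return first, second


if __name__ == "__main__":
    first, second = simulate()
    print(first, checksum(first), sep="\n")
    print(second, checksum(second), sep="\n")
```

In this model the second read differs from the first in two ways: the
late entry shows up as an extra line, and the line for pid 100 loses
its program name. A test that ignored the program name when calculating
the checksums would still see the extra line, so the model only
illustrates the cmdline part of the problem.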