lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <yt9dzfwch00u.fsf@linux.ibm.com>
Date: Wed, 07 Feb 2024 14:33:21 +0100
From: Sven Schnelle <svens@...ux.ibm.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Mete Durlu <meted@...ux.ibm.com>, Masami Hiramatsu
 <mhiramat@...nel.org>,
        Mathieu Desnoyers
 <mathieu.desnoyers@...icios.com>,
        linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH] tracing: use ring_buffer_record_is_set_on() in
 tracer_tracing_is_on()

Steven Rostedt <rostedt@...dmis.org> writes:

> On Wed, 7 Feb 2024 13:07:36 +0100
> Mete Durlu <meted@...ux.ibm.com> wrote:
>
>> wouldn't the following scenario explain the behavior we are seeing.
>> When using event triggers, trace uses lockless ringbuffer control paths.
>> If cmdline update and trace output reading is happening on different
>> cpus, the ordering can get messed up.
>>
>> 1. event happens and trace trigger tells ring buffer to stop writes
>> 2. (on cpu#1)test calculates checksum on current state of trace
>>     output.
>> 3. (on cpu#2)not knowing about the trace buffers status yet, writer adds
>>     a one last entry which would collide with a pid in cmdline map before
>>     actually stopping. While (on cpu#1) checksum is being calculated, new
>>     saved cmdlines entry is waiting for spinlocks to be unlocked and then
>>     gets added.
>> 4. test calculates checksum again and finds that the trace output has
>>     changed. <...> is put on collided pid.
>
> But the failure is here:
>
> on=`cat tracing_on`
> if [ $on != "0" ]; then
>     fail "Tracing is not off"
> fi

It might be misleading because we're discussing two issues in one
thread. The failure above was one problem, which the initial fix
is for. The other issue we're still seeing is the test below:

> csum1=`md5sum trace`
> sleep $SLEEP_TIME
> csum2=`md5sum trace`
>
> if [ "$csum1" != "$csum2" ]; then
>     fail "Tracing file is still changing"
> fi
>
> 1. tracing is off
> 2. do checksum of trace
> 3. sleep
> 4. do another checksum of trace
> 5. compare the two checksums
>
> Now how did they come up differently in that amount of time? The
> saved_cmdlines really should not have been updated.

My assumption without reading the code is that something like this
happens:

CPU0                             CPU1
[ringbuffer enabled]
                                 ring_buffer_write()
                                     if (atomic_read(&buffer->record_disabled))
                                            goto out;
echo 0 > tracing_on
record_disabled |= RB_BUFFER_OFF
csum1=`md5sum trace`
                                 [adds trace entry to ring buffer,
                                  overwriting savedcmd_lines entry because
                                  it thinks ring buffer is enabled]
csum2=`md5sum trace`


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ