linux-kernel - Re: ftrace global trace_pipe_raw

Message-ID: <5af4b871-41dd-219b-f78e-3a60ea570160@gliwa.com>
Date:   Wed, 16 Jan 2019 09:00:00 +0100
From:   Claudio <claudio.fontana@...wa.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org
Subject: Re: ftrace global trace_pipe_raw

Hi Steven, happy new year,

On 12/19/2018 05:37 PM, Steven Rostedt wrote:
> On Wed, 19 Dec 2018 12:32:41 +0100
> Claudio <claudio.fontana@...wa.com> wrote:
> 
>>>>
>>>> I would imagine the core functionality is already available, since trace_pipe
>>>> in the tracing directory already shows all events regardless of CPU, and so
>>>> it would be a matter of doing the same for trace_pipe_raw.  
>>>
>>> The difference between trace_pipe and trace_pipe_raw is that trace_pipe
>>> is post processed, and reads the per CPU buffers and interleaves them
>>> one event at a time. The trace_pipe_raw just sends you the raw
>>> unprocessed data directly from the buffers, which are grouped per CPU.  
>>
>> I think that what I am looking for, to improve the performance of our system,
>> is a post processed stream of binary entry data, already merged from all CPUs
>> and sorted per timestamp, in the same way that it is done for textual output
>> in __find_next_entry:
>>
>>         for_each_tracing_cpu(cpu) {
>>
>>                 if (ring_buffer_empty_cpu(buffer, cpu))
>>                         continue;
>>
>>                 ent = peek_next_entry(iter, cpu, &ts, &lost_events);
>>
>>                 /*
>>                  * Pick the entry with the smallest timestamp:
>>                  */
>>                 if (ent && (!next || ts < next_ts)) {
>>                         next = ent;
>>                         next_cpu = cpu;
>>                         next_ts = ts;
>>                         next_lost = lost_events;
>>                         next_size = iter->ent_size;
>>                 }
>>         }
>>
>> We first tried to use the textual output directly, but this led to
>> unacceptable overhead in parsing the text.
>>
>> Please correct me if I misunderstand, but it seems to me that it
>> would be possible to do the same kind of post processing, including
>> generating a sorted stream of entries, while skipping the text output
>> formatting and emitting the binary data of each entry directly. That
>> would be far more efficient for user space correlators to consume.
>>
>> But maybe this is not a general enough requirement to be acceptable for
>> implementing directly into the kernel?
>>
>> We have the requirement of using the OS tracing events, including
>> scheduling events, to react from software immediately
>> (vs doing after-the-fact analysis).
> 
> Have you looked at using the perf event interface? I believe it uses a
> single buffer for all events. At least for tracing a single process.
> 
> -- Steve
> 

Indeed, the perf event interface would be great, if only it supported tracing all processes on all CPUs.

Unfortunately for my use case, it can only trace one process on any CPU, or all processes on one (1) CPU.
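
To make the restriction concrete, here is a small sketch of how the (pid, cpu) arguments of perf_event_open(2) map to a tracing scope, as I understand the man page. The enum and the function name classify_perf_scope are hypothetical and only for illustration; the sketch ignores PERF_FLAG_PID_CGROUP and does not call into the kernel.

```c
#include <sys/types.h>

/* Hypothetical names for illustration; not part of any kernel or libc API. */
enum perf_scope {
    SCOPE_INVALID,           /* e.g. pid == -1 && cpu == -1: rejected */
    SCOPE_TASK_ANY_CPU,      /* pid >= 0 && cpu == -1 (pid 0 = caller) */
    SCOPE_TASK_ONE_CPU,      /* pid >= 0 && cpu >= 0 */
    SCOPE_ALL_TASKS_ONE_CPU  /* pid == -1 && cpu >= 0 (privileged) */
};

/*
 * Classify a (pid, cpu) pair the way perf_event_open(2) documents it.
 * Assumption: values below -1 are treated as invalid here.
 */
static enum perf_scope classify_perf_scope(pid_t pid, int cpu)
{
    if (pid < -1 || cpu < -1)
        return SCOPE_INVALID;
    if (pid == -1)
        return cpu == -1 ? SCOPE_INVALID : SCOPE_ALL_TASKS_ONE_CPU;
    return cpu == -1 ? SCOPE_TASK_ANY_CPU : SCOPE_TASK_ONE_CPU;
}
```

The "all processes on all CPUs" combination (pid == -1, cpu == -1) is the one that falls into SCOPE_INVALID.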

I guess this restriction exists because of some kind of security concern.

I'll look into how much work it would take to extend the interface to cover the any-process/any-CPU use case.
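
For reference, the "pick the entry with the smallest timestamp" selection from __find_next_entry quoted above could be mirrored in user space over per-CPU binary streams roughly as follows. This is only a sketch under assumptions: struct rec, struct cpu_stream and next_merged are hypothetical names, the record layout is a stand-in rather than the real ftrace entry format, and each per-CPU stream is assumed to be already sorted by timestamp.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout; real ftrace entries look different. */
struct rec {
    uint64_t ts;       /* timestamp */
    uint32_t payload;  /* stand-in for the binary entry data */
};

/* One per-CPU stream, assumed already sorted by timestamp. */
struct cpu_stream {
    const struct rec *recs;
    size_t len;
    size_t pos;        /* index of the next unread record */
};

/*
 * Return the record with the smallest timestamp among the heads of all
 * streams and consume it. Returns NULL once every stream is empty.
 * On success, *cpu_out (if non-NULL) receives the source stream index.
 */
static const struct rec *next_merged(struct cpu_stream *s, size_t ncpu,
                                     size_t *cpu_out)
{
    const struct rec *next = NULL;
    size_t next_cpu = 0;

    for (size_t cpu = 0; cpu < ncpu; cpu++) {
        if (s[cpu].pos >= s[cpu].len)
            continue;               /* this CPU has nothing left */

        const struct rec *ent = &s[cpu].recs[s[cpu].pos];

        /* Strict '<' keeps the lowest CPU index on equal timestamps. */
        if (!next || ent->ts < next->ts) {
            next = ent;
            next_cpu = cpu;
        }
    }

    if (next) {
        s[next_cpu].pos++;
        if (cpu_out)
            *cpu_out = next_cpu;
    }
    return next;
}
```

Calling next_merged() in a loop until it returns NULL yields one stream ordered by timestamp, without any text formatting in between.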

Ciao and thank you,

Claudio