Message-ID: <56CDF609.7010506@fb.com>
Date: Wed, 24 Feb 2016 13:27:21 -0500
From: Josef Bacik <jbacik@...com>
To: Steven Rostedt <rostedt@...dmis.org>
CC: <linux-kernel@...r.kernel.org>, <kernel-team@...com>
Subject: Re: [PATCH] trace-cmd: use nonblocking reads for streaming
On 02/23/2016 06:17 PM, Steven Rostedt wrote:
> On Thu, 17 Dec 2015 12:01:52 -0500
> Josef Bacik <jbacik@...com> wrote:
>
>> I noticed while using the streaming infrastructure in trace-cmd that I was
>> seemingly missing events. Using other tracing methods I got these events, and
>> record->missed_events was never being set. This is because the streaming
>> infrastructure uses blocking reads on the per-CPU trace pipes, which means
>> we'll wait for an entire page's worth of data to be ready before passing it along
>> to the recorder. This makes it impossible to do long-term tracing that requires
>> coupling two different events that could occur on different CPUs, and I imagine
>> it is what has been screwing up my trace-cmd profile runs on our giant 40-CPU
>> boxes. Fix trace-cmd to instead use a nonblocking read with select to wait for
>> data on the pipe so we don't burn CPU unnecessarily. With this patch I'm no
>> longer seeing missed events in my app. Thanks,
>
> I just want to make sure I understand what is happening here.
>
> This wasn't trace-cmd's default code, right? This was your own app. And
> I'm guessing you were matching events perhaps. That is, after seeing
> some event, you looked for the other event. But if that event happened
> on a CPU that isn't very active, it would wait forever, as the read
> was waiting for a full page?
>
> Or is there something else?
>
> I don't have a problem with the patch. I just want to understand the
> issue.
Yup, I had an app that was watching block request issue and completion
events, and occasionally a completion event would happen on some mostly
idle CPU, so I wouldn't get the completion event until several hours
later (the app runs all the time), when we finally had a full page to
read from that CPU's buffer. Thanks,
Josef
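
For reference, a minimal sketch of the select()-plus-nonblocking-read loop the
patch describes might look like the following. This is not trace-cmd's actual
recorder code; the trace_pipe_raw path, the single-CPU handling, and the buffer
size are assumptions for illustration only.

	/*
	 * Sketch: open one per-CPU raw trace pipe nonblocking, sleep in
	 * select() until the kernel reports data, then read whatever is
	 * available rather than blocking until a full page accumulates.
	 */
	#include <stdio.h>
	#include <errno.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/select.h>

	int main(void)
	{
		/* Hypothetical path; trace-cmd opens one pipe per CPU. */
		const char *path =
			"/sys/kernel/debug/tracing/per_cpu/cpu0/trace_pipe_raw";
		char buf[4096];
		int fd = open(path, O_RDONLY | O_NONBLOCK);

		if (fd < 0) {
			perror("open");
			return 1;
		}

		for (;;) {
			fd_set rfds;
			ssize_t n;
			int ret;

			FD_ZERO(&rfds);
			FD_SET(fd, &rfds);

			/* Wait here instead of spinning, so we don't burn CPU. */
			ret = select(fd + 1, &rfds, NULL, NULL, NULL);
			if (ret < 0) {
				if (errno == EINTR)
					continue;
				perror("select");
				break;
			}

			/* Nonblocking read: returns whatever data is available. */
			n = read(fd, buf, sizeof(buf));
			if (n < 0) {
				if (errno == EAGAIN || errno == EINTR)
					continue;
				perror("read");
				break;
			}
			if (n > 0) {
				/* Hand the data to the recorder/consumer here. */
				fprintf(stderr, "read %zd bytes\n", n);
			}
		}

		close(fd);
		return 0;
	}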