Date:   Fri, 12 Feb 2021 12:16:08 +0000
From:   <Viktor.Rosendahl@....de>
To:     <rostedt@...dmis.org>
CC:     <mingo@...hat.com>, <joel@...lfernandes.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] Add the latency-collector to tools

On Thu, 2021-02-11 at 14:56 -0500, Steven Rostedt wrote:
> On Thu, 11 Feb 2021 17:17:42 +0100
> Viktor Rosendahl <Viktor.Rosendahl@....de> wrote:
> 
> > It seems to work but I discovered during testing that it seems like newer
> > kernels have a tendency to lose some latencies in longer bursts. I guess
> > this is likely to be another regression in the preemptirqsoff tracer.
> 
> Not sure what you mean by the above. I'd be interested in fixing it if it is
> really a problem.


Sorry, my bad; I should have been more specific.

Yesterday, when I tested the latency-collector, I could see that there were
clear signs of missing latencies. I used preemptirq_delay_test to generate
bursts of 10 and never got any stack traces from the 10th latency (nor the 9th
or 8th). 

I think I used 2000us as a threshold and generated latencies of 3000us.

Also, I sometimes saw fewer than the expected 9 messages like
"printout skipped due to random delay" before a "randomly sleep for 1000 ms
before print" message, and the stack trace then pointed to a latency from the
beginning of the burst rather than to the last one. The last one is what I
would expect to see if the latency-collector had slept for 1000 ms.
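The pattern described above can be checked mechanically on saved output. This is a minimal sketch, assuming the latency-collector output was saved with one message per line and that, as described above, a burst of N latencies should produce N - 1 "printout skipped" messages before each "randomly sleep for" message. The function name `short_bursts` is hypothetical and not part of the tool.

```python
# Substrings taken from the messages quoted above.
SKIP_MSG = "printout skipped due to random delay"
SLEEP_MSG = "randomly sleep for"


def short_bursts(lines, burst_size=10):
    """Return the skip counts that fell short of burst_size - 1.

    Assumption (from the description above, not from the tool's source):
    each "randomly sleep for ..." message should be preceded by
    burst_size - 1 "printout skipped ..." messages.
    """
    expected = burst_size - 1
    short = []
    skipped = 0
    for line in lines:
        if SKIP_MSG in line:
            skipped += 1
        elif SLEEP_MSG in line:
            if skipped < expected:
                short.append(skipped)
            skipped = 0  # start counting for the next burst
    return short
```

For example, `short_bursts(open("collector.log"))` (hypothetical log file name) would return an empty list if every burst produced the full set of skip messages.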

I could also see the problem without the latency-collector: if I generated a
burst of 10 and then checked /sys/kernel/tracing/trace, I would not see the
stack trace from the 10th latency but from one of the earlier ones. For
example, I would see the preemptirqtest_2 function in the trace instead of the
expected preemptirqtest_9. IIRC, I sometimes saw the preemptirqtest_0
function, indicating that only the first latency was captured and the rest
were lost.
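The manual check above can be scripted. This is a minimal sketch, assuming the trace text has already been read (e.g. from /sys/kernel/tracing/trace, which normally requires root) and that, for a burst of N, the last latency should show up as preemptirqtest_<N-1>. The helper names are hypothetical.

```python
import re

# Matches the per-latency test functions, e.g. "preemptirqtest_9".
_TEST_FUNC = re.compile(r"\bpreemptirqtest_(\d+)\b")


def last_captured_index(trace_text):
    """Return the highest preemptirqtest_N index found, or None."""
    indices = [int(m.group(1)) for m in _TEST_FUNC.finditer(trace_text)]
    return max(indices) if indices else None


def latency_lost(trace_text, burst_size=10):
    """True if the trace does not show the last function of the burst."""
    return last_captured_index(trace_text) != burst_size - 1
```

Usage: `latency_lost(open("/sys/kernel/tracing/trace").read())` returns True when the trace shows, say, preemptirqtest_2 instead of preemptirqtest_9.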

However, for some reason I cannot reproduce the behavior now, although I am
using exactly the same kernel.

Because humans are more often at fault than computers, I cannot rule out the
possibility that I had misconfigured something and that it was all the result
of a faulty test.

I will let you know if I come up with a way of reproducing this behavior later.
I cannot spend more time on it right now.

>
> I didn't look too deeply at the rest, just skimmed it basically, and I
> tried it out.
> 
> I'm fine with pulling this into my queue, as it's just a tool and won't
> cause any other issues. I can move some of the files in scripts that deal
> with tracing into the tools/tracing directory too. Maybe this should be
> placed in a subdirectory? tools/tracing/latency/ ?
> 
> Feel free to submit a proper patch (proper change log, etc).
> 

Ok, thanks, I think that I have incorporated all of your suggestions in the
patch that I already sent out earlier today.

best regards,

Viktor
