Date:   Fri, 02 Nov 2018 23:30:03 -0700 (PDT)
From:   David Miller <davem@...emloft.net>
To:     jolsa@...hat.com
Cc:     acme@...nel.org, linux-kernel@...r.kernel.org, namhyung@...nel.org,
        jolsa@...nel.org
Subject: Re: [PATCH RFC] hist lookups

From: David Miller <davem@...emloft.net>
Date: Wed, 31 Oct 2018 09:08:16 -0700 (PDT)

> From: Jiri Olsa <jolsa@...hat.com>
> Date: Wed, 31 Oct 2018 16:39:07 +0100
> 
>> it'd be great to make hist processing faster, but is your main target here
>> to get the load out of the reader thread, so we don't lose events during the
>> hist processing?
>> 
>> we could queue events directly from the reader thread into another thread and
>> keep it (the reader thread) free of processing, focusing only on event
>> reading/passing 
> 
> Indeed, we could create threads that take samples from the thread processing
> the ring buffers, and insert them into the histogram.

So I played around with some ideas like this and ran into some dead ends.

I ran each mmap ring's processing in a separate thread.

This doesn't help at all: the problem is that all of the threads serialize
on the pthread lock for the histogram part of the work.

And the histogram part dominates the cost of processing each sample.
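For illustration, a minimal sketch of that shape (names are hypothetical,
not perf's actual symbols): one thread per mmap ring, all funneling into
a single lock-protected histogram.

#include <pthread.h>

struct sample {
	unsigned long key;	/* decoded event fields */
};

static pthread_mutex_t hist_lock = PTHREAD_MUTEX_INITIALIZER;

static void hist_insert(struct sample *s)
{
	/* the histogram update itself, runs with hist_lock held */
}

static void *ring_thread(void *arg)
{
	struct sample s = { 0 };

	for (;;) {
		/* ... read and decode the next event from this ring ... */

		/*
		 * Every ring thread blocks here, and since the histogram
		 * work dominates the per-sample cost, the threads
		 * serialize and the extra parallelism buys nothing.
		 */
		pthread_mutex_lock(&hist_lock);
		hist_insert(&s);
		pthread_mutex_unlock(&hist_lock);
	}

	return NULL;
}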

Nevertheless, I started work on formally threading all of the code that
the mmap threads operate on, such as symbol processing, and while doing
so I came to the conclusion that pushing only the histogram processing
to a separate thread poses its own set of big challenges.

To make this work, we would have to turn a piece of transient on-stack
state (the processed event) into allocated persistent state.

These persistent event structures get queued up to the histogram
thread(s).
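Concretely, the sketch below shows the shape of that state change (all
names hypothetical): the event that today lives on the reader's stack
gets copied into a heap node that can outlive the reader and sit on a
queue to the histogram thread(s).

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct queued_event {
	struct queued_event *next;
	unsigned char data[256];	/* copy of the decoded sample; size illustrative */
};

struct event_queue {
	pthread_mutex_t lock;
	struct queued_event *head, *tail;
};

static void queue_event(struct event_queue *q, const void *ev, size_t len)
{
	struct queued_event *n = malloc(sizeof(*n));

	if (!n)
		return;			/* error handling elided in this sketch */

	memcpy(n->data, ev, len);	/* assumes len <= sizeof(n->data) */
	n->next = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	pthread_mutex_unlock(&q->lock);

	/*
	 * Nothing bounds this list: if the histogram thread falls
	 * behind, these allocations simply accumulate.
	 */
}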

Therefore, if the histogram thread(s) can't keep up (and as per my
experiment above, it is easy to enter this state because the histogram
code itself is going to run linearly with the histogram lock held),
this persistent event memory will just get larger and larger.

We would have to find some way to parallelize the histogram code to
make any kind of threading worthwhile.
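One direction, sketched purely for illustration (hypothetical code, not
anything in perf today): shard the histogram by a hash of the entry
key, with a lock per shard, so that concurrent inserts only serialize
when they hit the same shard.

#include <pthread.h>

#define NR_SHARDS 64

struct hist_shard {
	pthread_mutex_t lock;	/* initialized with pthread_mutex_init() at startup */
	/* per-shard tree/list of hist entries would live here */
};

static struct hist_shard shards[NR_SHARDS];

static void hist_insert_sharded(unsigned long key)
{
	struct hist_shard *sh = &shards[key % NR_SHARDS];

	pthread_mutex_lock(&sh->lock);
	/* insert/update the entry for 'key' in this shard */
	pthread_mutex_unlock(&sh->lock);
}

Even then, any final reduction across shards (e.g. for sorted output)
would still be a serial pass.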
