Date:   Fri, 02 Nov 2018 23:30:03 -0700 (PDT)
From:   David Miller <davem@...emloft.net>
To:     jolsa@...hat.com
Cc:     acme@...nel.org, linux-kernel@...r.kernel.org, namhyung@...nel.org,
        jolsa@...nel.org
Subject: Re: [PATCH RFC] hist lookups

From: David Miller <davem@...emloft.net>
Date: Wed, 31 Oct 2018 09:08:16 -0700 (PDT)

> From: Jiri Olsa <jolsa@...hat.com>
> Date: Wed, 31 Oct 2018 16:39:07 +0100
> 
>> it'd be great to make hist processing faster, but is your main target here
>> to get the load out of the reader thread, so we don't lose events during the
>> hist processing?
>> 
>> we could queue events directly from the reader thread into another thread and
>> keep it (the reader thread) free of processing, focusing only on event
>> reading/passing 
> 
> Indeed, we could create threads that take samples from the thread processing
> the ring buffers, and insert them into the histogram.

So I played around with some ideas like this and ran into some dead ends.

I ran each mmap ring's processing in a separate thread.

This doesn't help at all: the problem is that all of the threads serialize
on the pthread lock for the histogram part of the work.

And the histogram part dominates the cost of processing each sample.
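For illustration, a minimal sketch of that shape (names are hypothetical,
not perf's actual symbols): one thread per mmap ring, all funneling into
a single lock-protected histogram.

#include <pthread.h>

struct sample {
	unsigned long key;	/* decoded event fields */
};

static pthread_mutex_t hist_lock = PTHREAD_MUTEX_INITIALIZER;

static void hist_insert(struct sample *s)
{
	/* the histogram update itself, runs with hist_lock held */
}

static void *ring_thread(void *arg)
{
	struct sample s = { 0 };

	for (;;) {
		/* ... read and decode the next event from this ring ... */

		/*
		 * Every ring thread blocks here, and since the histogram
		 * work dominates the per-sample cost, the threads
		 * serialize and the extra parallelism buys nothing.
		 */
		pthread_mutex_lock(&hist_lock);
		hist_insert(&s);
		pthread_mutex_unlock(&hist_lock);
	}

	return NULL;
}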

Nevertheless, I started work on formally threading all of the code that
the mmap threads operate on, such as symbol processing, and while doing
so I came to the conclusion that pushing only the histogram processing
to a separate thread poses its own set of big challenges.

To make this work, we would have to turn a piece of transient on-stack
state (the processed event) into allocated persistent state.

These persistent event structures get queued up to the histogram
thread(s).
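Concretely, the sketch below shows the shape of that state change (all
names hypothetical): the event that today lives on the reader's stack
gets copied into a heap node that can outlive the reader and sit on a
queue to the histogram thread(s).

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct queued_event {
	struct queued_event *next;
	unsigned char data[256];	/* copy of the decoded sample; size illustrative */
};

struct event_queue {
	pthread_mutex_t lock;
	struct queued_event *head, *tail;
};

static void queue_event(struct event_queue *q, const void *ev, size_t len)
{
	struct queued_event *n = malloc(sizeof(*n));

	if (!n)
		return;			/* error handling elided in this sketch */

	memcpy(n->data, ev, len);	/* assumes len <= sizeof(n->data) */
	n->next = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	pthread_mutex_unlock(&q->lock);

	/*
	 * Nothing bounds this list: if the histogram thread falls
	 * behind, these allocations simply accumulate.
	 */
}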

Therefore, if the histogram thread(s) can't keep up (and as per my
experiment above, it is easy to enter this state because the histogram
code itself is going to run linearly with the histogram lock held),
this persistent event memory will just get larger and larger.

We would have to find some way to parallelize the histogram code to
make any kind of threading worthwhile.
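One direction, sketched purely for illustration (hypothetical code, not
anything in perf today): shard the histogram by a hash of the entry
key, with a lock per shard, so that concurrent inserts only serialize
when they hit the same shard.

#include <pthread.h>

#define NR_SHARDS 64

struct hist_shard {
	pthread_mutex_t lock;	/* initialized with pthread_mutex_init() at startup */
	/* per-shard tree/list of hist entries would live here */
};

static struct hist_shard shards[NR_SHARDS];

static void hist_insert_sharded(unsigned long key)
{
	struct hist_shard *sh = &shards[key % NR_SHARDS];

	pthread_mutex_lock(&sh->lock);
	/* insert/update the entry for 'key' in this shard */
	pthread_mutex_unlock(&sh->lock);
}

Even then, any final reduction across shards (e.g. for sorted output)
would still be a serial pass.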
