linux-kernel - Re: [PATCH RFC] hist lookups

Message-ID: <20181104201821.GA22049@krava>
Date:   Sun, 4 Nov 2018 21:18:21 +0100
From:   Jiri Olsa <jolsa@...hat.com>
To:     David Miller <davem@...emloft.net>
Cc:     acme@...nel.org, linux-kernel@...r.kernel.org, namhyung@...nel.org,
        jolsa@...nel.org
Subject: Re: [PATCH RFC] hist lookups

On Fri, Nov 02, 2018 at 11:30:03PM -0700, David Miller wrote:
> From: David Miller <davem@...emloft.net>
> Date: Wed, 31 Oct 2018 09:08:16 -0700 (PDT)
> 
> > From: Jiri Olsa <jolsa@...hat.com>
> > Date: Wed, 31 Oct 2018 16:39:07 +0100
> > 
> >> it'd be great to make hist processing faster, but is your main target here
> >> to get the load out of the reader thread, so we don't lose events during the
> >> hist processing?
> >> 
> >> we could queue events directly from reader thread into another thread and
> >> keep it (the reader thread) free of processing, focusing only on event
> >> reading/passing 
> > 
> > Indeed, we could create threads that take samples from the thread processing
> > the ring buffers, and insert them into the histogram.
> 
> So I played around with some ideas like this and ran into some dead ends.
> 
> I ran each mmap ring's processing in a separate thread.
> 
> This doesn't help at all, the problem is that all the threads serialize
> at the pthread lock for the histogram part of the work.
> 
> And the histogram part dominates the cost of processing each sample.

yep, it sucks.. I was thinking of keeping separate hist objects for
each thread and merging them at the end
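
a rough sketch of that idea, for illustration only: this is not perf code,
it is written in Python rather than C, and every name in it (build_hist,
process_rings, the symbol-name samples) is made up. each worker fills a
private histogram without taking any lock, and the merge happens once all
workers are done. Python's interpreter lock means this shows the structure,
not a speedup.

```python
import threading
from collections import Counter


def build_hist(samples):
    """Fill a thread-private histogram; no lock is needed here."""
    hist = Counter()
    for sym in samples:
        hist[sym] += 1
    return hist


def process_rings(rings):
    """Run one worker per ring, then merge the private histograms."""
    partial = [None] * len(rings)

    def worker(i):
        # Each worker writes only to its own slot, so slots never collide.
        partial[i] = build_hist(rings[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(rings))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Single-threaded merge at the end: the only step that sees all data.
    merged = Counter()
    for hist in partial:
        merged.update(hist)  # Counter.update adds counts together
    return merged
```

the cost moves from per-sample locking to one merge pass, which stays
cheap as long as the number of distinct entries is small compared to the
number of samples.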

> 
> Nevertheless I started work on formally threading all of the code that
> the mmap threads operate on, such as symbol processing etc. and while
> doing so I came to the conclusion that pushing the histogram processing
> only to a separate thread poses its own set of big challenges.
> 
> To make this work we would have to make a piece of transient on-stack
> state (the processed event) into allocated persistent state.
> 
> These persistent event structures get queued up to the histogram
> thread(s).
> 
> Therefore, if the histogram thread(s) can't keep up (and as per my
> experiment above, it is easy to enter this state because the histogram
> code itself is going to run linearly with the histogram lock held),
> this persistent event memory will just get larger and larger.
> 
> We would have to find some way to parallelize the histogram code to
> make any kind of threading worthwhile.

do you have some code I could check on?

I'm going to make that separate thread to get the processing out
of the reading thread.. I think we need that in any case, so the
ring buffer is freed up as fast as possible
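
a minimal sketch of that hand-off, again illustration only and in Python
rather than C; hist_worker, read_and_process and max_pending are
hypothetical names, not anything in perf. the reader only copies events
into a queue and a separate thread does the histogram work. the queue is
bounded here so the pending events cannot grow without limit, at the price
of the reader blocking when the consumer falls behind.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel telling the consumer to stop


def hist_worker(events, hist):
    """Consumer: pull events off the queue and account them."""
    while True:
        ev = events.get()
        if ev is _DONE:
            break
        hist[ev] += 1


def read_and_process(ring, max_pending=1024):
    """Reader: hand every event to the consumer, do no processing itself."""
    events = queue.Queue(maxsize=max_pending)  # bounded on purpose
    hist = Counter()
    consumer = threading.Thread(target=hist_worker, args=(events, hist))
    consumer.start()

    for ev in ring:
        events.put(ev)  # blocks while the queue is full
    events.put(_DONE)

    consumer.join()
    return hist
```

with an unbounded queue the reader never blocks, but the queued events
pile up whenever the histogram side is slower, which is the memory growth
described above.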

thanks,
jirka