linux-kernel - Re: [PATCH RFC] hist lookups

Message-Id: <20181031.090816.2117345408719881030.davem@davemloft.net>
Date:   Wed, 31 Oct 2018 09:08:16 -0700 (PDT)
From:   David Miller <davem@...emloft.net>
To:     jolsa@...hat.com
Cc:     acme@...nel.org, linux-kernel@...r.kernel.org, namhyung@...nel.org,
        jolsa@...nel.org
Subject: Re: [PATCH RFC] hist lookups

From: Jiri Olsa <jolsa@...hat.com>
Date: Wed, 31 Oct 2018 16:39:07 +0100

> it'd be great to make hist processing faster, but is your main target here
> to get the load out of the reader thread, so we dont lose events during the
> hist processing?
> 
> we could queue events directly from reader thread into another thread and
> keep it (the reader thread) free of processing, focusing only on event
> reading/passing 

Indeed, we could create threads that take samples from the thread processing
the ring buffers, and insert them into the histogram.

In fact, since there is already pthread locking around the histogram
data structures, we could parallelize that as much as we want.
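
A minimal sketch of the hand-off described above, assuming a bounded
queue between the thread reading the ring buffers and the histogram
worker threads.  None of these names come from perf: struct sample,
struct sample_queue, QUEUE_CAP and the queue_*() helpers are all
hypothetical.  Note that this version blocks the reader when the queue
is full; a real design would have to decide whether to block, grow the
queue, or drop and count.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 1024	/* hypothetical capacity, not a perf constant */

struct sample;		/* opaque stand-in for a decoded perf sample */

struct sample_queue {
	pthread_mutex_t lock;
	pthread_cond_t  not_empty;
	pthread_cond_t  not_full;
	struct sample  *slots[QUEUE_CAP];
	size_t head, tail, count;
	bool closed;
};

static void queue_init(struct sample_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->not_empty, NULL);
	pthread_cond_init(&q->not_full, NULL);
	q->head = q->tail = q->count = 0;
	q->closed = false;
}

/* Reader thread: hand off one sample; waits only while the queue is full. */
static void queue_push(struct sample_queue *q, struct sample *s)
{
	pthread_mutex_lock(&q->lock);
	while (q->count == QUEUE_CAP)
		pthread_cond_wait(&q->not_full, &q->lock);
	q->slots[q->tail] = s;
	q->tail = (q->tail + 1) % QUEUE_CAP;
	q->count++;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Reader thread: no more samples will arrive; wake all waiting workers. */
static void queue_close(struct sample_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->closed = true;
	pthread_cond_broadcast(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Worker thread: next sample, or NULL once the queue is closed and drained. */
static struct sample *queue_pop(struct sample_queue *q)
{
	struct sample *s = NULL;

	pthread_mutex_lock(&q->lock);
	while (q->count == 0 && !q->closed)
		pthread_cond_wait(&q->not_empty, &q->lock);
	if (q->count > 0) {
		s = q->slots[q->head];
		q->head = (q->head + 1) % QUEUE_CAP;
		q->count--;
		pthread_cond_signal(&q->not_full);
	}
	pthread_mutex_unlock(&q->lock);
	return s;
}
```

Each worker would loop on queue_pop() and insert the returned sample
into the histogram under the existing histogram locking.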

If beneficial we could also parallelize the ring buffer processing
into a small number of threads too.

My understanding is that in its default mode perf gets one event ring
buffer per cpu being analyzed.  So we could divide that number of
rings by some factor, like 16 or something, and thus divide the rings
into groups of 16 with one thread assigned to each group.
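
The grouping arithmetic can be sketched as follows.  This only shows
the index math for "one thread per group of 16 rings"; the helper names
and RINGS_PER_THREAD are made up for illustration and are not part of
perf.

```c
#include <assert.h>
#include <stddef.h>

#define RINGS_PER_THREAD 16	/* the "factor" above; hypothetical tunable */

/* Number of reader threads needed to cover nr_rings ring buffers. */
static size_t nr_reader_threads(size_t nr_rings)
{
	return (nr_rings + RINGS_PER_THREAD - 1) / RINGS_PER_THREAD;
}

/* Half-open range [*first, *end) of ring indexes owned by thread tid. */
static void thread_ring_range(size_t nr_rings, size_t tid,
			      size_t *first, size_t *end)
{
	assert(tid < nr_reader_threads(nr_rings));
	*first = tid * RINGS_PER_THREAD;
	*end = *first + RINGS_PER_THREAD;
	if (*end > nr_rings)	/* the last group may be short */
		*end = nr_rings;
}
```

With 40 rings, for example, this gives three threads, and the third one
handles only the remaining 8 rings.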

There is one major concern about this though.  Creating threads makes
perf a bit more "invasive" to the workload it is observing.  And that
is something we've always worked to minimize.

I think your idea to add threads for the histogram work is great.

But I still think that the histogram code is really bloated, and doing
a full 262-byte memset on every histogram lookup is unnecessary
overhead.
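
To illustrate the general idea only (this is not the perf histogram
code), here is a sketch of a find-or-create lookup that compares a
small key and zeroes a full entry only when a new one has to be
inserted.  struct hist_key, struct hist_node, struct hist_table and the
helpers are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_BUCKETS 256

/* Hypothetical lookup key: just the fields needed to compare entries. */
struct hist_key {
	uint64_t ip;
	int32_t  pid;
	int32_t  cpu;
};

/* Hypothetical full entry; the payload stands in for the rarely used bulk. */
struct hist_node {
	struct hist_key   key;
	uint64_t          nr_events;
	char              payload[256];
	struct hist_node *next;
};

struct hist_table {
	struct hist_node *buckets[NR_BUCKETS];
};

static size_t key_hash(const struct hist_key *k)
{
	uint64_t h = k->ip ^ ((uint64_t)(uint32_t)k->pid << 32) ^
		     (uint64_t)(uint32_t)k->cpu;

	h *= 0x9E3779B97F4A7C15ULL;	/* multiplicative mixing */
	return (size_t)(h >> 56);	/* top 8 bits: 0..NR_BUCKETS-1 */
}

static bool key_equal(const struct hist_key *a, const struct hist_key *b)
{
	return a->ip == b->ip && a->pid == b->pid && a->cpu == b->cpu;
}

/* Find the entry for key, creating it on a miss.  Returns NULL on OOM. */
static struct hist_node *hist_findnew(struct hist_table *t,
				      const struct hist_key *key)
{
	size_t b = key_hash(key);
	struct hist_node *n;

	for (n = t->buckets[b]; n; n = n->next)
		if (key_equal(&n->key, key))
			return n;	/* hit: no full entry was zeroed */

	n = calloc(1, sizeof(*n));	/* miss: pay the zeroing cost once */
	if (!n)
		return NULL;
	n->key = *key;
	n->next = t->buckets[b];
	t->buckets[b] = n;
	return n;
}
```

The common case, a hit on an existing entry, then touches only the key
fields, and the cost of clearing a full entry is paid once per new
entry instead of once per lookup.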