linux-kernel - Re: [PATCH RFC] hist lookups


Message-Id: <20181106.221349.1296959035550004994.davem@davemloft.net>
Date:   Tue, 06 Nov 2018 22:13:49 -0800 (PST)
From:   David Miller <davem@...emloft.net>
To:     jolsa@...hat.com
Cc:     acme@...nel.org, linux-kernel@...r.kernel.org, namhyung@...nel.org,
        jolsa@...nel.org
Subject: Re: [PATCH RFC] hist lookups

From: Jiri Olsa <jolsa@...hat.com>
Date: Tue, 6 Nov 2018 21:42:55 +0100

> I pushed that fix in perf/fixes branch, but I'm still occasionally
> hitting the namespace crash.. working on it ;-)

Jiri, how can this new scheme work without setting copy_on_queue
for the queued_events we use here?

I don't see copy_on_queue being set and that means the queued event
structures reference the event memory directly in the mmaps, after the
mmap thread has released them back to the queue.

That means new events can come in to the mmap ring and overwrite what
was there previously, maybe even while deliver_event() is in the
middle of parsing the event.

Setting copy_on_queue for data[0] and data[1] makes all of the crashes
go away for me.
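
To make the hazard concrete, here is a minimal sketch of the difference
between queueing a pointer into the ring and queueing a private copy.
This is not perf's code: struct event, struct queued_event,
queue_event() and pid_seen_after_overwrite() are hypothetical stand-ins,
and a single struct plays the role of one slot in the mmap ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration, not perf's real structures. */
struct event {
	int  pid;
	char comm[16];
};

struct queued_event {
	const struct event *event;	/* what delivery will parse later */
	bool		    owned;	/* true if we hold a private copy */
};

/* Queue an event; with copy_on_queue the payload is duplicated first. */
static struct queued_event queue_event(const struct event *ev,
				       bool copy_on_queue)
{
	struct queued_event qe = { .event = ev, .owned = false };

	if (copy_on_queue) {
		struct event *dup = malloc(sizeof(*dup));

		assert(dup != NULL);
		memcpy(dup, ev, sizeof(*dup));
		qe.event = dup;
		qe.owned = true;
	}
	return qe;
}

static void free_queued(struct queued_event *qe)
{
	if (qe->owned)
		free((void *)qe->event);
	qe->event = NULL;
	qe->owned = false;
}

/* Returns the pid seen at delivery time after the ring slot was reused. */
static int pid_seen_after_overwrite(bool copy_on_queue)
{
	/* One slot of the (assumed) mmap ring, holding the original event. */
	struct event ring_slot = { .pid = 100, .comm = "old" };
	struct queued_event qe = queue_event(&ring_slot, copy_on_queue);
	int pid;

	/* Slot is released and a newer event lands in the same memory. */
	ring_slot = (struct event){ .pid = 200, .comm = "new" };

	pid = qe.event->pid;
	free_queued(&qe);
	return pid;
}
```

With copy_on_queue false the queued entry still points at the slot, so
delivery sees the newer event's pid (200).  With it true, delivery sees
the pid that was queued (100).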

I get a lot of "[unknown]" shared objects shortly after perf top
starts up during a full workload.  I've been wondering about one
side effect of how the mmap queues are processed.  Consider the
following:

	cpu 0			cpu 1

				exec
				create new mmap2 events
				scheduled to cpu 0 for whatever reason
	sample 1
	sample 2

And let's say that perf top is backlogged processing the mmap ring of
events generated for cpu 0, and sees sample 1 and sample 2 before
getting to any of cpu 1's events.

This means the thread, map, and symbol objects won't exist yet and
we'll get those "[unknown]" histogram entries, and they won't go
away.

When it finally stops looping over the mmap ring for cpu 0's events
it gets to cpu 1's mmap ring and sees the exec and mmap2 events
but at that point it's far too late.
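
The scenario above can be reproduced with a toy model.  This is only a
sketch under my assumptions about the processing order (each ring is
drained completely before moving on to the next); struct ev,
count_unknown() and demo_unknown() are made-up names and none of this
is perf's actual code.

```c
#include <assert.h>
#include <stddef.h>

enum ev_type { EV_MMAP2, EV_SAMPLE };

/* Hypothetical, simplified event record. */
struct ev {
	enum ev_type	   type;
	int		   pid;
	unsigned long long time;	/* when the event really happened */
};

#define MAX_PIDS 8

/*
 * Drain each ring completely, in the order given, and count samples
 * whose pid had no map registered at the moment they were processed.
 */
static int count_unknown(const struct ev *const rings[],
			 const size_t lens[], size_t nr_rings)
{
	int have_map[MAX_PIDS] = { 0 };
	int unknown = 0;

	for (size_t r = 0; r < nr_rings; r++) {
		for (size_t i = 0; i < lens[r]; i++) {
			const struct ev *e = &rings[r][i];

			assert(e->pid >= 0 && e->pid < MAX_PIDS);
			if (e->type == EV_MMAP2)
				have_map[e->pid] = 1;
			else if (!have_map[e->pid])
				unknown++;	/* stays "[unknown]" */
		}
	}
	return unknown;
}

/* The mmap2 event (time 1) precedes both samples (times 2 and 3). */
static int demo_unknown(int cpu0_first)
{
	static const struct ev cpu0[] = {
		{ EV_SAMPLE, 1, 2 }, { EV_SAMPLE, 1, 3 },
	};
	static const struct ev cpu1[] = {
		{ EV_MMAP2, 1, 1 },
	};
	const struct ev *const a[] = { cpu0, cpu1 };
	const size_t a_len[] = { 2, 1 };
	const struct ev *const b[] = { cpu1, cpu0 };
	const size_t b_len[] = { 1, 2 };

	return cpu0_first ? count_unknown(a, a_len, 2)
			  : count_unknown(b, b_len, 2);
}
```

Draining cpu 0's ring first leaves both samples unresolved even though
the mmap2 event has the earliest timestamp.  Draining cpu 1's ring
first resolves both.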

I surmise from what I see with perf top right now that this happens
a lot.