linux-kernel - Lock dependency based tree report in perf lock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20100129231723.GB5052@nowhere>
Date:	Sat, 30 Jan 2010 00:17:24 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Hitoshi Mitake <mitake@....info.waseda.ac.jp>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>,
	Paul Mackerras <paulus@...ba.org>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	LKML <linux-kernel@...r.kernel.org>
Subject: Lock dependency based tree report in perf lock

Hi,

Looking at the report layout we have with perf lock, it gives
a cool overview of per lock waittime, acquire time, by avg/max,
that's cool.

Now to complete the overview through another dimension we could
have another mode on top of a lock dependency based tree:

-- 125 ms -- lock1
             |
             -- 100 ms -- lock2
             |              |
             |               -- 99 ms -- lock 3
             |              |
             |               -- 16 us -- lock 4
             |                            |
             |                            ----- [...]
             |
             --  25 ms -- lock 5

Having such view can tell you which lock is delaying another
one. We could have this view by average or maximum of acquired
time (waittime here would be pointless).

And we can also report the outer locking noise time for these
reports.
Taking the above example:

-- 100 ms/84 us -- lock2
                   |
                   -- 99 ms -- lock 3
                   |
                   -- 16 us -- lock 4
                               |
                               ----- [...]

100 ms is the time lock2 has been acquired.
Lock 3 has been acquired 99 ms and lock 4 during 16 us,
which means the outer locking noise (the code executed
outside children locks) is of 84 us.

This noise can give a good rough glance of the influence
childs locks can have on a parent. This is somehow the child
weight, this all can even be expressed using percentages,
we could even reproduce the absolute/relative percentage
we currently have with the callchains in perf report
(-g graph for absolute, -g fractal for relative).

Or we can have a ghost child for each locks that represent
the outer child locks noise:

-- parent percentage -- lock2
                        |
                        -- 99 %   -- lock 3
                        |
                        -- 0.84 % -- noise // could have its own color for quick distinguishing
                        |
                        -- 0.16 % -- lock 4
                                     |
                                     ----- [...]

May be just one tricky thing.
Say we have lock3 and lock4 that depend on lock2,
it doesn't always means lock4 will always be taken after
lock3, it doesn't even mean lock4 will ever be taken after
lock3.

So we shouldn't have one branch per dependency but one
branch per practical scenario encountered.

So imagine we have the following dependency:

lock2
 |
 --- lock3
 |
 --- lock4
 |
 --- lock5

But we can actually have only the following scenarios:

lock2
|
-- lock3
|
-- lock4

lock2
|
-- lock4

lock2
|
-- lock5

Then we need three branches:

-- lock2 percentage -- lock2
                        |
                        -- 60 %
                        |   |
                        |   -- 90 % -- lock 3
                        |   |
                        |    -- 9 %  -- noise
                        |   |
                        |    -- 1 %  -- lock 4
                        |
                        -- 20 %
                        |   |
                        |    -- 99 % -- lock4
                        |   |
                        |   --  1 %  -- noise
                        |
                        -- 20 %
                            |
                             -- 14 % -- lock5
                            |
                             -- 86 % -- noise

Hopefully we could have such expandable tree using
ncurses, but we can probably start like callchains in
perf report.

There might also be a problem with the accuracy.
The max time lock2 is acquired won't always match
the path where lock3 and lock4 had their max time too,
but this should give, most of the time, a view close to
the reality, especially once we enter non-sleepable lock
branches.

Anyway, that's just an idea, not trivial I must admit.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/