linux-kernel - Re: [PATCH] perf_events: fix transaction recovery in group_sched

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <AANLkTimGyMRHEDxBwgCEyQrfjSrp9tFj1J-xHpY0=f1u@mail.gmail.com>
Date:	Fri, 15 Oct 2010 19:34:01 +0200
From:	Stephane Eranian <eranian@...gle.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org, mingo@...e.hu, paulus@...ba.org,
	davem@...emloft.net, fweisbec@...il.com,
	perfmon2-devel@...ts.sf.net, eranian@...il.com,
	robert.richter@....com
Subject: Re: [PATCH] perf_events: fix transaction recovery in group_sched_in()

On Fri, Oct 15, 2010 at 7:29 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Fri, 2010-10-15 at 16:54 +0200, Stephane Eranian wrote:
>> The group_sched_in() function uses a transactional approach to schedule
>> a group of events. In a group, either all events can be scheduled or
>> none are. To schedule each event in, the function calls event_sched_in().
>> In case of error, event_sched_out() is called on each event in the group.
>>
>> The problem is that event_sched_out() does not completely cancel the
>> effects of event_sched_in(). Furthermore event_sched_out() changes the
>> state of the event as if it had run which is not true is this particular
>> case.
>>
>> Those inconsistencies impact time tracking fields and may lead to events
>> in a group not all reporting the same time_enabled and time_running values.
>> This is demonstrated with the example below:
>>
>> $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
>> 1946101 unhalted_core_cycles (32.85% scaling, ena=829181, run=556827)
>>   11423 baclears (32.85% scaling, ena=829181, run=556827)
>>    7671 baclears (0.00% scaling, ena=556827, run=556827)
>>
>> 2250443 unhalted_core_cycles (57.83% scaling, ena=962822, run=405995)
>>   11705 baclears (57.83% scaling, ena=962822, run=405995)
>>   11705 baclears (57.83% scaling, ena=962822, run=405995)
>>
>> Notice that in the first group, the last baclears event does not
>> report the same timings as its siblings.
>>
>> This issue comes from the fact that tstamp_stopped is updated
>> by event_sched_out() as if the event had actually run.
>>
>> To solve the issue, we must ensure that, in case of error, there is
>> no change in the event state whatsoever. That means timings must
>> remain as they were when entering group_sched_in().
>>
>> To do this we defer updating tstamp_running until we know the
>> transaction succeeded. Therefore, we have split event_sched_in()
>> in two parts separating the update to tstamp_running.
>>
>> Similarly, in case of error, we do not want to update tstamp_stopped.
>> Therefore, we have split event_sched_out() in two parts separating
>> the update to tstamp_stopped.
>>
>> With this patch, we now get the following output:
>>
>> $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
>> 2492050 unhalted_core_cycles (71.75% scaling, ena=1093330, run=308841)
>>   11243 baclears (71.75% scaling, ena=1093330, run=308841)
>>   11243 baclears (71.75% scaling, ena=1093330, run=308841)
>>
>> 1852746 unhalted_core_cycles (0.00% scaling, ena=784489, run=784489)
>>    9253 baclears (0.00% scaling, ena=784489, run=784489)
>>    9253 baclears (0.00% scaling, ena=784489, run=784489)
>>
>> Note that the uneven timing between groups is a side effect of
>> the process spending most of its time sleeping, i.e., not enough
>> event rotations (but that's a separate issue).
>>
>> Signed-off-by: Stephane Eranian <eranian@...gle.com>
>
> Yes, makes sense.. I'm a bit hesitant to slap a -stable tag on it due to
> its size,.. Ingo, Paulus?
>
I was worried about the size too but I could not figure out another
smaller way of doing this. An alternative would be to pass an
extra argument to the function and call it twice.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/