linux-kernel - Re: assert from intel_pmu_hw

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <b121de4a-1f75-b84c-3637-cc676b4f5406@linux.intel.com>
Date:   Mon, 17 Oct 2022 10:26:54 -0400
From:   "Liang, Kan" <kan.liang@...ux.intel.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        "Liang, Kan" <kan.liang@...el.com>
Cc:     linux-kernel@...r.kernel.org,
        Ravi Bangoria <ravi.bangoria@....com>,
        Mark Rutland <mark.rutland@....com>
Subject: Re: assert from intel_pmu_hw_config



On 2022-10-14 6:15 p.m., Peter Zijlstra wrote:
> Hi Kan,
> 
> While fuzzing on my ADL, I saw a splat (sadly not captured because
> MeshCommander is a pain in the backside).
> 
> The thing I did recover was that it was the new
> lockdep_assert_event_ctx() triggering from intel_pmu_hw_config().
> 
> Now; that code reads:
> 
> 	if (require_mem_loads_aux_event(event) &&
> 	    (event->attr.sample_type & PERF_SAMPLE_DATA_SRC) &&
> 	    is_mem_loads_event(event)) {
> 		struct perf_event *leader = event->group_leader;
> 		struct perf_event *sibling = NULL;
> 
> 		if (!is_mem_loads_aux_event(leader)) {
> 			for_each_sibling_event(sibling, leader) {
> 				if (is_mem_loads_aux_event(sibling))
> 					break;
> 			}
> 			if (list_entry_is_head(sibling, &leader->sibling_list, sibling_list))
> 				return -ENODATA;
> 		}
> 	}
> 
> And it is trying to assert leader->ctx->mutex is held.
> 
> Now, the calling context perf_try_init_event() has:
> 
> 	/*
> 	 * A number of pmu->event_init() methods iterate the sibling_list to,
> 	 * for example, validate if the group fits on the PMU. Therefore,
> 	 * if this is a sibling event, acquire the ctx->mutex to protect
> 	 * the sibling_list.
> 	 */
> 	if (event->group_leader != event && pmu->task_ctx_nr != perf_sw_context) {
> 		/*
> 		 * This ctx->mutex can nest when we're called through
> 		 * inheritance. See the perf_event_ctx_lock_nested() comment.
> 		 */
> 		ctx = perf_event_ctx_lock_nested(event->group_leader,
> 						 SINGLE_DEPTH_NESTING);
> 		BUG_ON(!ctx);
> 	}
> 
> IOW; we only hold leader->ctx->mutex when event is *NOT* the group
> leader; while the above code *can* in fact use for_each_sibilng_event()
> on the group leader when conditions are just right.
> 
> Now, it's really late and my brain has long since started the weekend,
> but I think something like the below ought to fix things.
> 
> Does that make sense? IIRC this would not destroy the purpose of this
> code -- although admittedly, the comment there tickles only vague
> memories.
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index d8af75466ee9..450463d36450 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3975,7 +3975,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
>  		struct perf_event *leader = event->group_leader;
>  		struct perf_event *sibling = NULL;
>  
> -		if (!is_mem_loads_aux_event(leader)) {
> +		if (event != leader && !is_mem_loads_aux_event(leader)) {
>  			for_each_sibling_event(sibling, leader) {
>  				if (is_mem_loads_aux_event(sibling))
>  					break;

If the leader is the load latency event, I think we can error out
directly. Because the auxiliary event never can be in front of the load
latency event. Does the below patch help?

		struct perf_event *leader = event->group_leader;
		struct perf_event *sibling = NULL;

+		if (event == leader)
+			return -ENODATA;
+
		if (!is_mem_loads_aux_event(leader)) {
			for_each_sibling_event(sibling, leader) {
				if (is_mem_loads_aux_event(sibling))
					break;
			}

Thanks,
Kan