Message-ID: <32888c33-c286-c600-66cb-8b1b03beeb8b@linux.intel.com>
Date: Mon, 1 Mar 2021 08:20:48 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>,
Vince Weaver <vincent.weaver@...ne.edu>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Stephane Eranian <eranian@...gle.com>
Subject: Re: [perf] perf_fuzzer causes crash in intel_pmu_drain_pebs_nhm()
On 2/11/2021 9:53 AM, Peter Zijlstra wrote:
>
> Kan, do you have time to look at this?
>
> On Thu, Jan 28, 2021 at 02:49:47PM -0500, Vince Weaver wrote:
>> On Thu, 28 Jan 2021, Vince Weaver wrote:
>>
>>> the perf_fuzzer has turned up a repeatable crash on my haswell system.
>>>
>>> addr2line is not being very helpful, it points to DECLARE_PER_CPU_FIRST.
>>> I'll investigate more when I have the chance.
>>
>> so I poked around some more.
>>
>> This seems to be caused in
>>
>> __intel_pmu_pebs_event()
>> get_next_pebs_record_by_bit() ds.c line 1639
>> get_pebs_status(at) ds.c line 1317
>> return ((struct pebs_record_nhm *)n)->status;
>>
>> where "n" has the value of 0xc0 rather than a proper pointer.
>>
I think I found the suspicious patch:
commit 01330d7288e00 ("perf/x86: Allow zero PEBS status with only
single active event")
https://lore.kernel.org/lkml/tip-01330d7288e0050c5aaabc558059ff91589e67cd@git.kernel.org/
The patch is a SW workaround for some old CPUs (HSW and earlier), which
may set the PEBS status to 0. It adds a check in
intel_pmu_drain_pebs_nhm() that tries to minimize the impact of the
defect by not dropping PEBS records that have PEBS status 0.
However, it doesn't correct the PEBS status, which may cause problems,
especially for large PEBS.
It's possible that all the PEBS records in a large PEBS have PEBS
status 0. If so, the first get_next_pebs_record_by_bit() in
__intel_pmu_pebs_event() returns NULL, so at = NULL. Since it's a large
PEBS, the 'count' parameter must be > 1, so the second
get_next_pebs_record_by_bit() is called on a pointer derived from NULL
(consistent with the 0xc0 value reported above) and crashes.
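To make the failure path concrete, here is a minimal user-space sketch in
C. It is not the kernel code: struct fake_pebs_record,
next_record_by_bit(), second_lookup_address() and drain_guarded() are
hypothetical names, the 192-byte (0xc0) record size is an assumption
chosen to match the pointer value reported above, and the NULL check in
drain_guarded() only illustrates one way to stop the loop, not a
proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PEBS record; only 'status' matters here. */
struct fake_pebs_record {
	uint64_t status;
	uint64_t payload[23];	/* pads the record to 192 bytes (assumed size) */
};

/*
 * Simplified lookup: return the first record in [base, top) whose status
 * has 'bit' set, or NULL if there is none.
 */
static struct fake_pebs_record *
next_record_by_bit(struct fake_pebs_record *base,
		   struct fake_pebs_record *top, int bit)
{
	struct fake_pebs_record *at;

	assert(bit >= 0 && bit < 64);
	if (base == NULL)
		return NULL;

	for (at = base; at < top; at++) {
		if (at->status & (1ULL << bit))
			return at;
	}
	return NULL;
}

/*
 * Address an unguarded loop would hand to the second lookup after
 * stepping one record past 'first'. Computed as an integer so that
 * nothing is dereferenced.
 */
static uintptr_t second_lookup_address(struct fake_pebs_record *first)
{
	return (uintptr_t)first + sizeof(struct fake_pebs_record);
}

/*
 * Guarded drain: count the records that would be emitted, stopping as
 * soon as the lookup returns NULL instead of advancing past it.
 */
static int drain_guarded(struct fake_pebs_record *base,
			 struct fake_pebs_record *top, int bit, int count)
{
	struct fake_pebs_record *at = next_record_by_bit(base, top, bit);
	int emitted = 0;

	while (at != NULL && count > 0) {
		emitted++;
		count--;
		at = next_record_by_bit(at + 1, top, bit);
	}
	return emitted;
}
```

With every status set to 0, next_record_by_bit() returns NULL on the
first call, and second_lookup_address(NULL) is the record size rather
than a valid pointer. drain_guarded() returns 0 in that case instead of
walking on from NULL.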
Could you please revert the patch and check whether it fixes your issue?
Thanks,
Kan