lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 3 Mar 2021 14:53:00 -0500 From: "Liang, Kan" <kan.liang@...ux.intel.com> To: Peter Zijlstra <peterz@...radead.org>, Vince Weaver <vincent.weaver@...ne.edu> Cc: mingo@...hat.com, linux-kernel@...r.kernel.org, acme@...nel.org, mark.rutland@....com, alexander.shishkin@...ux.intel.com, jolsa@...hat.com, namhyung@...nel.org, eranian@...gle.com, ak@...ux.intel.com, stable@...r.kernel.org Subject: Re: [PATCH] Revert "perf/x86: Allow zero PEBS status with only single active event" On 3/3/2021 1:59 PM, Peter Zijlstra wrote: > On Wed, Mar 03, 2021 at 05:42:18AM -0800, kan.liang@...ux.intel.com wrote: > >> For some old CPUs (HSW and earlier), the PEBS status in a PEBS record >> may be mistakenly set to 0. To minimize the impact of the defect, the >> commit was introduced to try to avoid dropping the PEBS record for some >> cases. It adds a check in the intel_pmu_drain_pebs_nhm(), and updates >> the local pebs_status accordingly. However, it doesn't correct the PEBS >> status in the PEBS record, which may trigger the crash, especially for >> the large PEBS. >> >> It's possible that all the PEBS records in a large PEBS have the PEBS >> status 0. If so, the first get_next_pebs_record_by_bit() in the >> __intel_pmu_pebs_event() returns NULL. The at = NULL. Since it's a large >> PEBS, the 'count' parameter must > 1. The second >> get_next_pebs_record_by_bit() will crash. >> >> Two solutions were considered to fix the crash. >> - Keep the SW workaround and add extra checks in the >> get_next_pebs_record_by_bit() to workaround the issue. The >> get_next_pebs_record_by_bit() is a critical path. The extra checks >> will bring extra overhead for the latest CPUs which don't have the >> defect. Also, the defect can only be observed on some old CPUs >> (For example, the issue can be reproduced on an HSW client, but I >> didn't observe the issue on my Haswell server machine.). The impact >> of the defect should be limit. >> This solution is dropped. >> - Drop the SW workaround and revert the commit. >> It seems that the commit never works, because the PEBS status in the >> PEBS record never be changed. The get_next_pebs_record_by_bit() only >> checks the PEBS status in the PEBS record. The record is dropped >> eventually. Reverting the commit should not change the current >> behavior. > >> +++ b/arch/x86/events/intel/ds.c >> @@ -2000,18 +2000,6 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d >> continue; >> } >> >> - /* >> - * On some CPUs the PEBS status can be zero when PEBS is >> - * racing with clearing of GLOBAL_STATUS. >> - * >> - * Normally we would drop that record, but in the >> - * case when there is only a single active PEBS event >> - * we can assume it's for that event. >> - */ >> - if (!pebs_status && cpuc->pebs_enabled && >> - !(cpuc->pebs_enabled & (cpuc->pebs_enabled-1))) >> - pebs_status = cpuc->pebs_enabled; > > Wouldn't something like: > > pebs_status = p->status = cpus->pebs_enabled; > I didn't consider it as a potential solution in this patch because I don't think it's a proper way that SW modifies the buffer, which is supposed to be manipulated by the HW. It's just a personal preference. I don't see any issue here. We may try it. Vince, could you please help check whether Peter's suggestion fixes the crash? Thanks, Kan > actually fix things without adding overhead? >
Powered by blists - more mailing lists