linux-kernel - Re: [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record


Message-ID: <20230406131351.GL386572@hirez.programming.kicks-ass.net>
Date:   Thu, 6 Apr 2023 15:13:51 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     kan.liang@...ux.intel.com
Cc:     mingo@...hat.com, linux-kernel@...r.kernel.org, ak@...ux.intel.com,
        eranian@...gle.com
Subject: Re: [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record

On Tue, Mar 28, 2023 at 03:27:35PM -0700, kan.liang@...ux.intel.com wrote:
> From: Kan Liang <kan.liang@...ux.intel.com>
> 
> The kernel warning for the unexpected PEBS record can also be observed
> during a context switch, when the below commands are running in parallel
> for a while on SPR.
> 
>   while true; do perf record --no-buildid -a --intr-regs=AX -e
>   cpu/event=0xd0,umask=0x81/pp -c 10003 -o /dev/null ./triad; done &
> 
>   while true; do perf record -o /tmp/out -W -d -e
>   '{ld_blocks.store_forward:period=1000000,
>   MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}'
>   -c 1037 ./triad; done
>   * The triad program just generates loads/stores.
> 
> The current PEBS code assumes that all the PEBS records in the DS buffer
> have the same size, aka cpuc->pebs_record_size. That is true in most
> cases, since the DS buffer is always flushed on every context switch.
> 
> However, there is a corner case that breaks the assumption.
> A system-wide PEBS event with the large PEBS config may be enabled
> during a context switch. Some PEBS records for the system-wide PEBS may
> be generated while the old task is sched out but the new one hasn't been
> sched in yet. When the new task is sched in, the cpuc->pebs_record_size
> may be updated for the per-task PEBS events. So the existing system-wide
> PEBS records have a different size from the later PEBS records.
> 
> Two methods were considered to fix the issue.
> One is to flush the DS buffer for the system-wide PEBS right before the
> new task is scheduled in. It has to be done in the generic code via the
> sched_task() callback. However, sched_task() is shared among
> different ARCHs. Moving the call may impact other ARCHs, e.g., AMD BRS
> requires that sched_task() is called after the PMU has started on a
> ctxswin. This method was dropped.
> 
> The other method is implemented here. It no longer assumes that all the
> PEBS records have the same size. The size from each PEBS record
> is used to parse the record. For the previous platforms (PEBS format < 4),
> which don't support adaptive PEBS, nothing changes.

Same as with the other; why can't we flush the buffer when we reprogram
the hardware?
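
For reference, the per-record parsing described in the quoted patch text
can be sketched roughly as below. This is not the kernel code, only a
userspace illustration. The demo_* names are hypothetical, and it assumes
that each adaptive PEBS record starts with a basic group whose first
64-bit word carries the record size in bytes in its top 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the basic group at the start of a record.
 * Assumption: bits 63:48 of format_size hold the record size in bytes.
 */
struct demo_pebs_basic {
	uint64_t format_size;
	uint64_t ip;
	uint64_t applicable_counters;
	uint64_t tsc;
};

/* Read the size stored in the record itself instead of a cached value. */
static size_t demo_record_size(const void *rec)
{
	uint64_t format_size;

	memcpy(&format_size, rec, sizeof(format_size));
	return (size_t)(format_size >> 48);
}

/*
 * Walk the records in [base, top), advancing by each record's own size.
 * Returns the number of records, or -1 if a record is truncated or
 * reports a size smaller than the basic group.
 */
static int demo_count_records(const unsigned char *base,
			      const unsigned char *top)
{
	const unsigned char *at = base;
	int n = 0;

	assert(base != NULL && top >= base);

	while (at < top) {
		size_t left = (size_t)(top - at);
		size_t size;

		if (left < sizeof(struct demo_pebs_basic))
			return -1;

		size = demo_record_size(at);
		if (size < sizeof(struct demo_pebs_basic) || size > left)
			return -1;

		n++;
		at += size;
	}
	return n;
}

/* Test helper: write a zeroed record of the given size at 'at'. */
static size_t demo_put_record(unsigned char *at, uint16_t size)
{
	struct demo_pebs_basic b = { .format_size = (uint64_t)size << 48 };

	memset(at, 0, size);
	memcpy(at, &b, sizeof(b));
	return size;
}
```

With a buffer holding records of 32, 96 and 32 bytes, a walk that steps
by a fixed 32 bytes would land in the middle of the 96-byte record,
whereas the walk above visits exactly three records.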