linux-kernel - Re: [PATCH 06/19] coresight: trbe: Refactor syndrome decoding

Message-ID: <20251209155722.GX724103@e132581.arm.com>
Date: Tue, 9 Dec 2025 15:57:22 +0000
From: Leo Yan <leo.yan@....com>
To: Anshuman Khandual <anshuman.khandual@....com>
Cc: Suzuki K Poulose <suzuki.poulose@....com>,
	Mike Leach <mike.leach@...aro.org>,
	James Clark <james.clark@...aro.org>,
	Yeoreum Yun <yeoreum.yun@....com>, Will Deacon <will@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Tamas Petz <tamas.petz@....com>,
	Tamas Zsoldos <tamas.zsoldos@....com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Namhyung Kim <namhyung@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
	Ian Rogers <irogers@...gle.com>,
	Adrian Hunter <adrian.hunter@...el.com>, coresight@...ts.linaro.org,
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
	linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 06/19] coresight: trbe: Refactor syndrome decoding

On Fri, Dec 05, 2025 at 09:40:33AM +0530, Anshuman Khandual wrote:
> On 01/12/25 4:51 PM, Leo Yan wrote:
> > It gives priority to TRBSR_EL1.EA (external abort); an external abort
> > will immediately bail out and return an error.
> 
> This is an abrupt start for a commit message, without any context.
> The rationale for the change needs to be explained rather than details
> of implementation.

Agreed.  I will refine the commit log to explain that the error needs
to be diagnosed hierarchically: first by EC, then by BSC.
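
To make the hierarchy concrete, here is a rough user-space sketch of
the field extraction.  It is not driver code.  The bit positions (EC in
bits [31:26], EA at bit 18, BSC in bits [5:0]) are an assumption about
the TRBSR_EL1 layout and should be checked against the Arm ARM; the
model_* names are made up for illustration.

```c
#include <stdint.h>

/* Assumed TRBSR_EL1 layout; verify against the Arm ARM before use. */
#define MODEL_TRBSR_EC_SHIFT	26
#define MODEL_TRBSR_EC_MASK	0x3fULL
#define MODEL_TRBSR_EA		(1ULL << 18)
#define MODEL_TRBSR_BSC_MASK	0x3fULL

/* Hypothetical helper: first-level code (Event Class). */
static inline int model_trbe_ec(uint64_t trbsr)
{
	return (int)((trbsr >> MODEL_TRBSR_EC_SHIFT) & MODEL_TRBSR_EC_MASK);
}

/* Hypothetical helper: second-level code, only meaningful when EC == 0. */
static inline int model_trbe_bsc(uint64_t trbsr)
{
	return (int)(trbsr & MODEL_TRBSR_BSC_MASK);
}

/* Hypothetical helper: external abort flag, checked before EC and BSC. */
static inline int model_trbe_abort(uint64_t trbsr)
{
	return (trbsr & MODEL_TRBSR_EA) != 0;
}
```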

> > Next, the syndrome decoding is refactored based on two levels of
> > information: the EC (Event Class) bits and the BSC (Trace Buffer Status
> > Code) bits.
> > 
> > If TRBSR_EL1.EC==0b000000, the driver continues parsing TRBSR_EL1.BSC to
> > identify the specific trace buffer event.  Otherwise, any non-zero
> > TRBSR_EL1.EC is treated as an error.
> > 
> > For error cases, the driver prints an error string and dumps registers
> > for debugging.
> > 
> > No additional checks are required for wrap mode beyond verifying the
> > TRBSR_EL1.WRAP bit, even on units with overwrite errata, as this bit
> > reliably indicates a buffer wrap.
> 
> Should the errata related changes be done in a separate patch instead ?

Good point.  I will consider splitting it into a separate patch, which
would make the review easier.

[...]

> >  static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
> >  						 u64 trbsr)
> >  {
> > +	const char *err_str;
> >  	int ec = get_trbe_ec(trbsr);
> >  	int bsc = get_trbe_bsc(trbsr);
> > -	struct trbe_buf *buf = etm_perf_sink_config(handle);
> > -	struct trbe_cpudata *cpudata = buf->cpudata;
> >  
> >  	WARN_ON(is_trbe_running(trbsr));
> > -	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
> > -		return TRBE_FAULT_ACT_FATAL;
> >  
> > -	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
> > -		return TRBE_FAULT_ACT_FATAL;
> > +	if (is_trbe_abort(trbsr)) {
> > +		err_str = "External abort";
> > +		goto out_fatal;
> > +	}
> >  
> > -	/*
> > -	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
> > -	 * it might write data after a WRAP event in the fill mode.
> > -	 * Thus the check TRBPTR == TRBBASER will not be honored.
> > -	 */
> > -	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
> > -	    (trbe_may_overwrite_in_fill_mode(cpudata) ||
> > -	     get_trbe_write_pointer() == get_trbe_base_pointer()))
> > +	switch (ec) {
> > +	case TRBE_EC_OTHERS:
> > +		break;
> 
> No message for this ?

No.  EC_OTHERS means "this is a normal maintenance interrupt".

> > +	case TRBE_EC_BUF_MGMT_IMPL:
> > +		err_str = "Unexpected implemented management";
> > +		goto out_fatal;
> > +	case TRBE_EC_GP_CHECK_FAULT:
> > +		err_str = "Granule Protection Check fault";
> > +		goto out_fatal;
> > +	case TRBE_EC_STAGE1_ABORT:
> > +		err_str = "Stage 1 data abort";
> > +		goto out_fatal;
> > +	case TRBE_EC_STAGE2_ABORT:
> > +		err_str = "Stage 2 data abort";
> > +		goto out_fatal;
> > +	default:
> > +		err_str = "Unknown error code";
> > +		goto out_fatal;
> > +	}
> > +
> > +	switch (bsc) {
> > +	case TRBE_BSC_NOT_STOPPED:
> > +		break;
> > +	case TRBE_BSC_FILLED:
> > +		break;
> > +	case TRBE_BSC_TRIGGERED:
> > +		err_str = "Unexpected trigger status";
> > +		goto out_fatal;
> > +	default:
> > +		err_str = "Unexpected buffer status code";
> > +		goto out_fatal;
> > +	}
> 
> Just wondering if it would be cleaner to add a const char * based
> static array mapping these EC/BSC codes with above error messages.

Now that we have two levels of error codes (EC and BSC), it would be
complex to maintain string arrays and map them onto both levels.
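
For comparison, a rough sketch of what a table-based mapping might
involve is below.  This is a user-space model only, not a proposal for
the driver.  The EC values and strings are taken from the patch; the
BSC values (0, 1, 2) are assumed from the existing header, and
model_trbe_err_str() is a hypothetical helper.

```c
#include <stddef.h>

/* EC values as in the patch; BSC values assumed from the existing header. */
#define TRBE_EC_OTHERS			0x0
#define TRBE_EC_GP_CHECK_FAULT		0x1e
#define TRBE_EC_BUF_MGMT_IMPL		0x1f
#define TRBE_EC_STAGE1_ABORT		0x24
#define TRBE_EC_STAGE2_ABORT		0x25

#define TRBE_BSC_NOT_STOPPED		0
#define TRBE_BSC_FILLED			1
#define TRBE_BSC_TRIGGERED		2

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Sparse table: every slot not listed here stays NULL. */
static const char * const ec_err[] = {
	[TRBE_EC_GP_CHECK_FAULT]	= "Granule Protection Check fault",
	[TRBE_EC_BUF_MGMT_IMPL]		= "Unexpected implemented management",
	[TRBE_EC_STAGE1_ABORT]		= "Stage 1 data abort",
	[TRBE_EC_STAGE2_ABORT]		= "Stage 2 data abort",
};

static const char * const bsc_err[] = {
	[TRBE_BSC_TRIGGERED]		= "Unexpected trigger status",
};

/* Hypothetical helper: returns NULL when the syndrome is not an error. */
static const char *model_trbe_err_str(int ec, int bsc)
{
	/* First level: any non-zero EC is an error. */
	if (ec != TRBE_EC_OTHERS) {
		if (ec < 0 || (size_t)ec >= ARRAY_SIZE(ec_err) || !ec_err[ec])
			return "Unknown error code";
		return ec_err[ec];
	}

	/* Second level: these two BSC codes are not errors. */
	if (bsc == TRBE_BSC_NOT_STOPPED || bsc == TRBE_BSC_FILLED)
		return NULL;
	if (bsc < 0 || (size_t)bsc >= ARRAY_SIZE(bsc_err) || !bsc_err[bsc])
		return "Unexpected buffer status code";
	return bsc_err[bsc];
}
```

Even in this form the tables need explicit handling for the non-error
codes and for the holes in the sparse arrays, so the result is not
obviously simpler than two switch statements.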

> > +	if (is_trbe_wrap(trbsr))
> 
> But what about TRBE affected with trbe_may_overwrite_in_fill_mode()
> is that still being taken care of somehow ?

The TRBSR_EL1.WRAP bit reliably indicates that the pointer has
wrapped.  I don't think we need any special handling for the overwrite
erratum here; we can simply return TRBE_FAULT_ACT_WRAP.
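
The resulting decision order can be modelled roughly as below.  This
is a simplified user-space sketch operating on already-decoded fields,
not the patch itself.  The model_* names are hypothetical, the BSC
values 0 and 1 are assumed to mean "not stopped" and "filled", and the
final fallback to a spurious action is an assumption about the part of
the function not quoted here.

```c
#include <stdbool.h>

enum model_fault_action {
	MODEL_FAULT_ACT_WRAP,
	MODEL_FAULT_ACT_SPURIOUS,
	MODEL_FAULT_ACT_FATAL,
};

/* Hypothetical model: inputs are the decoded TRBSR_EL1 fields. */
static enum model_fault_action model_fault_act(bool ea, int ec, int bsc,
					       bool wrap)
{
	/* External abort takes priority over everything else. */
	if (ea)
		return MODEL_FAULT_ACT_FATAL;

	/* First level: any non-zero EC is an error. */
	if (ec != 0)
		return MODEL_FAULT_ACT_FATAL;

	/* Second level: only "not stopped" (0) and "filled" (1) are fine. */
	if (bsc != 0 && bsc != 1)
		return MODEL_FAULT_ACT_FATAL;

	/* WRAP alone decides the wrap case, with or without the erratum. */
	if (wrap)
		return MODEL_FAULT_ACT_WRAP;

	/* Assumption: anything left over is treated as spurious. */
	return MODEL_FAULT_ACT_SPURIOUS;
}
```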

Afterward, trbe_get_trace_size() consumes the wrap flag and will
compute the trace size appropriately for the overwrite erratum when
the wrap flag is set.

[...]

> > -#define TRBE_EC_OTHERS		0
> > -#define TRBE_EC_STAGE1_ABORT	36
> > -#define TRBE_EC_STAGE2_ABORT	37
> > +#define TRBE_EC_OTHERS			0x0
> > +#define TRBE_EC_GP_CHECK_FAULT		0X1e
> > +#define TRBE_EC_BUF_MGMT_IMPL		0x1f
> > +#define TRBE_EC_STAGE1_ABORT		0x24
> > +#define TRBE_EC_STAGE2_ABORT		0x25
> 
> Please do mention the document source for these new EC codes in the
> commit message.

As James suggested in another reply, I will consider adding a sysreg
enum.

Thanks,
Leo