lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6d4a6e3-eba9-a5a2-fc41-393d2b5d755@os.amperecomputing.com>
Date:   Mon, 23 Jan 2023 15:11:12 -0800 (PST)
From:   Ilkka Koskinen <ilkka@...amperecomputing.com>
To:     Robin Murphy <robin.murphy@....com>
cc:     will@...nel.org, mark.rutland@....com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        ilkka@...amperecomputing.com
Subject: Re: [PATCH] Partially revert "perf/arm-cmn: Optimise DTC counter
 accesses"

Hi Robin,

On Mon, 23 Jan 2023, Robin Murphy wrote:
> It turns out the optimisation implemented by commit 4f2c3872dde5 is
> totally broken, since all the places that consume hw->dtcs_used for
> events other than cycle count are still not expecting it to be sparsely
> populated, and fail to read all the relevant DTC counters correctly if
> so.
>
> If implemented correctly, the optimisation potentially saves up to 3
> register reads per event update, which is reasonably significant for
> events targeting a single node, but still not worth a massive amount of
> additional code complexity overall. Getting it right within the current
> design looks a fair bit more involved than it was ever intended to be,
> so let's just make a functional revert which restores the old behaviour
> while still backporting easily.
>
> Fixes: 4f2c3872dde5 ("perf/arm-cmn: Optimise DTC counter accesses")
> Reported-by: Ilkka Koskinen <ilkka@...amperecomputing.com>
> Signed-off-by: Robin Murphy <robin.murphy@....com>

Thanks for the patch. It looks good to me and seems to work fine.

Cheers, Ilkka

> ---
> drivers/perf/arm-cmn.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index b80a9b74662b..1deb61b22bc7 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -1576,7 +1576,6 @@ static int arm_cmn_event_init(struct perf_event *event)
> 			hw->dn++;
> 			continue;
> 		}
> -		hw->dtcs_used |= arm_cmn_node_to_xp(cmn, dn)->dtc;
> 		hw->num_dns++;
> 		if (bynodeid)
> 			break;
> @@ -1589,6 +1588,12 @@ static int arm_cmn_event_init(struct perf_event *event)
> 			nodeid, nid.x, nid.y, nid.port, nid.dev, type);
> 		return -EINVAL;
> 	}
> +	/*
> +	 * Keep assuming non-cycles events count in all DTC domains; turns out
> +	 * it's hard to make a worthwhile optimisation around this, short of
> +	 * going all-in with domain-local counter allocation as well.
> +	 */
> +	hw->dtcs_used = (1U << cmn->num_dtcs) - 1;
>
> 	return arm_cmn_validate_group(cmn, event);
> }
> -- 
> 2.36.1.dirty
>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ