linux-kernel - Re: [PATCH v6 08/11] platform/x86/intel/ifs: Add scan test support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87r156hjpl.ffs@tglx>
Date:   Fri, 06 May 2022 15:30:30 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Tony Luck <tony.luck@...el.com>, hdegoede@...hat.com,
        markgross@...nel.org
Cc:     mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
        x86@...nel.org, hpa@...or.com, corbet@....net,
        gregkh@...uxfoundation.org, andriy.shevchenko@...ux.intel.com,
        jithu.joseph@...el.com, ashok.raj@...el.com, tony.luck@...el.com,
        rostedt@...dmis.org, dan.j.williams@...el.com,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        platform-driver-x86@...r.kernel.org, patches@...ts.linux.dev,
        ravi.v.shankar@...el.com
Subject: Re: [PATCH v6 08/11] platform/x86/intel/ifs: Add scan test support

On Thu, May 05 2022 at 18:40, Tony Luck wrote:
> +/*
> + * Note all code and data in this file is protected by
> + * ifs_sem. On HT systems all threads on a core will
> + * execute together, but only the first thread on the
> + * core will update results of the test.
> + */
> +struct workqueue_struct *ifs_wq;

Seems to be unused.

> +static bool oscan_enabled = true;

What changes this?

> +static void message_not_tested(struct device *dev, int cpu, union ifs_status status)
> +{
> +	if (status.error_code < ARRAY_SIZE(scan_test_status))

Please add curly brackets as these are not one-line statements.

> +		dev_info(dev, "CPU(s) %*pbl: SCAN operation did not start. %s\n",
> +			 cpumask_pr_args(topology_sibling_cpumask(cpu)),
> +			 scan_test_status[status.error_code]);
> +/*
> + * Execute the scan. Called "simultaneously" on all threads of a core
> + * at high priority using the stop_cpus mechanism.
> + */
> +static int doscan(void *data)
> +{
> +	int cpu = smp_processor_id();
> +	u64 *msrs = data;
> +	int first;
> +
> +	/* Only the first logical CPU on a core reports result */
> +	first = cpumask_first(topology_sibling_cpumask(cpu));

Shouldn't that be cpu_smt_mask()?

> +	/*
> +	 * This WRMSR will wait for other HT threads to also write
> +	 * to this MSR (at most for activate.delay cycles). Then it
> +	 * starts scan of each requested chunk. The core scan happens
> +	 * during the "execution" of the WRMSR. This instruction can
> +	 * take up to 200 milliseconds before it retires.

200ms per test chunk?

> +	 */
> +	wrmsrl(MSR_ACTIVATE_SCAN, msrs[0]);
> +

> +	while (activate.start <= activate.stop) {
> +		if (time_after(jiffies, timeout)) {
> +			status.error_code = IFS_SW_TIMEOUT;
> +			break;
> +		}
> +
> +		msrvals[0] = activate.data;
> +		stop_core_cpuslocked(cpu, doscan, msrvals);
> +
> +		status.data = msrvals[1];
> +
> +		/* Some cases can be retried, give up for others */
> +		if (!can_restart(status))
> +			break;
> +
> +		if (status.chunk_num == activate.start) {
> +			/* Check for forward progress */
> +			if (retries-- == 0) {
> +				if (status.error_code == IFS_NO_ERROR)
> +					status.error_code = IFS_SW_PARTIAL_COMPLETION;
> +				break;
> +			}
> +		} else {
> +			retries = MAX_IFS_RETRIES;
> +			activate.start = status.chunk_num;
> +		}
> +	}

Looks way better now.

> +}
> +/*
> + * Initiate per core test. It wakes up work queue threads on the target cpu and
> + * its sibling cpu. Once all sibling threads wake up, the scan test gets executed and
> + * wait for all sibling threads to finish the scan test.
> + */
> +int do_core_test(int cpu, struct device *dev)
> +{
> +	int ret = 0;
> +
> +	if (!scan_enabled)
> +		return -ENXIO;
> +
> +	/* Prevent CPUs from being taken offline during the scan test */
> +	cpus_read_lock();
> +
> +	if (!cpu_online(cpu)) {
> +		dev_info(dev, "cannot test on the offline cpu %d\n", cpu);
> +		ret = -EINVAL;
> +		goto out;
> +	}

Coming back to my points from the previous round:

1) How is that supposed to work on a system which has HT enabled in BIOS,
   but disabled on the kernel command line or via /sys/..../smt/control or
   when a HT sibling is offlined temporarily?

   I assume it cannot work, but I can't see anything which handles those
   cases.

2) That documentation for the admin/user got eaten by the gremlins in
   the intertubes again.

Thanks,

        tglx