[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87r156hjpl.ffs@tglx>
Date: Fri, 06 May 2022 15:30:30 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Tony Luck <tony.luck@...el.com>, hdegoede@...hat.com,
markgross@...nel.org
Cc: mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
x86@...nel.org, hpa@...or.com, corbet@....net,
gregkh@...uxfoundation.org, andriy.shevchenko@...ux.intel.com,
jithu.joseph@...el.com, ashok.raj@...el.com, tony.luck@...el.com,
rostedt@...dmis.org, dan.j.williams@...el.com,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
platform-driver-x86@...r.kernel.org, patches@...ts.linux.dev,
ravi.v.shankar@...el.com
Subject: Re: [PATCH v6 08/11] platform/x86/intel/ifs: Add scan test support
On Thu, May 05 2022 at 18:40, Tony Luck wrote:
> +/*
> + * Note all code and data in this file is protected by
> + * ifs_sem. On HT systems all threads on a core will
> + * execute together, but only the first thread on the
> + * core will update results of the test.
> + */
> +struct workqueue_struct *ifs_wq;
Seems to be unused.
> +static bool oscan_enabled = true;
What changes this?
> +static void message_not_tested(struct device *dev, int cpu, union ifs_status status)
> +{
> + if (status.error_code < ARRAY_SIZE(scan_test_status))
Please add curly brackets as these are not one-line statements.
> + dev_info(dev, "CPU(s) %*pbl: SCAN operation did not start. %s\n",
> + cpumask_pr_args(topology_sibling_cpumask(cpu)),
> + scan_test_status[status.error_code]);
> +/*
> + * Execute the scan. Called "simultaneously" on all threads of a core
> + * at high priority using the stop_cpus mechanism.
> + */
> +static int doscan(void *data)
> +{
> + int cpu = smp_processor_id();
> + u64 *msrs = data;
> + int first;
> +
> + /* Only the first logical CPU on a core reports result */
> + first = cpumask_first(topology_sibling_cpumask(cpu));
Shouldn't that be cpu_smt_mask()?
> + /*
> + * This WRMSR will wait for other HT threads to also write
> + * to this MSR (at most for activate.delay cycles). Then it
> + * starts scan of each requested chunk. The core scan happens
> + * during the "execution" of the WRMSR. This instruction can
> + * take up to 200 milliseconds before it retires.
200ms per test chunk?
> + */
> + wrmsrl(MSR_ACTIVATE_SCAN, msrs[0]);
> +
> + while (activate.start <= activate.stop) {
> + if (time_after(jiffies, timeout)) {
> + status.error_code = IFS_SW_TIMEOUT;
> + break;
> + }
> +
> + msrvals[0] = activate.data;
> + stop_core_cpuslocked(cpu, doscan, msrvals);
> +
> + status.data = msrvals[1];
> +
> + /* Some cases can be retried, give up for others */
> + if (!can_restart(status))
> + break;
> +
> + if (status.chunk_num == activate.start) {
> + /* Check for forward progress */
> + if (retries-- == 0) {
> + if (status.error_code == IFS_NO_ERROR)
> + status.error_code = IFS_SW_PARTIAL_COMPLETION;
> + break;
> + }
> + } else {
> + retries = MAX_IFS_RETRIES;
> + activate.start = status.chunk_num;
> + }
> + }
Looks way better now.
> +}
> +/*
> + * Initiate per core test. It wakes up work queue threads on the target cpu and
> + * its sibling cpu. Once all sibling threads wake up, the scan test gets executed and
> + * wait for all sibling threads to finish the scan test.
> + */
> +int do_core_test(int cpu, struct device *dev)
> +{
> + int ret = 0;
> +
> + if (!scan_enabled)
> + return -ENXIO;
> +
> + /* Prevent CPUs from being taken offline during the scan test */
> + cpus_read_lock();
> +
> + if (!cpu_online(cpu)) {
> + dev_info(dev, "cannot test on the offline cpu %d\n", cpu);
> + ret = -EINVAL;
> + goto out;
> + }
Coming back to my points from the previous round:
1) How is that supposed to work on a system which has HT enabled in BIOS,
but disabled on the kernel command line or via /sys/..../smt/control or
when a HT sibling is offlined temporarily?
I assume it cannot work, but I can't see anything which handles those
cases.
2) That documentation for the admin/user got eaten by the gremlins in
the intertubes again.
Thanks,
tglx
Powered by blists - more mailing lists