Message-ID: <ZdTBtt0oR6Q1RcAB@shell.armlinux.org.uk>
Date: Tue, 20 Feb 2024 15:13:58 +0000
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: linux-pm@...r.kernel.org, loongarch@...ts.linux.dev,
	linux-acpi@...r.kernel.org, linux-arch@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	linux-riscv@...ts.infradead.org, kvmarm@...ts.linux.dev,
	x86@...nel.org, acpica-devel@...ts.linuxfoundation.org,
	linux-csky@...r.kernel.org, linux-doc@...r.kernel.org,
	linux-ia64@...r.kernel.org, linux-parisc@...r.kernel.org,
	Salil Mehta <salil.mehta@...wei.com>,
	Jean-Philippe Brucker <jean-philippe@...aro.org>,
	jianyong.wu@....com, justin.he@....com,
	James Morse <james.morse@....com>,
	Jonathan Cameron <Jonathan.Cameron@...wei.com>
Subject: Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from
 acpi_processor_get_info()

On Tue, Feb 20, 2024 at 11:27:15AM +0000, Russell King (Oracle) wrote:
> On Thu, Feb 15, 2024 at 08:22:29PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@...linux.org.uk> wrote:
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index cf7c1cca69dd..a68c475cdea5 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > >                         cpufreq_add_device("acpi-cpufreq");
> > >         }
> > >
> > > +       /*
> > > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > > +        * duplicate CPU descriptions from firmware.
> > > +        */
> > > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > > +           !get_cpu_device(pr->id)) {
> > > +               int ret = arch_register_cpu(pr->id);
> > > +
> > > +               if (ret)
> > > +                       return ret;
> > > +       }
> > > +
> > >         /*
> > >          *  Extra Processor objects may be enumerated on MP systems with
> > >          *  less than the max # of CPUs. They should be ignored _iff
> > 
> > This is interesting, because right below there is the following code:
> > 
> >     if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> >         int ret = acpi_processor_hotadd_init(pr);
> > 
> >         if (ret)
> >             return ret;
> >     }
> > 
> > and acpi_processor_hotadd_init() essentially calls arch_register_cpu()
> > with some extra things around it (more about that below).
> > 
> > I do realize that acpi_processor_hotadd_init() is defined under
> > CONFIG_ACPI_HOTPLUG_CPU, so for the sake of the argument let's
> > consider an architecture where CONFIG_ACPI_HOTPLUG_CPU is set.
> > 
> > So why are the two conditionals that almost contradict each other both
> > needed?  It looks like the new code could be combined with
> > acpi_processor_hotadd_init() to do the right thing in all cases.
> > 
> > Now, acpi_processor_hotadd_init() does some extra things that look
> > like they should be done by the new code too.
> > 
> > 1. It checks invalid_phys_cpuid() which appears to be a good idea to me.
> > 
> > 2. It uses locking around arch_register_cpu() which doesn't seem
> > unreasonable either.
> > 
> > 3. It calls acpi_map_cpu() and I'm not sure why this is not done by
> > the new code.
> > 
> > The only thing that can be dropped from it is the _STA check AFAICS,
> > because acpi_processor_add() won't even be called if the CPU is not
> > present (and not enabled after the first patch).
> > 
> > So why does the code not do 1 - 3 above?
> 
> Honestly, I'm out of my depth with this and can't answer your
> questions - and I really don't want to try fiddling with this code
> because it's just too icky (even in its current form in mainline)
> to be understandable to anyone who hasn't gained a detailed knowledge
> of this code.
> 
> It's going to require a lot of analysis - how acpi_map_cpuid() behaves
> in all circumstances, what this means for invalid_logical_cpuid() and
> invalid_phys_cpuid(), what paths will be taken in each case. This code
> is already just too hairy for someone who isn't an experienced ACPI
> hacker to be able to follow and I don't see an obvious way to make it
> more readable.
> 
> James' additions make it even more complex and less readable.

As an illustration of the problems I'm having here, I was just writing
a reply to this with a suggestion of transforming this code ultimately
to:

	if (!get_cpu_device(pr->id)) {
		int ret;

		if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id))
			ret = acpi_processor_make_enabled(pr);
		else
			ret = acpi_processor_make_present(pr);

		if (ret)
			return ret;
	}

(acpi_processor_make_present() would be acpi_processor_hotadd_init()
and acpi_processor_make_enabled() would be arch_register_cpu() at this
point.)

Then I realised that's a bad idea - because we really need to check
that pr->id is valid before calling get_cpu_device() on it, so this
won't work. That leaves us with:

	int ret;

	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
		/* x86 et al. path */
		ret = acpi_processor_make_present(pr);
	} else if (!get_cpu_device(pr->id)) {
		/* Arm64 path */
		ret = acpi_processor_make_enabled(pr);
	} else {
		ret = 0;
	}

	if (ret)
		return ret;

Now, the next transformation would be to move !get_cpu_device(pr->id)
into acpi_processor_make_enabled() which would eliminate one of those
if() legs.
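
A minimal user-space sketch of that transformation, for illustration only.
Everything other than the function names already used above is an
assumption: the struct layout, the NR_CPUS value, the arrays modelling
cpu_present() and get_cpu_device(), the call counters, and the wrapper
name register_processor() are all hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* assumed small value for the sketch */

/* Hypothetical stand-in for the kernel's struct acpi_processor. */
struct acpi_processor {
	int id;
};

/* Stub state modelling cpu_present() and get_cpu_device(). */
static bool present[NR_CPUS];
static bool registered[NR_CPUS];

/* Call counters so the behaviour can be observed. */
static int make_present_calls;
static int register_calls;

static bool invalid_logical_cpuid(int id)
{
	return id < 0 || id >= NR_CPUS;
}

static bool cpu_present(int id)
{
	return present[id];
}

static bool get_cpu_device(int id)
{
	return registered[id];
}

static int arch_register_cpu(int id)
{
	registered[id] = true;
	register_calls++;
	return 0;
}

/* Stub for the hot-add path; the real one does far more. */
static int acpi_processor_make_present(struct acpi_processor *pr)
{
	(void)pr;
	make_present_calls++;
	return 0;
}

/* The duplicate check now lives here, so the caller needs no third leg. */
static int acpi_processor_make_enabled(struct acpi_processor *pr)
{
	if (get_cpu_device(pr->id))
		return 0;	/* duplicate description from firmware */

	return arch_register_cpu(pr->id);
}

static int register_processor(struct acpi_processor *pr)
{
	/* pr->id is validated before it is used as an index. */
	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))
		return acpi_processor_make_present(pr);

	return acpi_processor_make_enabled(pr);
}
```

Calling register_processor() twice for the same present CPU registers it
only once in this sketch, and an invalid pr->id never reaches
get_cpu_device().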

Now, if we want to somehow make the call to arch_register_cpu() common
in these two paths, the next question is what are the _precise_
semantics of acpi_map_cpu(), particularly with respect to it
modifying pr->id. Is it guaranteed to always give the same result
for the same processor described in ACPI? What is acpi_map_cpu() anyway?
I can find no documentation for it.

Then there's the question whether calling acpi_unmap_cpu() should be
done on the failure path if arch_register_cpu() fails, which is done
for the x86 path but not the Arm64 path. Should it be done for the
Arm64 path? I've no idea, but as Arm64 doesn't implement either of
these two functions, I guess they could be stubbed out and thus be
no-ops - but then we open a hole where if pr->id is invalid, we
end up passing that invalid value to arch_register_cpu() which I'm
quite sure will explode with a negative CPU number.
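
To make that hole concrete, here is a user-space sketch under stated
assumptions: acpi_map_cpu_stub() is a hypothetical no-op standing in for
an architecture that does not implement acpi_map_cpu(), and the guard, the
-ENODEV return value and the name make_present_guarded() are illustrative
choices, not anything taken from the kernel. It shows one possible way of
refusing an invalid pr->id before it can reach arch_register_cpu().

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4	/* assumed small value for the sketch */

/* Hypothetical stand-in for the kernel's struct acpi_processor. */
struct acpi_processor {
	int id;
};

static int registered_count;

static bool invalid_logical_cpuid(int id)
{
	return id < 0 || id >= NR_CPUS;
}

/* Hypothetical no-op stub: it leaves pr->id exactly as it found it. */
static int acpi_map_cpu_stub(struct acpi_processor *pr)
{
	(void)pr;
	return 0;
}

/* Stub: only counts registrations in this sketch. */
static int arch_register_cpu(int id)
{
	(void)id;
	registered_count++;
	return 0;
}

static int make_present_guarded(struct acpi_processor *pr)
{
	int ret = acpi_map_cpu_stub(pr);

	if (ret)
		return ret;

	/*
	 * With a no-op mapping, an invalid id is still invalid here,
	 * so check again before registering.
	 */
	if (invalid_logical_cpuid(pr->id))
		return -ENODEV;

	return arch_register_cpu(pr->id);
}
```

Without the second check, a pr->id of -1 would be handed straight to
arch_register_cpu() in this model.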

So, to my mind, what you're effectively asking for is a total rewrite
of all the code in and called by acpi_processor_get_info()... and that
is not something I am willing to do (because it's too far outside of
my knowledge area.)

As I said in my reply to patch 1, I think your comments on patch 2
make Arm64 vcpu hotplug unachievable in a reasonable time frame, and
certainly outside the bounds of what I can do to progress this.

So, at this point I'm going to stand down from further participation
with this patch set as I believe I've reached the limit of what I can
do to progress it.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
