linux-kernel - Re: [PATCH RFC 1/1] arm64: Use PSCI calls for CPU stop when hotplug is supported

Message-ID: <90ba929c-362b-a561-1099-5887fc5f6286@broadcom.com>
Date:   Wed, 23 Jan 2019 09:46:22 -0800
From:   Scott Branden <scott.branden@...adcom.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Pramod Kumar <pramod.kumar@...adcom.com>,
        Sudeep Holla <sudeep.holla@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Suzuki K Poulose <Suzuki.Poulose@....com>,
        Dave Martin <dave.martin@....com>,
        Rob Herring <robh@...nel.org>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Steve Capper <steve.capper@....com>,
        BCM Kernel Feedback <bcm-kernel-feedback-list@...adcom.com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 1/1] arm64: Use PSCI calls for CPU stop when hotplug
 is supported

On 2019-01-23 9:33 a.m., Mark Rutland wrote:
> On Wed, Jan 23, 2019 at 09:05:26AM -0800, Scott Branden wrote:
>> Hi Mark,
>>
>> Hopefully I can shed some light on the use case inline.
>>
>> On 2019-01-23 8:48 a.m., Mark Rutland wrote:
>>> On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote:
>>>> On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@...adcom.com>
>>>> wrote:
>>>>
>>>>       The need comes from a specific use case where an accelerator card (SoC) is
>>>>       plugged into a server over a PCIe interface. In case of any power loss,
>>>>       this card is supplied by a battery, which can provide very little power
>>>>       for a very short time. Once the card switches to battery, it has to
>>>>       reduce its power consumption to the lowest possible point and back up
>>>>       the DDR contents ASAP, before the battery is fully drained.
>>> In this example is Linux running on the server, or on the accelerator?
>> Accelerator
>>> What precisely are you trying to back up from DDR, and why?
>> Data in DDR is being written to disk at this time (the disk is connected to
>> the accelerator).
>>> What is responsible for backing up that contents?
>> A low-power M-class processor and a DMA engine, which continue the
>> operations necessary to transfer DDR memory to disk.
>>
>> The high-power processors on the accelerator running Linux need to be
>> halted ASAP on this power-loss event so that the M0 can take over. A graceful
>> shutdown of Linux and other peripherals is unnecessary (and we don't have the
>> power necessary to do so).
> If graceful shutdown of Linux is not required (and is in fact
> undesirable), why is Linux involved at all in this shutdown process?
>
> For example, why is this not a secure interrupt taken to EL3, which can
> (gracefully) shut down the CPUs regardless?
Will need Pramod to explain the detailed rationale here.
>>>>       Since the battery can provide limited power for only a very short time,
>>>>       we need to transition to the lowest power state. As per the transition
>>>>       process, the CPUs' power domain has to be turned off, but before that
>>>>       the CPUs need to flush their contents to system memory (L3) so that the
>>>>       contents can be backed up by an MCU, a controller consuming very little
>>>>       power. Since we cannot afford to hot-unplug every individual CPU in
>>>>       sequence, we use ipi_cpu_stop for all other CPUs, which ultimately
>>>>       switches to ATF to flush all the CPUs' caches and take them out of the
>>>>       coherency domain so that their power rails can be switched off.
>>> If you're stopping CPUs from completely arbitrary states, what is the
>>> benefit of saving the RAM contents?
>> Some of the RAM contains data that was in the process of being written to
>> disk by the accelerator.
> Ok, so this isn't actually about backing up RAM contents; it's about
> completing pending I/O.
>
> I'm still confused as to how that works. How do you avoid leaving the
> disk in some corrupt state if data runs out partway through?

Some additional flags and details are saved to disk with the "pending i/o".

On the next power-up, an app runs which recovers the data and completes
processing.

Of course, if the store doesn't succeed properly, the affected portions of
the recovery are discarded.
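A minimal sketch of how such a commit-flag scheme could look, assuming a hypothetical per-chunk header (`struct pio_hdr`, `pio_begin`, `pio_commit` and `pio_recoverable` are illustrative names, not the accelerator's actual on-disk format). The writer records a checksum up front and sets the commit flag only once the payload is on disk, so the recovery app can tell a completed store from one cut short by the battery running out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header stored next to each chunk of "pending I/O".
 * Field names and layout are illustrative assumptions only. */
struct pio_hdr {
    uint32_t magic;     /* marks an initialised header */
    uint32_t len;       /* payload length in bytes */
    uint32_t checksum;  /* checksum over the payload */
    uint32_t committed; /* set to 1 only after the payload is on disk */
};

#define PIO_MAGIC 0x50494f31u /* arbitrary value ("PIO1") */

/* Simple rolling checksum; a real design would likely use a CRC. */
static uint32_t pio_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + buf[i];
    return sum;
}

/* Step 1 (before the payload is written): describe the chunk,
 * leaving it uncommitted. */
static struct pio_hdr pio_begin(const uint8_t *payload, uint32_t len)
{
    struct pio_hdr h = { PIO_MAGIC, len, pio_checksum(payload, len), 0 };
    return h;
}

/* Step 2 (after the payload is durable): set the commit flag last. */
static void pio_commit(struct pio_hdr *h)
{
    h->committed = 1;
}

/* Recovery on next power-up: replay only chunks that are committed,
 * fully present and intact; everything else is discarded. */
static bool pio_recoverable(const struct pio_hdr *h,
                            const uint8_t *payload, size_t avail)
{
    if (h->magic != PIO_MAGIC || h->committed != 1)
        return false; /* store never completed */
    if (h->len > avail)
        return false; /* payload truncated by the power loss */
    return pio_checksum(payload, h->len) == h->checksum;
}
```

Ordering is the key design point: if the commit flag could reach the disk before the payload, recovery might replay a truncated chunk, so a real implementation would also need a flush or write barrier between the two steps.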

>
>> This data must be saved to disk and the high power CPUs consume too much
>> power to continue performing this operation.
>>
>>> CPUs might be running with IRQs disabled for an arbitrarily long time,
>> In an embedded Linux system, we control everything that runs.
> Sure, and that complete control allows you to do something better than
> this RFC, AFAICT.
If possible, that would be great. Need Pramod to comment on whether the
direct EL3 approach will solve all issues.
>
> Thanks,
> Mark.

Thanks for the input, Mark.

Scott