Message-ID: <VI1PR04MB70239026F32223F80F92C607EE790@VI1PR04MB7023.eurprd04.prod.outlook.com>
Date:   Wed, 6 Nov 2019 22:47:42 +0000
From:   Leonard Crestez <leonard.crestez@....com>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Abel Vesa <abel.vesa@....com>
CC:     Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Aisheng Dong <aisheng.dong@....com>,
        "mark.rutland@....com" <mark.rutland@....com>,
        Jacky Bai <ping.bai@....com>,
        Anson Huang <anson.huang@....com>,
        "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
        "marc.zyngier@....com" <marc.zyngier@....com>,
        "catalin.marinas@....com" <catalin.marinas@....com>,
        "rjw@...ysocki.net" <rjw@...ysocki.net>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "will.deacon@....com" <will.deacon@....com>,
        dl-linux-imx <linux-imx@....com>,
        "kernel@...gutronix.de" <kernel@...gutronix.de>,
        "sudeep.holla@....com" <sudeep.holla@....com>,
        Fabio Estevam <fabio.estevam@....com>,
        "l.stach@...gutronix.de" <l.stach@...gutronix.de>,
        "shawnguo@...nel.org" <shawnguo@...nel.org>,
        "robh@...nel.org" <robh@...nel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC 0/7] cpuidle: Add poking mechanism to support non-IPI wakeup

On 07.11.2019 00:10, Florian Fainelli wrote:
> On 11/6/19 1:31 PM, Leonard Crestez wrote:
>> On 06.11.2019 22:15, Florian Fainelli wrote:
>>> On 3/28/19 3:45 AM, Lorenzo Pieralisi wrote:
>>>> On Wed, Mar 27, 2019 at 06:40:07PM +0000, Leonard Crestez wrote:
>>>>> On Wed, 2019-03-27 at 17:45 +0000, Marc Zyngier wrote:
>>>>>> On 27/03/2019 16:06, Lucas Stach wrote:
>>>>>>> Am Mittwoch, den 27.03.2019, 15:57 +0000 schrieb Marc Zyngier:
>>>>>>>> On 27/03/2019 15:44, Lucas Stach wrote:
>>>>>>>>> Am Mittwoch, den 27.03.2019, 13:21 +0000 schrieb Abel Vesa:
>>>>>>>>>> This work is a workaround I'm looking into (more as a background task)
>>>>>>>>>> in order to add support for cpuidle on i.MX8MQ based platforms.
>>>>>>>>>>
>>>>>>>>>> The main idea here is getting around the missing GIC wake_request signal
>>>>>>>>>> (due to an integration design issue) by waking up each individual core through
>>>>>>>>>> some dedicated SW power-up bits inside the power controller (GPC) right before
>>>>>>>>>> every IPI is requested for that individual core.
>>>>>>>>>
>>>>>>>>> Just a general comment, without going into the details of this series:
>>>>>>>>> this issue is not only affecting IPIs, but also MSIs terminated at the
>>>>>>>>> GIC. Currently MSIs are terminated at the PCIe core, but terminating
>>>>>>>>> them at the GIC is clearly preferable, as this allows assigning CPU
>>>>>>>>> affinity to individual MSIs and lowers IRQ service overhead.
>>>>>>>>>
>>>>>>>>> I'm not sure what the consequences are for upstream Linux support yet,
>>>>>>>>> but we should keep in mind that having a workaround for IPIs is only
>>>>>>>>> solving part of the issue.
>>>>>>>>
>>>>>>>> If this erratum is affecting more than just IPIs, then indeed I don't
>>>>>>>> see how this patch series solves anything.
>>>>>>>>
>>>>>>>> But the erratum documentation seems to imply that only SGIs are
>>>>>>>> affected, and goes as far as suggesting that using an external interrupt
>>>>>>>> would solve it. How come this is not the case? Or is it that anything
>>>>>>>> directly routed to a redistributor is also affected? This would break
>>>>>>>> LPIs (and thus MSIs) and PPIs (the CPU timer, among others).
>>>>>>>
>>>>>>> Anything that isn't visible to the GPC and requires the GIC
>>>>>>> wake_request signal to behave as specified is broken by this erratum.
>>>>>>
>>>>>> I really wonder how a timer interrupt (a PPI, hence not routed through
>>>>>> the GPC) can wake up the CPU in this case. It really feels like
>>>>>> something like "program CNTV_CVAL_EL0 to expire at some later point;
>>>>>> WFI" could result in the CPU going to a deep sleep state, and not
>>>>>> wake-up at all.
>>>>>
>>>>> This is already a common issue for cpuidle implementations handled by the
>>>>> "local-timer-stop" property. imx has other timer blocks in the SOC,
>>>>> they generate SPIs which are connected to GPC.
>>>>
>>>> It is not a common issue. The tick-broadcast mechanism relies on
>>>> IPIs that are sent to specific CPUs upon timer expiry.
>>>>
>>>> If IPIs don't work for CPUs in shutdown state (which is what this patch
>>>> is fixing AFAIU), the only reason I can see how a CPU can resume from
>>>> idle on a timer expiry is the GPC waking up all cores upon the global
>>>> timer SPI; if that's the case there is precious little point in
>>>> implementing CPUidle at all - too bad people worked hard to implement
>>>> NOHZ in a power efficient manner.
>>>>
>>>>>> This would indicate that not only cpuidle is broken with this, but
>>>>>> absolutely every interrupt that is not routed through the GPC.
>>>>>
>>>>> Yes, cpuidle is broken for irqs not routed through GPC. However:
>>>>>
>>>>> * All SPIs are connected to GPC in a 1:1 mapping
>>>>> * This series deals with SGIs
>>>>> * The timer PPIs are not required; covered by local-timer-stop
>>>>> * LPIs are currently unused (I understand from Lucas that imx-pci
>>>>> uses SPIs by default)
>>>>>
>>>>> Anything missing?
>>>>
>>>> Yes, LPIs must be able to wake up CPUs and only the CPU for which
>>>> an IRQ is actually pending.
>>>>
>>>> From an architectural perspective, an ARM core executing the WFI
>>>> instruction must resume execution upon an IRQ occurrence targeted
>>>> at it and that's true regardless of the idle state entered.
>>>>
>>>> Anything deviating from this behaviour is not architecture compliant.
>>>
>>> What if you enter a deeper state than WFI, which leads to the power
>>> gating of your CPU core, and you are missing the necessary hardware that
>>> should be driven from the GIC's nIRQOUT/nFIQOUT signals to automatically
>>> bring the core back on upon the GIC seeing a pending interrupt targeting
>>> that core?
>>
>> imx8mq has a secondary "GPC" block which receives SPIs and can wake the
>> cores. Do you have something similar? Because if you only have the GIC
>> then that sounds much worse: you'd have to ensure that all peripheral
>> interrupts are routed away from sleeping cores.
> 
> We have a legacy interrupt controller that receives all SPIs as well,
> and it can be used as a full replacement for the GIC (with the loss of
> nVIRQ/nFIQ) but it cannot wake-up the cores unfortunately. This is all
> custom logic, so we could have done at least wake-up based on SPIs, but
> we missed that apparently, at least we were consistent.
> 
> Out of curiosity, does your GPC somehow know the affinity of a given
> interrupt to a particular core?

Yes, if it's told by software. There are mask and status registers for
each SPI for each core in the GPC, but AFAIK they are unrelated to the
GIC bits.
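
To make the "told by software" part concrete, here is a minimal sketch of how such per-core masks could decide which core gets woken. It is a toy model, not the real GPC register map: the struct layout, the core and SPI counts, the single shared pending bitmap, and every function name are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CORES 4              /* assumed core count */
#define NR_SPIS  128            /* assumed number of SPIs */
#define WORDS    (NR_SPIS / 32)

/* Hypothetical software model of per-core wake masks. */
struct gpc_model {
    uint32_t mask[NR_CORES][WORDS]; /* bit set = SPI masked for that core */
    uint32_t status[WORDS];         /* bit set = SPI pending (assumed shared) */
};

/* Start with every SPI masked for every core and nothing pending. */
void gpc_init(struct gpc_model *g)
{
    for (unsigned c = 0; c < NR_CORES; c++)
        for (unsigned w = 0; w < WORDS; w++)
            g->mask[c][w] = UINT32_MAX;
    for (unsigned w = 0; w < WORDS; w++)
        g->status[w] = 0;
}

/* Software mirrors the interrupt affinity: unmask the SPI for one core only. */
void gpc_set_affinity(struct gpc_model *g, unsigned spi, unsigned core)
{
    uint32_t bit = UINT32_C(1) << (spi % 32);

    assert(spi < NR_SPIS && core < NR_CORES);
    for (unsigned c = 0; c < NR_CORES; c++) {
        if (c == core)
            g->mask[c][spi / 32] &= ~bit;
        else
            g->mask[c][spi / 32] |= bit;
    }
}

/* Mark an SPI as pending. */
void gpc_raise(struct gpc_model *g, unsigned spi)
{
    assert(spi < NR_SPIS);
    g->status[spi / 32] |= UINT32_C(1) << (spi % 32);
}

/* A core is woken only if some pending SPI is unmasked for it. */
bool gpc_should_wake(const struct gpc_model *g, unsigned core)
{
    assert(core < NR_CORES);
    for (unsigned w = 0; w < WORDS; w++)
        if (g->status[w] & ~g->mask[core][w])
            return true;
    return false;
}
```

The point of the model is only that the wake decision depends on what software wrote into the masks, so they have to be kept in sync with the affinity programmed into the GIC by hand.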

>> On IMX only SGIs need special treatment and a newer version just
>> replaces __smp_cross_call in a platform-specific manner:
>>
>>       https://lkml.org/lkml/2019/6/10/350
> 
> Right, because for PPIs you leverage the timer broadcast and for SPIs
> you have that GPC, so all you're left with are the remaining "intra GIC"
> interrupts which are SGIs.
> 
>>> Would it be acceptable in that case to "help" the platform by ensuring
>>> that there is at least one core that is not allowed to enter the deepest
>>> idle state and be able to help wake back up the others? I am asking
>>> because I am facing a similar issue to what Abel is trying to solve here
>>> with ARCH_BRCMSTB platforms which do not have the ability to have their
>>> CPU cores wake up on their own once power gated.
>>
>> Maybe you can workaround in ATF: if (last_core) wfi(); else powerdown();
> 
> Yes, that would certainly work, the biggest problem in my case is
> dealing with SPIs, since we still have no way to wake-up from those,
> other than by getting the help of another CPU that is not power gated.
> Lovely, I know.

By default irqs are only routed to core0 so maybe you could only power 
down if your core has no irqs enabled? It might even be possible to do 
this by reading GIC registers in ATF but this might race with other GIC 
manipulation from kernel.
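
A rough sketch of that decision, combined with the "last core stays in WFI" idea from above, might look like the following. Everything here is hypothetical: the struct, the per-core irq count and the function names are invented for illustration and are not ATF or GIC APIs, and the sketch ignores the race with concurrent GIC updates from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CORES 4 /* assumed core count */

enum idle_action { IDLE_WFI, IDLE_POWERDOWN };

/* Hypothetical bookkeeping the firmware would need to maintain. */
struct idle_model {
    bool     powered_down[NR_CORES]; /* core is currently power gated */
    uint32_t irqs_routed[NR_CORES];  /* enabled irqs targeting each core */
};

/* True if every other core is already power gated. */
bool is_last_core(const struct idle_model *m, unsigned core)
{
    for (unsigned c = 0; c < NR_CORES; c++)
        if (c != core && !m->powered_down[c])
            return false;
    return true;
}

/* Decide how deep the given core may go when it enters idle. */
enum idle_action choose_idle(const struct idle_model *m, unsigned core)
{
    assert(core < NR_CORES);

    /* Keep one core in plain WFI so somebody can wake the others. */
    if (is_last_core(m, core))
        return IDLE_WFI;

    /* A gated core cannot be woken by its own irqs, so stay in WFI. */
    if (m->irqs_routed[core] != 0)
        return IDLE_WFI;

    return IDLE_POWERDOWN;
}
```

With irqs routed only to core0, this keeps core0 in WFI and lets the other cores power down, until only one core remains awake.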

Perhaps your workarounds could also be encapsulated into a 
platform-specific irqchip implementation which occasionally pokes at ATF.

>> But you still need special treatment for interrupts targeted at gated cores.
>>
>>>>> My understanding is that this wake request feature via GIC is new in v3
>>>>> and this is maybe why HW team missed it during integration. Older
>>>>> imx6/7 has GICv2 and has deep idle states which always rely on GPC to
>>>>> wakeup so the approach can work.
>>>>
>>>> If HW designers really wanted to have sensible power management policy
>>>> in this SoC they would have paid attention, I am against patching the
>>>> kernel heavily to fix a platform bug.
>>
>>> HW designers may not be aware of how the cpuidle framework operates or
>>> what its constraints are, so they may not understand that any interrupt
>>> must be able to autonomously (for lack of a better name) wake up a
>>> given core, given any idle state it has entered.
>>
>> My understanding is that this is a requirement of GICv3 architecture.
>>
> 
> The systems I use have a GICv2 architecture though this is still no
> excuse for not having hooked the nIRQOUT/nFIQOUT to a power management
> controller, this is clearly an oversight, and it should have been
> possible to automatically take a core out of power gating, since we did
> design our own power gating logic, but this was done that way. Hopefully
> future designs can remedy that, designers are aware of why this is a
> problem now.
> --
> Florian
> 
