Message-ID: <a0b0cbfb-6a43-db68-3c0c-c5b1c498c3f4@huawei.com>
Date: Thu, 6 Mar 2025 11:44:56 +0800
From: "lihuisong (C)" <lihuisong@...wei.com>
To: Sudeep Holla <sudeep.holla@....com>
CC: <linux-acpi@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Jassi Brar
	<jassisinghbrar@...il.com>, Adam Young <admiyo@...amperecomputing.com>,
	Robbie King <robbiek@...ghtlabs.com>
Subject: Re: [PATCH 02/14] mailbox: pcc: Always clear the platform ack
 interrupt first


On 2025/3/5 22:29, Sudeep Holla wrote:
> On Wed, Mar 05, 2025 at 11:45:35AM +0800, lihuisong (C) wrote:
>> On 2025/3/3 18:51, Sudeep Holla wrote:
>>> The PCC mailbox interrupt handler (pcc_mbox_irq()) currently checks
>>> for command completion flags and any error status before clearing the
>>> interrupt.
>>>
>>> The below sequence highlights an issue in the handling of PCC mailbox
>>> interrupts, specifically when dealing with doorbell notifications and
>>> acknowledgment between the OSPM and the platform where type3 and type4
>>> channels are sharing the interrupt.
>>>
>>>           Platform Firmware              OSPM/Linux PCC driver
>>> ------------------------------------------------------------------------
>>>                                        build message in shmem
>>>                                        ring type3 channel doorbell
>>> receives the doorbell interrupt
>>>     process the message from OSPM
>>>     build response for the message
>>> ring the platform ack interrupt to OSPM
>>>                                 --->
>>> build notification in type4 channel
>>>                                        start processing in pcc_mbox_irq()
>>>                                         enter pcc handler for type4 chan
>>>                                            command complete cleared
>>>                                          read the notification
>>>                                   <---     clear platform ack irq
>>>                 * no effect from above as platform ack irq *
>>>                 * not yet triggered on this channel *
>>> ring the platform ack irq on type4 channel
>>>                                 --->
>>>                                         leave pcc handler for type4 chan
>>>                                         enter pcc handler for type3 chan
>>>                                            command complete set
>>>                                          read the response
>>>                                   <---     clear platform ack irq
>>>                                         leave pcc handler for type3 chan
>>>                                        leave pcc_mbox_irq() handler
>>>                                        start processing in pcc_mbox_irq()
>>>                                         enter pcc handler for type4 chan
>>>                                         leave pcc handler for type4 chan
>>>                                         enter pcc handler for type3 chan
>>>                                         leave pcc handler for type3 chan
>>>                                        leave pcc_mbox_irq() handler
>> This flow graph is not easy for me to understand.
>> The issue as described below is already very clear to me,
>> so I suggest removing the flow graph above.
> I understood it with the graph similar to the one above, though I simplified
> it in terms of PCC rather than specific IP reference.
>
>>> The key issue occurs when OSPM tries to acknowledge platform ack
>>> interrupt for a notification which is ready to be read and processed
>>> but the interrupt itself is not yet triggered by the platform.
>>>
>>> This ineffective acknowledgment leads to an issue later in time where
>>> the interrupt remains pending as we exit the interrupt handler without
>>> clearing the platform ack interrupt as there is no pending response or
>>> notification. The interrupt acknowledgment order is incorrect.
>>>
>> Has this issue been confirmed? It would be better to have a log. 😁
>> But it seems a valid issue.
> Yes, Robbie reported this. He is away and can't test or respond until next
> week. The log just shows loads of spurious interrupts and the "nobody
> cared" message, as you saw with your first patch fixing a similar race.
Yeah
>
>>> To resolve this issue, the platform acknowledgment interrupt should
>>> always be cleared before processing the interrupt for any notifications
>>> or response.
>>>
>> AFAICS, always clearing the platform ack interrupt first is also the
>> communication flow that the ACPI spec describes.
> Indeed, not sure how we missed it so far.
>
>> I am not sure whether it is OK if triggering the interrupt and clearing
>> it occur concurrently.
> It should be OK, as we start by clearing all the channels that share the
> interrupt. If the handler doesn't clear any source, the interrupt must
> remain asserted.
OK, thank you for clarifying.
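
To check my own understanding of the ordering, here is a toy model in C of
the race from the commit message. It is only a sketch and not the driver
code: toy_chan, irq_latched, data_ready and the handler names are all
hypothetical, and each channel is reduced to two flags. It replays the
sequence where the type4 notification is readable before its platform ack
interrupt is raised, once per handler order.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not taken from pcc.c. */
enum { CHAN_TYPE4 = 0, CHAN_TYPE3 = 1, NUM_CHANS = 2 };

struct toy_chan {
	bool irq_latched;	/* platform ack interrupt raised on this channel */
	bool data_ready;	/* response/notification waiting in shared memory */
};

/* The shared line stays asserted while any channel has a latched interrupt. */
static bool line_asserted(const struct toy_chan *c)
{
	return c[CHAN_TYPE4].irq_latched || c[CHAN_TYPE3].irq_latched;
}

/* Old order: look for work first, clear the interrupt only after handling. */
static void handle_check_then_clear(struct toy_chan *ch)
{
	if (!ch->data_ready)
		return;			/* no work: the interrupt is left latched */
	ch->data_ready = false;		/* consume the message */
	ch->irq_latched = false;	/* no effect if not raised yet */
}

/* New order: always clear the interrupt first, then look for work. */
static void handle_clear_then_check(struct toy_chan *ch)
{
	ch->irq_latched = false;
	if (!ch->data_ready)
		return;
	ch->data_ready = false;		/* consume the message */
}

typedef void (*toy_handler)(struct toy_chan *ch);

/*
 * Replay the race with the given per-channel handler. Returns true if the
 * shared line is still asserted after the second pass over both channels,
 * i.e. the interrupt would keep firing with nothing left to handle.
 */
static bool replay_race(toy_handler handle)
{
	struct toy_chan c[NUM_CHANS] = {
		[CHAN_TYPE4] = { .irq_latched = false, .data_ready = true },
		[CHAN_TYPE3] = { .irq_latched = true,  .data_ready = true },
	};

	/* First pass, entered because of the type3 interrupt. */
	handle(&c[CHAN_TYPE4]);
	c[CHAN_TYPE4].irq_latched = true;	/* platform rings type4 irq late */
	handle(&c[CHAN_TYPE3]);

	/* The late type4 interrupt is pending here with either order. */
	assert(line_asserted(c));

	/* Second pass, entered because the line is still asserted. */
	handle(&c[CHAN_TYPE4]);
	handle(&c[CHAN_TYPE3]);

	return line_asserted(c);
}
```

In this model replay_race(handle_check_then_clear) leaves the line asserted
after the second pass, while replay_race(handle_clear_then_check) does not,
which matches the explanation above.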
>
>> But this scenario is always possible. I think it is unrelated to this
>> patch. It's just my confusion.
> Indeed, it can happen any time as you mentioned. No worries, it is better
> to ask and clarify than to assume. Thanks for your time and review.
>
> --
> Regards,
> Sudeep
>
>
> .
