Date:   Thu, 17 Nov 2022 13:09:25 -0500
From:   Robbie King <robbiek@...ghtlabs.com>
To:     "lihuisong (C)" <lihuisong@...wei.com>,
        Sudeep Holla <sudeep.holla@....com>
Cc:     linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
        rafael@...nel.org, rafael.j.wysocki@...el.com,
        wanghuiqiang@...wei.com, huangdaode@...wei.com,
        tanxiaofei@...wei.com
Subject: Re: [RFC] ACPI: PCC: Support shared interrupt for multiple subspaces

On 11/7/2022 1:24 AM, lihuisong (C) wrote:
> 
> On 2022/11/4 23:39, Robbie King wrote:
>> On 11/4/2022 11:15 AM, Sudeep Holla wrote:
>>> On Fri, Nov 04, 2022 at 11:04:22AM -0400, Robbie King wrote:
>>>> Hello Huisong, your raising of the shared interrupt issue is very timely, I
>>>> am working to implement "Extended PCC subspaces (types 3 and 4)" using PCC
>>>> on the ARM RDN2 reference platform as a proof of concept, and encountered
>>>> this issue as well.  FWIW, I am currently testing using Sudeep's patch with
>>>> the "chan_in_use" flag removed, and so far have not encountered any issues.
>>>>
>>>
>>> Interesting, do you mean the patch I post in this thread but without the
>>> whole chan_in_use flag ?
>>
>> That's right, diff I'm running with is attached to end of message.
> Hello Robbie, in the multiple-subspace scenario, there is a problem:
> the OS doesn't know which channel should respond to the interrupt
> without this chan_in_use flag. If you have not encountered any
> issues in this case, it may be related to your register settings.
> 

Hi Huisong, apologies, I see your point now concerning multiple subspaces.

I have started stress testing where I continuously generate both requests
and notifications as quickly as possible, and unfortunately found an issue
even with the original chan_in_use patch.  I first had to modify the patch
to get the type 4 channel notifications to function at all, essentially
ignoring the chan_in_use flag for that channel.  With that change, I still
hit my original stress issue, where the pcc_mbox_irq function did not
correctly ignore an interrupt for the type 3 channel.

The issue occurs when a request from AP to SCP over the type 3 channel is
outstanding, and simultaneously the SCP initiates a notification over the
type 4 channel.  Since the two channels share an interrupt, both handlers
are invoked.

I've tried to draw out the state of the channel status "free" bits along
with the AP and SCP function calls involved.

type 3
------

  (1)pcc.c:pcc_send_data()
        |                         (5) mailbox.c:mbox_chan_receive_data()
_______v                      (4)pcc.c:pcc_mbox_irq()
free   \_________________________________________

                               ^
type 4                        ^
------                        ^
_____________________
free                 \_____________________________
                      ^        ^
                      |        |
(2)mod_smt.c:smt_transmit()   |
                               |
(3)mod_mhu2.c:raise_interrupt()

The sequence of events is:

1) OS initiates request to SCP by clearing FREE in status and ringing SCP doorbell
2) SCP initiates notification by filling shared memory and clearing FREE in status
3) SCP notifies OS by ringing OS doorbell
4) OS first invokes interrupt handler for type 3 channel

    At this step, the issue is that "val" from reading status (i.e. CommandCompleteCheck)
    is zero (SCP has not responded yet), so the code below falls through and continues
    to process the interrupt as if the request had been acknowledged by the SCP.

	if (val) { /* Ensure GAS exists and value is non-zero */
		val &= pchan->cmd_complete.status_mask;
		if (!val)
			return IRQ_NONE;
	}

    The chan_in_use flag does not address this because the channel is indeed in use.

5) ACPI:PCC client kernel module is incorrectly notified that response data is
    available
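
To make the fall-through in step 4 concrete, here is a minimal user-space
sketch of the quoted check.  It is not driver code: old_check_accepts() and
replay_step4() are hypothetical names, "true" stands in for "handler carries
on" and "false" for "return IRQ_NONE".  The mask value 0x1 is the
CommandCompleteCheckMask from my tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space model of the status check quoted in step 4.
 * Returns true when the handler would go on to process the interrupt,
 * false where the driver would return IRQ_NONE.
 */
static bool old_check_accepts(uint64_t val, uint64_t status_mask)
{
	if (val) { /* only a non-zero status is tested against the mask */
		val &= status_mask;
		if (!val)
			return false;
	}
	return true; /* val == 0 lands here without being tested */
}

/* Replays step 4 with CommandCompleteCheckMask = 0x1. */
static void replay_step4(void)
{
	const uint64_t mask = 0x1;

	/* Request outstanding, SCP has not responded: status reads 0. */
	assert(old_check_accepts(0x0, mask));
	/* Command complete bit set: accepted, as intended. */
	assert(old_check_accepts(0x1, mask));
	/* Non-zero status without the mask bit: the only rejected case. */
	assert(!old_check_accepts(0x2, mask));
}
```

The first assertion is the stress-test case: a status of zero is accepted,
so the type 3 handler treats the type 4 doorbell as its own completion.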

I added the following fix (applied on top of Sudeep's original patch for clarity)
for the issue above, which resolved the stress test failure.  I've changed the interrupt
handler to explicitly verify that the status value matches the mask for type 3
interrupts before acknowledging them.  Conversely, a type 4 channel verifies that
the status value does *not* match the mask, since in this case we are functioning
as the recipient, not the initiator.
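
As a rough illustration of the intended decision (a stand-alone sketch, not
the driver code itself), the comparison can be modelled in user space as
below.  new_check_accepts() is a hypothetical helper name; is_controller and
status_mask mirror the fields used in the diff, and the mask value 0x1 used
in the comments is taken from my tables.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the revised status comparison.
 * Returns true when the interrupt would be treated as belonging to
 * this channel, false where the handler would return IRQ_NONE.
 */
static bool new_check_accepts(bool is_controller, uint64_t val,
			      uint64_t status_mask)
{
	/* Initiator (type 3) expects the mask bit(s) set; recipient
	 * (type 4) expects them clear. */
	uint64_t cmp = is_controller ? status_mask : 0;

	return (val & status_mask) == cmp;
}
```

With status_mask = 0x1, a type 3 channel now rejects a status of 0 (request
still outstanding) and accepts 1, while a type 4 channel accepts 0 (SCP has
cleared the bit to post a notification) and rejects 1.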

One concern is that, since this fundamentally changes handling of the channel status,
existing platforms could be impacted.

For reference, here are my Pcct.aslc tables:

   {
       EFI_ACPI_6_4_PCCT_SUBSPACE_TYPE_3_EXTENDED_PCC,
...
       ARM_GAS32(0x06000004ULL),          // CommandCompleteCheckRegister
       0x00000001ULL,                     // CommandCompleteCheckMask
       ARM_GAS32(0x06000004ULL),          // CommandCompleteUpdateRegister
       0xFFFFFFFEULL,                     // CommandCompleteUpdatePreserve
       0x00000000ULL,                     // CommandCompleteUpdateSet
       ARM_GAS32(0x06000004ULL),          // ErrorStatusRegister
       0x00000002ULL,                     // ErrorStatusMask
   },
   {
       EFI_ACPI_6_4_PCCT_SUBSPACE_TYPE_4_EXTENDED_PCC,
...
       ARM_GAS32(0x06000084ULL),          // CommandCompleteCheckRegister
       0x00000001ULL,                     // CommandCompleteCheckMask
       ARM_GAS32(0x06000084ULL),          // CommandCompleteUpdateRegister
       0xFFFFFFFEULL,                     // CommandCompleteUpdatePreserve
       0x00000001ULL,                     // CommandCompleteUpdateSet
       ARM_GAS32(0x06000084ULL),          // ErrorStatusRegister
       0x00000002ULL,                     // ErrorStatusMask
   },
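
To show what the Preserve/Set pairs above imply for the status word, here is
a small sketch.  It assumes the usual read-modify-write semantics for these
fields (new = (current & preserve) | set); cmd_complete_update() is a
hypothetical helper, not a function from pcc.c.

```c
#include <stdint.h>

/*
 * Applies a CommandCompleteUpdate preserve/set pair to the current
 * register value, assuming new = (cur & preserve) | set.
 */
static uint64_t cmd_complete_update(uint64_t cur, uint64_t preserve,
				    uint64_t set)
{
	return (cur & preserve) | set;
}
```

Under that assumption, the type 3 values (preserve 0xFFFFFFFE, set 0x0)
clear bit 0 and leave the other bits alone, which matches the OS clearing
the "free" bit when it sends a request.  The type 4 values (preserve
0xFFFFFFFE, set 0x1) set bit 0 instead.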



diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c
index f8febc4f3270..a7dfcb5503ff 100644
--- a/drivers/mailbox/pcc.c
+++ b/drivers/mailbox/pcc.c
@@ -93,6 +93,8 @@ struct pcc_chan_reg {
   * @plat_irq: platform interrupt
   * @plat_irq_flags: platform interrupt flags
   * @chan_in_use: flag indicating whether the channel is in use or not
+ * @is_controller: flow of data on the channel is controlled locally
+ *       (as opposed to notifications which originate remotely)
   */
  struct pcc_chan_info {
  	struct pcc_mbox_chan chan;
@@ -104,6 +106,7 @@ struct pcc_chan_info {
  	int plat_irq;
  	unsigned int plat_irq_flags;
  	bool chan_in_use;
+	bool is_controller;
  };

  #define to_pcc_chan_info(c) container_of(c, struct pcc_chan_info, chan)
@@ -243,22 +246,32 @@ static irqreturn_t pcc_mbox_irq(int irq, void *p)
  	struct pcc_chan_info *pchan;
  	struct mbox_chan *chan = p;
  	u64 val;
+	u64 cmp;
  	int ret;

  	pchan = chan->con_priv;

-	if (!pchan->chan_in_use)
+	if (pchan->is_controller && !pchan->chan_in_use)
  		return IRQ_NONE;

  	ret = pcc_chan_reg_read(&pchan->cmd_complete, &val);
  	if (ret)
  		return IRQ_NONE;

-	if (val) { /* Ensure GAS exists and value is non-zero */
-		val &= pchan->cmd_complete.status_mask;
-		if (!val)
-			return IRQ_NONE;
-	}
+	/*
+	 * When we control data flow on the channel, we expect
+	 * to see the mask bit(s) set by the remote to indicate
+	 * the presence of a valid response.  When we do not
+	 * control the flow (i.e. type 4) the opposite is true.
+	 */
+	if (pchan->is_controller)
+		cmp = pchan->cmd_complete.status_mask;
+	else
+		cmp = 0;
+
+	val &= pchan->cmd_complete.status_mask;
+	if (cmp != val)
+		return IRQ_NONE;

  	ret = pcc_chan_reg_read(&pchan->error, &val);
  	if (ret)
@@ -704,6 +717,9 @@ static int pcc_mbox_probe(struct platform_device *pdev)
  		pcc_mbox_channels[i].con_priv = pchan;
  		pchan->chan.mchan = &pcc_mbox_channels[i];

+		pchan->is_controller =
+			(pcct_entry->type != ACPI_PCCT_TYPE_EXT_PCC_SLAVE_SUBSPACE);
+
  		if (pcct_entry->type == ACPI_PCCT_TYPE_EXT_PCC_SLAVE_SUBSPACE &&
  		    !pcc_mbox_ctrl->txdone_irq) {
  			pr_err("Plaform Interrupt flag must be set to 1");
