Message-ID: <182c4d7b-9e91-c00e-43ab-a2c0bd671828@amd.com>
Date:   Fri, 12 May 2023 19:50:31 +0530
From:   Nipun Gupta <nipun.gupta@....com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "maz@...nel.org" <maz@...nel.org>, "jgg@...pe.ca" <jgg@...pe.ca>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Cc:     "git (AMD-Xilinx)" <git@....com>,
        "Anand, Harpreet" <harpreet.anand@....com>,
        "Jansen Van Vuuren, Pieter" <pieter.jansen-van-vuuren@....com>,
        "Agarwal, Nikhil" <nikhil.agarwal@....com>,
        "Simek, Michal" <michal.simek@....com>,
        "Gangurde, Abhijit" <abhijit.gangurde@....com>,
        "Cascon, Pablo" <pablo.cascon@....com>
Subject: Re: [PATCH] cdx: add MSI support for CDX bus



On 5/11/2023 3:59 AM, Thomas Gleixner wrote:
> 
> Nipun!
> 
> On Wed, May 10 2023 at 19:34, Nipun Gupta wrote:
>> On 5/10/2023 3:31 AM, Thomas Gleixner wrote:
>>> I'm not insisting on that, but you could at least have had the courtesy
>>> of responding to my review reply and explain to me why you want to solve
>>> it differently and why my suggestion is not the right solution.
>>>
>>> Alternatively you could have added that information in the changelog or
>>> cover letter.
>>>
>>> So in summary you ignored _all_ review comments I made, went off and did
>>> something different and provided a slightly different useless changelog
>>> with the extra add on of a broken Signed-off-by chain.
>>>
>>> Feel free to ignore my reviews and the documentation which we put out
>>> there to make collaboration feasible for both sides, but please don't be
>>> upset when I ignore you and your patches in return.
>>
>> Sincere apology for not responding to the earlier comments. Intention
>> was never to ignore the review comments. Appreciate your vast changes
>> regarding the MSI, and the patch series you shared took time to
>> understand (provided other things as well), and it was quite late to
>> reply. I understand that even in this case at least I should have added
>> this as part of the cover-letter.
> 
> Fair enough. All settled.
> 
>> IMHO, use-case for MSI in CDX subsystem is a bit different from per
>> device MSI domain. Here we are trying to create a domain per CDX
>> controller which is attached to a MSI controller, and all devices on a
>> particular CDX controller will have same mechanism of write MSI
>> message.
> 
> That was exactly the same assumption which PCI/MSI and other MSI
> implementations made. It turned out to be the wrong abstraction.
> 
> CDX is not any different than PCI. The actual "interrupt chip" is not
> part of the bus, it's part of the device and pretending that it is a bus
> specific thing is just running into the same cul-de-sac sooner rather
> than later.

I understand your viewpoint, but I would argue that the CDX bus differs
somewhat from PCI in that firmware acts as the controller for all the
devices and their configuration. The CDX bus controller sends every
write_msi_msg command over RPmsg to the firmware running on the RPU, and
the firmware then interfaces with the actual devices to pass this
information on in a way agreed between firmware and device. The only way
to pass MSI information to a device is via firmware, and the CDX bus
controller is the only entity that can communicate with the firmware for
this.

> 
>> Also, the current CDX controller that we have added has a different
>> mechanism for MSI prepare (it gets requester ID from firmware).
> 
> That's not an argument, that's just an implementation detail.
> 
>> In your opinion is there any advantage in moving to a per device domain
>> for CDX devices? We can definitely rethink the implementation of MSI in
>> CDX subsystem.
> 
> See above.
> 
> While talking about implementation and design. I actually got curious
> and looked at CDX because I was amazed about the gazillion indirections
> in that msi_write_msg() callback.
> 
> So this ends up doing:
> 
>     cdx->ops->dev_configure(cdx, ...)
>       cdx_configure_device()
>         cdx_mcdi_write_msi()
>           cdx_mcdi_rpc_async()
>             kmalloc()                            <- FAIL #1
>             cdx_mcdi_rpc_async_internal()
>                queue_work()                      <- FAIL #2
> 
> #1) That kmalloc() uses GFP_ATOMIC, but this is invoked deep in the guts
>      of interrupt handling with locks held and interrupts disabled.
> 
>      Aside of the fact that this breaks on PREEMPT_RT, such allocations
>      are generally frowned upon. As a consequence the kref_put()s in the
>      error paths of cdx_mcdi_rpc_async_internal() will blow up on RT
>      too.
> 
>      I know that Xilinx stated publicly that they don't support RT, but
>      RT is not that far out to be supported in mainline and aside of that
>      I know for sure that quite a lot of Xilinx customers use PREEMPT_RT
>      nevertheless.
> 
> #2) That's actually the worse part of it and completely broken versus
>      device setup
> 
>      probe()
>        cdx_msi_domain_alloc_irqs()
>        ...
>        request_irq() {
>          ...
>          irq_activate()
>            irq_chip_write_msi_msg()
>              ...
>              queue_work()
>            ...
>        }
> 
>        enable_irq_in_device()
> 
>          <- device raises interrupt and eventually uses an uninitialized
>             MSI message because the scheduled work has not yet completed.
> 
>      That's going to be a nightmare to debug and it's going to happen
>      once in a blue moon out in the field.
> 
> The interrupt subsystem already can handle update mechanisms which
> require sleepable context:
> 
>     irq_bus_lock() and irq_bus_sync_unlock() irqchip callbacks
> 
> They were initially implemented to deal with interrupt chips which are
> configured via I2C, SPI etc.
> 
> How does that work?
> 
> On entry to interrupt management functions the sequence is:
> 
>      if (desc->irq_data.chip->irq_bus_lock)
>         desc->irq_data.chip->irq_bus_lock(...)
>      raw_spin_lock_irq(&desc->lock);
> 
> and on exit:
> 
>      raw_spin_unlock_irq(&desc->lock);
>      if (desc->irq_data.chip->irq_bus_sync_unlock)
>         desc->irq_data.chip->irq_bus_sync_unlock(...)
> 
> irq_bus_lock() usually just acquires a mutex.
> 
> The other irqchip callbacks just cache the relevant information, but do
> not execute the bus transaction because that is not possible with
> desc->lock held.
> 
> In the irq_bus_sync_unlock() they execute the bus transaction with the
> cached information before dropping the mutex.
> 
> So you can solve #1 and #2 with that. Your msi_write_msg() callback will
> just save the message and set some internal flag that it needs to be
> written out in the irq_bus_sync_unlock() callback.
> 
> See?
> 
> IIRC, there is a gap vs. interrupt affinity setting from user space,
> which is irrelevant for I2C, SPI etc. configured interrupt chips as they
> raise interrupt via an SoC interrupt pin and that's the entity which
> does the affinity management w/o requiring I2C/SPI. IIRC I posted a
> patch snippet to that effect in one of those lengthy PCI/MSI/IMS threads
> because that is also required for MSI storage which happens to be in
> queue memory and needs to be synchronized via some command channel. But
> I can't be bothered to search for it as it's a no-brainer to fix that
> up.

Thanks for this analysis and for pointing out the hidden but crucial
issues with the implementation. These need to be fixed.

As per your suggestion, we can move the firmware interaction code into
the irq_bus_lock()/irq_bus_sync_unlock() callbacks. Another option is to
change cdx_mcdi_rpc_async() into an atomic, synchronous API. We are
evaluating both solutions and will update the implementation accordingly.
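
To make the first option concrete, here is a minimal user-space sketch of
the cache-then-flush pattern: the write callback only saves the message,
and the firmware transaction happens in the sync-unlock step. This is not
kernel code and not the CDX driver. Every demo_ name is hypothetical, the
mutex is modelled as a plain flag, and the firmware call is modelled as a
struct copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct msi_msg. */
struct demo_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical per-interrupt state; none of these fields exist in CDX. */
struct demo_msi_chip {
	bool bus_locked;            /* models the mutex taken in irq_bus_lock() */
	bool dirty;                 /* cached message still has to be written out */
	struct demo_msi_msg cached; /* saved by the write_msi_msg callback */
	struct demo_msi_msg hw;     /* what the firmware/device currently holds */
	unsigned int fw_writes;     /* number of simulated firmware transactions */
};

void demo_chip_init(struct demo_msi_chip *chip)
{
	*chip = (struct demo_msi_chip){0};
}

/* Sleepable context: serialise against other updaters. */
void demo_bus_lock(struct demo_msi_chip *chip)
{
	assert(!chip->bus_locked);
	chip->bus_locked = true;
}

/*
 * In the kernel this runs with desc->lock held and interrupts disabled,
 * so it must not allocate, sleep or queue work. It only caches the message.
 */
void demo_write_msi_msg(struct demo_msi_chip *chip,
			const struct demo_msi_msg *msg)
{
	assert(msg != NULL);
	chip->cached = *msg;
	chip->dirty = true;
}

/* Stand-in for a synchronous firmware call, which may sleep in a real driver. */
void demo_fw_write(struct demo_msi_chip *chip)
{
	chip->hw = chip->cached;
	chip->fw_writes++;
}

/* Sleepable again: flush the cached message before dropping the lock. */
void demo_bus_sync_unlock(struct demo_msi_chip *chip)
{
	assert(chip->bus_locked);
	if (chip->dirty) {
		demo_fw_write(chip);
		chip->dirty = false;
	}
	chip->bus_locked = false;
}

/* Mirrors the entry/exit sequence of the interrupt management functions. */
void demo_update_msg(struct demo_msi_chip *chip,
		     const struct demo_msi_msg *msg)
{
	demo_bus_lock(chip);
	/* raw_spin_lock_irq(&desc->lock) would be taken here */
	demo_write_msi_msg(chip, msg);
	/* raw_spin_unlock_irq(&desc->lock) would be dropped here */
	demo_bus_sync_unlock(chip);
}
```

In this model the message has reached the "device" by the time
demo_update_msg() returns, so a later enable_irq_in_device() step cannot
observe an uninitialized MSI message, and no allocation or queued work is
needed in the atomic section.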

Thanks,
Nipun

> 
> Thanks,
> 
>          tglx
