Message-ID: <996854e7-ab11-b76b-64dc-b760b3ad2365@intel.com>
Date: Tue, 11 Aug 2020 11:39:21 -0700
From: "Dey, Megha" <megha.dey@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Jason Gunthorpe <jgg@...dia.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>
CC: Marc Zyngier <maz@...nel.org>,
"Jiang, Dave" <dave.jiang@...el.com>,
"vkoul@...nel.org" <vkoul@...nel.org>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"rafael@...nel.org" <rafael@...nel.org>,
"hpa@...or.com" <hpa@...or.com>,
"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
"Pan, Jacob jun" <jacob.jun.pan@...el.com>,
"Raj, Ashok" <ashok.raj@...el.com>,
"Liu, Yi L" <yi.l.liu@...el.com>, "Lu, Baolu" <baolu.lu@...el.com>,
"Tian, Kevin" <kevin.tian@...el.com>,
"Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
"Luck, Tony" <tony.luck@...el.com>,
"Lin, Jing" <jing.lin@...el.com>,
"Williams, Dan J" <dan.j.williams@...el.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"parav@...lanox.com" <parav@...lanox.com>,
"Hansen, Dave" <dave.hansen@...el.com>,
"netanelg@...lanox.com" <netanelg@...lanox.com>,
"shahafs@...lanox.com" <shahafs@...lanox.com>,
"yan.y.zhao@...ux.intel.com" <yan.y.zhao@...ux.intel.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"Ortiz, Samuel" <samuel.ortiz@...el.com>,
"Hossain, Mona" <mona.hossain@...el.com>,
"dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH RFC v2 02/18] irq/dev-msi: Add support for a new DEV_MSI
irq domain
Hi Thomas,
On 8/8/2020 12:47 PM, Thomas Gleixner wrote:
> Megha,
>
> "Dey, Megha" <megha.dey@...el.com> writes:
>> On 8/7/2020 9:47 AM, Thomas Gleixner wrote:
>>> I'm all for sharing code and making the life of driver writers simple
>>> because that makes my life simple as well, but not by creating a layer
>>> at the wrong level and then hacking it into submission until it finally
>>> collapses.
>>>
>>> Designing the infrastructure following the clear layering rules of
>>> hierarchical domains so it works for IMS and also replaces the platform
>>> MSI hack is the only sane way to go forward, not the other way round.
>> From what I've gathered, I need to:
>>
>> 1. Get rid of the mantra that "IMS" is an extension of platform-msi.
>> 2. Make this new infra devoid of any platform-msi references
> See below.
ok..
>
>> 3. Come up with a ground up approach which adheres to the layering
>> constraints of the IRQ subsystem
> Yes. It's something which can be used by all devices which have:
>
> 1) A device specific irq chip implementation including a msi write function
> 2) Device specific resource management (slots in the IMS case)
>
> The infrastructure you need is basically a wrapper around the core MSI
> domain (similar to PCI, platform-MSI etc,) which provides the specific
> functionality to handle the above.
Ok, I will create a per-device irq chip which directly has the
device-specific callbacks instead of another layer of indirection.
This way I will get rid of the 'platform_msi_ops' data structure.
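For example, the message write callback could then directly be something
along these lines (only a sketch; idxd->ims_base and the 16-bytes-per-slot
layout are made-up names, the real IDXD IMS entry layout may differ):

static void idxd_irq_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
        struct idxd_device *idxd = irq_data_get_irq_chip_data(data);
        /* hwirq is the IMS slot index picked at allocation time */
        void __iomem *entry = idxd->ims_base + data->hwirq * 16;

        iowrite32(msg->address_lo, entry);
        iowrite32(msg->address_hi, entry + 4);
        iowrite32(msg->data, entry + 8);
}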
I am not sure what you mean by device-specific resource management;
are you referring to dev_msi_alloc/free_irqs?
>> 4. Have common code (drivers/irqchip maybe??) where we put in all the
>> generic ims-specific bits for the IRQ chip and domain
>> which can be used by all device drivers belonging to this "IMS"class.
> Yes, you can provide a common implementation for devices which share the
> same irq chip and domain (slot management functionality)
Yeah, I think most of the msi_domain_ops (msi_prepare, set_desc, etc.) are
common and can be moved into a common file.
>
>> 5. Have the device driver do the rest:
>> create the chip/domain (one chip/domain per device?)
>> provide device specific callbacks for masking, unmasking, write
>> message
> Correct, but you don't need any magic new data structures for that, the
> existing msi_domain_info/msi_domain_ops and related structures are
> either sufficient or can be extended when necessary.
>
> So for the IDXD case you need:
>
> 1) An irq chip with mask/unmask callbacks and a write msg function
> 2) A slot allocation or association function and their 'free'
> counterpart (irq_domain_ops)
This is one part I didn't understand.
Currently my dev_msi_alloc_irqs is simply a wrapper over
platform_msi_domain_alloc_irqs which again mostly calls
msi_domain_alloc_irqs.
When you say add a .alloc/.free, does this mean we should add a
device-specific alloc/free and not use the default
msi_domain_alloc/msi_domain_free?
I don't see anything device specific to be done for IDXD at least; can
you please let me know?
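To make my question concrete, is the idea something along these lines?
(Only a rough sketch; idxd->ims_slots/idxd->ims_nr are made-up names for a
per-device IMS slot bitmap, and I am not sure yet how this should mesh with
the default msi_domain_alloc path.)

static int idxd_domain_alloc(struct irq_domain *domain, unsigned int virq,
                             unsigned int nr_irqs, void *arg)
{
        struct idxd_device *idxd = domain->host_data;
        int i, slot, ret;

        /* Let the parent (IR/vector) domains do their part first */
        ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
        if (ret < 0)
                return ret;

        for (i = 0; i < nr_irqs; i++) {
                /* Device specific resource management: reserve an IMS slot */
                slot = find_first_zero_bit(idxd->ims_slots, idxd->ims_nr);
                if (slot >= idxd->ims_nr)
                        goto err;
                set_bit(slot, idxd->ims_slots);

                /* Record the slot as hwirq and hook up the per-device chip */
                irq_domain_set_info(domain, virq + i, slot, &idxd_irq_chip,
                                    idxd, handle_edge_irq, NULL, "edge");
        }
        return 0;

err:
        while (--i >= 0) {
                struct irq_data *d = irq_domain_get_irq_data(domain, virq + i);

                clear_bit(d->hwirq, idxd->ims_slots);
        }
        irq_domain_free_irqs_parent(domain, virq, nr_irqs);
        return -ENOSPC;
}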
>
> The function and struct pointers go into the appropriate
> msi_info/msi_ops structures along with the correct flags to set up the
> whole thing and then the infrastructure creates your domain, fills in
> the shared functions and sets the whole thing up.
>
> That's all what a device driver needs to provide, i.e. stick the device
> specific functionality into right data structures and let the common
> infrastructure deal with it. The rest just works and the device specific
> functions are invoked from the right places when required.
Yeah, that makes sense.
>
>> So from the hierarchical domain standpoint, we will have:
>> - For DSA device: vector->intel-IR->IDXD
>> - For Jason's device: root domain-> domain A-> Jason's device's IRQ domain
>> - For any other intel IMS device in the future which
>> does not require interrupt remapping: vector->new device IRQ domain
>> requires interrupt remapping: vector->intel-IR->new device IRQ
>> domain (i.e. create a new domain even though IDXD is already present?)
> What's special about IDXD? It's just one specific implementation of IMS
> and any other device implementing IMS is completely independent and as
> documented in the specification the IMS slot management and therefore
> the mask/unmask functionality can and will be completely different. IDXD
> has a storage array with slots, Jason's hardware puts the IMS slot into
> the queue storage.
>
> It does not matter whether a device comes from Intel or any other vendor,
> it does neither matter whether the device works with direct vector
> delivery or interrupt remapping.
>
> IDXD is not any different from any other IMS capable device when you
> look at it from the interrupt hierarchy. It's either:
>
> vector -> IR -> device
> or
> vector -> device
>
> The only point where this is differentiated is when the irq domain is
> created. Anything else just falls into place.
Yeah, so I will create the IRQ domain in the IDXD driver with intel-IR
as the parent, instead of creating a common per-IR-unit domain.
>
> To answer Jason's question: No, the parent is never the PCI/MSI irq
> domain because that sits at the same level as that device
> domain. Remember the scheme:
>
> vector --- DMAR-MSI
> |
> |-- ....
> |
> |-- IR-0 --- IO/APIC-0
> | |
> | |-- IO/APIC-1
> | |
> | |-- PCI/MSI-0
> | |
> | |-- HPET/MSI-0
> | |
> | |-- DEV-A/MSI-0
> | |-- DEV-A/MSI-1
> | |-- DEV-B/MSI-2
> |
> |-- IR-1 --- PCI/MSI-1
> | |
> | |-- DEV-C/MSI-3
>
> The PCI/MSI domain(s) are dealing solely with PCI standard compliant
> MSI/MSI-X. IMS or similar (platform-MSI being one variant) sit at the
> same level as the PCI/MSI domains.
>
> Why? It's how the hardware operates.
>
> The PCI/MSI "irq chip" is configured by the PCI/MSI domain level and it
> sends its message to the interrupt parent in the hierarchy, i.e. either
> to the Interrupt Remap unit or to the configured vector of the target
> CPU.
>
> IMS does not send it to some magic PCI layer first, at least not at the
> conceptual level. The fact that the message is transported by PCIe does
> not change that at all. PCIe in that case is solely the transport, but
> the "irq chip" at the PCI/MSI level of the device is not involved at
> all. If it were that would be a different story.
>
> So now you might ask why we have a single PCI/MSI domain per IR unit and
> why I want separate IMS domains.
>
> The answer is in the hardware again. PCI/MSI is uniform across devices
> so the irq chip and all of the domain functionality can be shared. But
> then we have two PCI/MSI domains in the above example because again the
> hardware has one connected to IR unit 0 and the other to IR unit 1.
> IR 0 and IR 1 manage different resources (remap tables) so PCI/MSI-0
> depends on IR-0 and PCI/MSI-1 on IR-1 which is reflected in the
> parent/child relationship of the domains.
>
> There is another reason why we can spawn a single PCI/MSI domain per
> root complex / IR unit. The PCI/MSI domains are not doing any resource
> management at all. The resulting message is created from the allocated
> vector (direct CPU delivery) or from the allocated Interrupt remapping
> slot information. The domain just deals with the logic required to
> handle PCI/MSI(X) and the necessary resources are provided by the parent
> interrupt layers.
>
> IMS is different. It needs device specific resource management to
> allocate an IMS slot which is clearly part of the "irq chip" management
> layer, aka. irq domain. If the IMS slot management would happen in a
> global or per IR unit table and as a consequence the management, layout,
> mask/unmask operations would be uniform then an IMS domain per system or
> IR unit would be the right choice, but that's not how the hardware is
> specified and implemented.
>
> Now coming back to platform MSI. The way it looks is:
>
> CPU --- (IR) ---- PLATFORM-MSI --- PLATFORM-DEVICE-MSI-0
> |-- PLATFORM-DEVICE-MSI-1
> |...
>
> PLATFORM-MSI is a common resource management which also provides a
> shared interrupt chip which operates at the PLATFORM-MSI level with one
> exception:
>
> The irq_msi_write_msg() callback has an indirection so the actual
> devices can provide their device specific msi_write_msg() function.
>
> That's a borderline abuse of the hierarchy, but it makes sense to some
> extent as the actual PLATFORM-MSI domain is a truly shared resource and
> the only device specific functionality required is the message
> write. But that message write is not something which has its own
> resource management, it's just a non uniform storage accessor. IOW, the
> underlying PLATFORM-MSI domain does all resource management including
> message creation and the quirk allows to write the message in the device
> specific way. Not that I love it, but ...
>
> That is the main difference between platform MSI and IMS. IMS is
> completely non uniform and the devices do not share any common resource
> or chip functionality. Each device has its own message store management,
> slot allocation/assignment and a device specific interrupt chip
> functionality which goes way beyond the nasty write msg quirk.
Thanks for giving such a detailed explanation! It really helps :)
>
>> What I still don't understand fully is what if all the IMS devices
>> need the same domain ops and chip callbacks, we will be creating
>> various instances of the same IRQ chip and domain right? Is that ok?
> Why would it be not ok? Are you really worried about a few hundred bytes
> of memory required for this?
>
> Sharing an instance only makes sense if the instance handles a shared or
> uniform resource space, which is clearly not the case with IMS.
>
> We create several PCI/MSI domains and several IO/APIC domains on larger
> systems. They all share the code, but they are dealing with separate
> resources so they have separate storage.
Ok, got it.
>
>> Currently the creation of the IRQ domain happens at the IR level so that
>> we can reuse the same domain but if it advisable to have a per device
>> interrupt domain, I will shift this to the device driver.
> Again. Look at the layering. What you created now is a pseudo shared
> domain which needs
>
> 1) An indirection layer for providing device specific functions
>
> 2) An extra allocation layer in the device specific driver to assign
> IMS slots completely outside of the domain allocation mechanism.
Hmm, again I am not sure which extra allocation layer you are
referring to.
>
> In other words you try to make things which are neither uniform nor
> share a resource space look the same way. That's the "all I have is a
> hammer so everything is a nail" approach. That never worked out well.
>
> With a per device domain/chip approach you get one consistent domain
> per device which provides
>
> 1) The device specific resource management (i.e. slot allocation
> becomes part of the irq domain operations)
>
> 2) The device specific irq chip functions at the correct point in the
> layering without the horrid indirections
>
> 3) Consolidated data storage at the device level where the actual
> data is managed.
>
> This is of course sharing as much code as possible with the MSI core
> implementation.
>
> As a side effect any extension of this be it on the domain or the irq
> chip side is just a matter of adding the functionality to that
> particular incarnation and not by having yet another indirection
> logic at the wrong place.
>
> The price you pay is a bit of memory but you get a clean layering and
> separation of functionality as a reward. The amount of code in the
> actual IMS device driver is not going to be much more than with the
> approach you have now.
>
> The infrastructure itself is not more than a thin wrapper around the
> existing msi domain infrastructure and might even share code with
> platform-msi.
From your explanation:
In the device driver:
static const struct irq_domain_ops idxd_irq_domain_ops = {
        .alloc          = idxd_domain_alloc, // not sure what this should do
        .free           = idxd_domain_free,
};

struct irq_chip idxd_irq_chip = {
        .name                   = "idxd",
        .irq_mask               = idxd_irq_mask,
        .irq_unmask             = idxd_irq_unmask,
        .irq_write_msi_msg      = idxd_irq_write_msi_msg,
        .irq_ack                = irq_chip_ack_parent,
        .irq_retrigger          = irq_chip_retrigger_hierarchy,
        .flags                  = IRQCHIP_SKIP_SET_WAKE,
};

struct msi_domain_info idxd_domain_info = {
        .flags          = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS,
        .ops            = &dev_msi_domain_ops,  // can be common
        .chip           = &idxd_irq_chip,       // per device
        .handler        = handle_edge_irq,
        .handler_name   = "edge",
};

dev->msi_domain = dev_msi_create_irq_domain(iommu->ir_domain,
                                            &idxd_domain_info,
                                            &idxd_irq_domain_ops);
msi_domain_alloc_irqs(dev->msi_domain, dev, nvec);
Common code:
struct irq_domain *dev_msi_create_irq_domain(struct irq_domain *parent,
                        struct msi_domain_info *dev_msi_domain_info,
                        const struct irq_domain_ops *dev_msi_irq_domain_ops)
{
        struct irq_domain *domain;
        .......
        domain = irq_domain_create_hierarchy(parent, IRQ_DOMAIN_FLAG_MSI, 0,
                                             NULL, dev_msi_irq_domain_ops,
                                             dev_msi_domain_info);
        .......
        return domain;
}
static struct msi_domain_ops dev_msi_domain_ops = {
        .set_desc       = dev_msi_set_desc,
        .msi_prepare    = dev_msi_prepare,
        .get_hwirq      = dev_msi_get_hwirq,
}; // can re-use platform-msi data structures
Except for the alloc/free irq_domain_ops, does this look fine to you?
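(And to complete the alloc/free picture from my question further up, the
.free counterpart could maybe look like this, again only a sketch using the
same made-up idxd->ims_slots bitmap:)

static void idxd_domain_free(struct irq_domain *domain, unsigned int virq,
                             unsigned int nr_irqs)
{
        struct idxd_device *idxd = domain->host_data;
        int i;

        for (i = 0; i < nr_irqs; i++) {
                struct irq_data *d = irq_domain_get_irq_data(domain, virq + i);

                /* Give the IMS slot back to the device private pool */
                clear_bit(d->hwirq, idxd->ims_slots);
        }
        irq_domain_free_irqs_parent(domain, virq, nr_irqs);
}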
-Megha
>
> Thanks,
>
> tglx