Message-ID: <20221021162055.nuxvfdrfhv42nlim@offworld>
Date: Fri, 21 Oct 2022 09:20:55 -0700
From: Davidlohr Bueso <dave@...olabs.net>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Ira Weiny <ira.weiny@...el.com>, dan.j.williams@...el.com,
dave.jiang@...el.com, alison.schofield@...el.com,
bwidawsk@...nel.org, vishal.l.verma@...el.com,
a.manzanares@...sung.com, linux-cxl@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] cxl/pci: Add generic MSI-X/MSI irq support

On Fri, 21 Oct 2022, Jonathan Cameron wrote:
>On Thu, 20 Oct 2022 21:18:58 -0700
>Ira Weiny <ira.weiny@...el.com> wrote:
>
>> On Thu, Oct 20, 2022 at 03:31:25PM -0700, Davidlohr Bueso wrote:
>> > On Tue, 18 Oct 2022, Jonathan Cameron wrote:
>> >
>> > > Reality is that it is cleaner to more or less ignore the infrastructure
>> > > proposed in this patch.
>> > >
>> > > 1. Query how many CPMU devices there are. Whilst there stash the maximum
>> > > cpmu vector number in the cxlds.
>> > > 2. Run a stub in this infrastructure that does max(irq, cxlds->irq_num);
>> > > 3. Carry on as before.
>> > >
>> > > Thus destroying the point of this infrastructure for that usecase at least
>> > > and leaving an extra bit of state in the cxl_dev_state that is just
>> > > to squirt a value into the callback...
>> >
>> > If it doesn't fit, then it doesn't fit.
>> >
>> > However, while I was expecting pass one to be in the callback, I wasn't
>> > expecting that both pass 1 and 2 shared the cpmu_regs_array. If the array
>> > could be reconstructed during pass 2, then it would fit a bit better;
>> > albeit the extra allocation, cycles etc., but this is probing phase, so
>> > overhead isn't that important (and cpmu_count isn't big enough to matter).
>
>I thought about that approach, but it's really ugly to have to do
>
>1) For the IRQ number gathering.
> a) Parse 1 to count CPMUs
> b) Parse 2 to get the register maps - grab the irq numbers and unmap them again
>2) For the CPMU registration
> a) Parse 3 to count CPMUs (we could stash the number of CPMUs from 1a) but
> that's no advantage over stashing the max irq in current proposal.
> Both are putting state where it's not relevant or wanted just to make it
> available in a callback. This way is even worse because it's getting
> stashed as a side effect of a parse in a function doing something different.
> b) Parse 4 to get the register maps and actually create the devices. Could have
> stashed this earlier as well, but same 'side effects' argument applies.
>
>Sure, can move to this however with appropriate comments on why we are playing
>these games because otherwise I suspect a future 'cleanup' would remove double, double
>pass.
>
>To allow for an irq registration wrapper that turns a series of straight
>line calls into callbacks in an array. The straight line calls aren't exactly
>complex in the first place.
>//find cpmu filling in cxl_cpmu_reg_maps.
>
>max_irq = -1
>rc = cxl_mailbox_get_irq()
>if (rc < 0)
> return rc;
>max_irq = max(max_irq, rc);
>
>rc = cxl_events_get_irq()
>if (rc < 0)
> return rc;
>max_irq = max(max_irq, rc);
>
>rc = cxl_cpmus_get_irq(cxl_cpmu_reg_maps);
>if (rc < 0)
> return rc;
>max_irq = max(max_irq, rc);
>
>...
>
>if (max_irq >= 0) {
>
> pci_get...
>}
>
>//create all the devices...

Yes, this was sort of what I pictured if we go this way. It doesn't make
my eyes sore.
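
For illustration only, the straight-line flow quoted above could look roughly
like the userspace sketch below. Every name in it (struct sketch_dev and the
sketch_*() helpers) is a hypothetical stand-in, not the driver's API; it
assumes each per-feature query returns the highest message number it needs,
or a negative errno on failure, and that the CPMU query is skipped when no
CPMUs were found.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_CPMUS 4

/* Hypothetical device description; a negative value simulates a failed query. */
struct sketch_dev {
	int mbox_irq;
	int events_irq;
	int cpmu_irqs[SKETCH_MAX_CPMUS];
	size_t cpmu_count;
};

static int sketch_max(int a, int b)
{
	return a > b ? a : b;
}

/* Stand-ins for the per-feature queries named in the quoted pseudocode. */
static int sketch_mailbox_get_irq(const struct sketch_dev *d)
{
	return d->mbox_irq;
}

static int sketch_events_get_irq(const struct sketch_dev *d)
{
	return d->events_irq;
}

/* Highest message number across all CPMUs, or a negative errno. */
static int sketch_cpmus_get_irq(const struct sketch_dev *d)
{
	int max_irq = -ENODEV;

	if (d->cpmu_count > SKETCH_MAX_CPMUS)
		return -EINVAL;

	for (size_t i = 0; i < d->cpmu_count; i++) {
		if (d->cpmu_irqs[i] < 0)
			return d->cpmu_irqs[i];
		max_irq = sketch_max(max_irq, d->cpmu_irqs[i]);
	}
	return max_irq;
}

/* Straight-line gathering of the largest message number, no callback array. */
int sketch_max_irq(const struct sketch_dev *d)
{
	int max_irq = -1;
	int rc;

	rc = sketch_mailbox_get_irq(d);
	if (rc < 0)
		return rc;
	max_irq = sketch_max(max_irq, rc);

	rc = sketch_events_get_irq(d);
	if (rc < 0)
		return rc;
	max_irq = sketch_max(max_irq, rc);

	if (d->cpmu_count) {
		rc = sketch_cpmus_get_irq(d);
		if (rc < 0)
			return rc;
		max_irq = sketch_max(max_irq, rc);
	}

	return max_irq;
}

/* Message numbers are zero-based, so the vector count is max + 1. */
int sketch_nr_vectors(const struct sketch_dev *d)
{
	int rc = sketch_max_irq(d);

	return rc < 0 ? rc : rc + 1;
}
```

In the driver itself the resulting count would presumably feed a single
pci_alloc_irq_vectors() call before any of the devices are created; that
part is left out here since the sketch has no PCI device to work with.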
>
>> >
>> > But if we're going to go with a free-for-all approach, can we establish
>> > who goes for the initial pci_alloc_irq_vectors()? I think perhaps mbox
>> > since it's the most straightforward and with least requirements, I'm
>> > also unsure of the status yet to merge events and pmu, but regardless
>> > they are still larger patchsets. If folks agree I can send a new mbox-only
>> > patch.
>>
>> I think there needs to be some mechanism for all of the sub-device-functions to
>> report their max required vectors.
>>
>> I don't think that the mbox code is necessarily the code which should need to
>> know about all those other sub-device-thingys. But it could certainly take
>> some 'max vectors' value that probe passed to it.
>>
>> I'm still not sure how dropping this infrastructure makes Jonathan's code
>> cleaner. I still think there will need to be 2 passes over the number of
>> CPMU's.
>>
>
>Primarily that there is no need to stash anything about the CPMUs in the
>cxl_device_state (option 1) or repeat all the counting and discovery logic twice
>(option 2).
>
>I can live with it (it's what we have to do in pcie port for the equivalent)
>but the wrapped up version feels like a false optimization.
>
>Saves a few lines of code and adds a bunch of complexity elsewhere that looks to
>me to outweigh that saving.

Yeah it's hard to justify the extra complexity here when the alternative isn't
even that bad.

Thanks,
Davidlohr