Message-ID: <6e6c3406-c4e9-1195-168b-934bbc55c6c4@caviumnetworks.com>
Date:   Fri, 13 Jan 2017 11:40:35 -0800
From:   David Daney <ddaney@...iumnetworks.com>
To:     Marc Zyngier <marc.zyngier@....com>,
        Thomas Gleixner <tglx@...utronix.de>
CC:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linus Walleij <linus.walleij@...aro.org>
Subject: Re: irq domain hierarchy vs. chaining w/ PCI MSI-X...

On 01/13/2017 10:45 AM, Marc Zyngier wrote:
> On 13/01/17 17:37, David Daney wrote:
>> On 01/13/2017 08:15 AM, Marc Zyngier wrote:
>>> Thanks Linus for looping me in.
>>>
>>> On 12/01/17 22:35, David Daney wrote:
>>>> Hi Thomas,
>>>>
>>>> I am trying to figure out how to handle this situation:
>>>>
>>>>                    handle_level_irq()
>>>>                    +---------------+                 handle_fasteoi_irq()
>>>>                    | PCIe hosted   |                 +-----------+     +-----+
>>>>   --level_gpio---->| GPIO to MSI-X |--MSI_message--+>| gicv3-ITS |---> | CPU |
>>>>                    | widget        |               | +-----------+     +-----+
>>>>                    +---------------+               |
>>>>                                                    |
>>>>            +-------------------+                   |
>>>>            | other PCIe device |---MSI_message-----+
>>>>            +-------------------+
>>>>
>>>>
>>>> The question is how to structure the interrupt handling.  My initial
>>>> attempt was a chaining arrangement where the GPIO driver does
>>>> request_irq() for the appropriate MSI-X vector, and the handler calls
>>>> back into the irq system like this:
>>>>
>>>>
>>>> static irqreturn_t thunderx_gpio_chain_handler(int irq, void *dev)
>>>> {
>>>> 	struct thunderx_irqdev *irqdev = dev;
>>>> 	int chained_irq;
>>>> 	int ret;
>>>>
>>>> 	chained_irq = irq_find_mapping(irqdev->gpio->chip.irqdomain,
>>>> 				       irqdev->line);
>>>> 	if (!chained_irq)
>>>> 		return IRQ_NONE;
>>>>
>>>> 	ret = generic_handle_irq(chained_irq);
>>>>
>>>> 	return ret ? IRQ_NONE : IRQ_HANDLED;
>>>> }
>>>>
>>>> Thus getting the proper GPIO irq_chip functions called to manage the
>>>> level triggering semantics.
>>>>
>>>> The drawbacks of this approach are that there are then two irqs
>>>> associated with the GPIO line (the base MSI-X and the chained GPIO),
>>>> also there can be up to 80-100 of these widgets, so potentially we can
>>>> consume twice that many irq numbers.
>>>>
>>>> It was suggested by Linus Walleij that using an irq domain hierarchy
>>>> might be a better idea.  However, I cannot figure out how this might
>>>> work.  The gicv3-ITS needs to use handle_fasteoi_irq(), and we need
>>>> handle_level_irq() for the GPIO-level lines.  Getting the proper
>>>> irq_chip functions called in a hierarchical configuration doesn't seem
>>>> doable given the heterogeneous flow handlers.
>>>
>>> My main worry here is that you're trying to handle something that is
>>> apparently a level interrupt using a transport that only handles edge
>>> interrupts. It means that each time you EOI an interrupt, you need to be
>>> able to resample the level and retrigger it if still high. Do you have a
>>> HW facility to do so? Or do you emulate this in SW?
>>
>> Yes.
>>
>> The first thing the handle_level_irq() flow handler does is to mask the
>> source.  After calling the handler, it then unmasks the source.
>>
>> The act of unmasking in the HW causes another MSI to be emitted if the
>> level signal is still active.  This is what we want as it ensures that
>> interrupts keep firing as long as the signal is active, even though the
>> underlying mechanism uses edge semantics MSI.
>>
>>
>>>
>>>> Can you think of a better way of structuring this than chaining from the
>>>> MSI-X handler as I outlined above?
>>>
>>> We already have similar horrors - see irq-mbigen.c which does exactly
>>> that. It creates IRQ domains spanning a bunch of MSIs allocated to that
>>> platform device. Once you have that, the rest is pretty easy.
>>>
>>> In your case, it probably requires adding the same kind of surgery so
>>> that we can create an IRQ domain from the MSIs associated with a PCIe
>>> device. Not too hard, just not there yet, and we can probably reuse some
>>> of the code that lives in platform-msi.c
>>
>> That seems too ugly.
>>
>> I would propose one of the following:
>>
>> 1) Just keep the crappy chaining more or less as I originally
>> implemented it, as it works well enough to get useful work done.
>
> I think that's the really ugly bit. It may work, but it is not something
> I wish to see in code that I'd end up being responsible for.

This already exists in multiple places in the kernel; it is simply the 
degenerate case of N:1 interrupt chaining where N == 1.

>
>> 2) add an optional hierarchical flow handler function to struct
>> irq_data.  If populated, this flow handler would be called from
>> handle_fasteoi_irq() instead of calling handle_irq_event().  This could
>> allow each irq_chip to have its own flow handler, instead of a single
>> flow handler shared by each of the hierarchically nested irq_chip.
>
> So you want to generalize CONFIG_IRQ_PREFLOW_FASTEOI so that it is on
> each level of a domain stack? Humm. I personally think that this is a
> massive bloat that is going to impact all the hot paths for no gain
> whatsoever, but I'll let tglx speak his mind on that.

I think "massive" goes a little overboard in the superlatives 
department.  It is a single load from a data structure that is likely 
already in the data cache, and a conditional branch that may put a bit 
of additional pressure on the branch predictor.

>
> I still think that being able to create an irqdomain based on the
> interrupts allocated to a device and make that an irqchip is
> conceptually simpler, already exists in the kernel, and fits the
> existing infrastructure that has been put in place over the past two years.
>

The (not yet fully developed) idea doesn't discard the irqdomain 
hierarchy.  It is really orthogonal to any discussion of irqdomains.

The idea is simply that irq flow handler functions and struct irq_chip 
are tightly coupled entities.  irq_chip abstracts away the mechanics of 
manipulating the HW irq unit, but a particular irq_chip instance is 
always designed with the knowledge of which flow handler will be calling 
the methods.  In general, a given irq_chip will only be usable with a 
subset of the existing flow handlers.
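
Just to make the coupling concrete, here is a minimal illustration (the 
my_gpio_* names and virq are made up): a level-type chip is written 
knowing that handle_level_irq() will mask the source before, and unmask 
it after, running the action handler, and the driver pairs the two 
explicitly:

/* Illustration only -- the my_gpio_* callbacks are hypothetical. */
static struct irq_chip my_gpio_irq_chip = {
	.name		= "my-gpio",
	.irq_ack	= my_gpio_irq_ack,
	.irq_mask	= my_gpio_irq_mask,
	.irq_unmask	= my_gpio_irq_unmask,
};

	/* in the driver's irq setup path: pair the chip with the
	 * flow handler it was designed for */
	irq_set_chip_and_handler(virq, &my_gpio_irq_chip, handle_level_irq);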

With the advent of irqdomain hierarchies, we now have multiple irq_chips 
being used in conjunction with the *single* flow handler for an irq.  I 
think this is the essence of my problem.  The high-level idea here is to 
have a per-irq_chip flow handler.  The exact placement of the
hooks to achieve this is still in the conceptual development stage.

Really, I envision what should be done as something more like "chaining" 
through a list of irq_chips, where we would follow a chain (list) of 
child irq_chip pointers from the CPU irq out to the leaf (GPIO unit), 
which is almost the opposite of the irqdomain hierarchy, which follows 
parent pointers from the leaf to the CPU.  There could still be an 
irqdomain associated with each position in the chain, but the order of 
traversing the irq_chips would be reversed.

Thinking about it just a little more, this is somewhat similar to the 
preflow_handler().
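
For comparison, if I remember the interface correctly, the existing 
preflow hook (CONFIG_IRQ_PREFLOW_FASTEOI) is a per-descriptor callback 
that handle_fasteoi_irq() invokes right before handle_irq_event():

	/* existing mechanism, for comparison: per irq descriptor, and
	 * only wired into the fasteoi flow.  my_preflow() is a
	 * placeholder with the irq_preflow_handler_t signature,
	 * i.e. void (*)(struct irq_data *). */
	irq_set_preflow_handler(irq, my_preflow);

What I have in mind is the same shape of hook, but attached to the 
irq_chip so that each chip in a hierarchy can supply its own.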

At the point where the handle_*_irq() functions call handle_irq_event(), 
we need to (optionally) do something both immediately before and after 
the call to handle_irq_event().

In irq_chip add a function:

     void (*irq_handle)(struct irq_data *data, struct irq_desc *desc);

Really, this is the per-irq_chip flow handler.

Then in handle_fasteoi_irq() and probably the other flow handlers as well:

    .
    .
    .
    if (chip->irq_handle)
       chip->irq_handle(&desc->irq_data, desc);
    else
       handle_irq_event(desc);
    .
    .
    .

Those 4 lines of code could be factored out into a helper function in chip.c
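
Roughly something like this (the helper name is just a placeholder):

/* in kernel/irq/chip.c -- placeholder name, not existing code */
static void handle_irq_event_or_chip(struct irq_desc *desc)
{
	struct irq_chip *chip = irq_desc_get_chip(desc);

	/* let the irq_chip supply its own per-chip flow, otherwise
	 * fall back to the normal action handling */
	if (chip->irq_handle)
		chip->irq_handle(&desc->irq_data, desc);
	else
		handle_irq_event(desc);
}

Each flow handler's call site then shrinks to a single call to the helper.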


> It is also worth mentioning that you are already using this exact
> hierarchical infrastructure (PCI/MSI -> ITS -> GIC), so I really don't
> see why we should all of a sudden treat your particular device any
> differently.
>
> Thanks,
>
> 	M.
>
