lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87a6vkayzl.fsf@nanos.tec.linutronix.de>
Date:   Sat, 14 Nov 2020 13:45:18 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Alexey Kardashevskiy <aik@...abs.ru>, linux-kernel@...r.kernel.org
Cc:     Alexey Kardashevskiy <aik@...abs.ru>,
        Marc Zyngier <maz@...nel.org>,
        Cédric Le Goater <clg@...d.org>,
        Michael Ellerman <mpe@...erman.id.au>, Qian Cai <cai@....pw>,
        Rob Herring <robh@...nel.org>,
        Frederic Barrat <fbarrat@...ux.ibm.com>,
        Michal Suchánek <msuchanek@...e.de>
Subject: Re: [PATCH kernel v3] genirq/irqdomain: Add reference counting to IRQs

On Mon, Nov 09 2020 at 20:46, Alexey Kardashevskiy wrote:
> PCI devices share 4 legacy INTx interrupts from the same PCI host bridge.
> Device drivers map/unmap hardware interrupts via irq_create_mapping()/
> irq_dispose_mapping(). The problem with that these interrupts are
> shared and when performing hot unplug, we need to unmap the interrupt
> only when the last device is released.
>
> There was a comment about whether hierarchical IRQ domains should
> contribute to this reference counter and I need some help here as
> I cannot see why.
> It is reverse now - IRQs contribute to domain->mapcount and
> irq_domain_associate/irq_domain_disassociate take necessary steps to
> keep this counter in order.

I'm not yet convinced that this change is correct under all
circumstances. See below.

> What might be missing is that if we have cascade of IRQs (as in the
> IOAPIC example from Documentation/core-api/irq/irq-domain.rst ), then
> a parent IRQ should contribute to the children IRQs and it is up to

No. Hierarchical irq domains handle ONE interrupt and have nothing to do
with parent/child interrupts. They represent the various layers of
hardware involved in delivering one particular interrupt. Just looking
at this example:

  Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU

There are 3 interrupt chips involved: IOAPIC - REMAP - APIC and each of
them has to do configuration and/or resource allocation when setting up
an interrupt. Also during handling the various chip levels might be
involved.

So now if you remove interrupt remapping (config, commandline, lack of
HW) the delivery chain looks like this:

  Device --> IOAPIC -> Local APIC -> CPU

So we only have two chips involved.

Hierarchical domains handle that at boot time by associating the
relevant parent domains to the involved chips.

> Documentation/core-api/irq/irq-domain.rst also suggests there is a lot
> to see in debugfs about IRQs but on my thinkpad there nothing about
> hierarchy.

Enable GENERIC_IRQ_DEBUGFS and surf around in
/sys/kernel/debug/irq/domains.

cat /sys/kernel/debug/irq/domains/IO-APIC-240 
name:   IO-APIC-240
 size:   24
 mapped: 15
 flags:  0x00000003
 parent: AMD-IR-3
    name:   AMD-IR-3
     size:   0
     mapped: 28
     flags:  0x00000003
     parent: VECTOR
        name:   VECTOR
         size:   0
         mapped: 295
         flags:  0x00000003

You will find something like this on your thinkpad as well. It might be
a two level hierarchy if there is no remapping unit.

name:   IO-APIC-240
 size:   24
 mapped: 15
 flags:  0x00000003
 parent: VECTOR
    name:   VECTOR
     size:   0
     mapped: 292
     flags:  0x00000003

and if you look at an interrupt:

# cat /sys/kernel/debug/irq/irqs/4
handler:  handle_edge_irq
device:   (null)
status:   0x00004000
istate:   0x00000000
ddepth:   0
wdepth:   0
dstate:   0x35408200
            IRQD_ACTIVATED
            IRQD_IRQ_STARTED
            IRQD_SINGLE_TARGET
            IRQD_MOVE_PCNTXT
            IRQD_AFFINITY_ON_ACTIVATE
            IRQD_CAN_RESERVE
            IRQD_HANDLE_ENFORCE_IRQCTX
node:     0
affinity: 0-63,128-191
effectiv: 130
pending:  
domain:  IO-APIC-240
 hwirq:   0x4
 chip:    IR-IO-APIC
  flags:   0x10
             IRQCHIP_SKIP_SET_WAKE
 parent:
    domain:  AMD-IR-3
     hwirq:   0xa00000
     chip:    AMD-IR
      flags:   0x0
     parent:
        domain:  VECTOR
         hwirq:   0x4
         chip:    APIC
          flags:   0x0
         Vector:    33
         Target:   130
         move_in_progress: 0
         is_managed:       0
         can_reserve:      1
         has_reserved:     0
         cleanup_pending:  0

you can see the domain hierarchy as well and the relevant information
per domain. So on the IO-APIC it's hwirq 4, i.e. PIN 4. In the remap
domain it's 0xa00000 which is the relevant table IIRC and the vector
domain has extra information about the target vector (33) and the target
CPU (130).

> So I'll ask again :)
>
> What is the easiest way to get irq-hierarchical hardware?
> I have a bunch of powerpc boxes (no good) but also a raspberry pi,
> a bunch of 32/64bit orange pi's, an "armada" arm box,
> thinkpads - is any of this good for the task?

You have it already. Everything you listed except PPC uses hierarchical
interrupt domains.

> +static void delayed_free_desc(struct rcu_head *rhp);
>  static void irq_kobj_release(struct kobject *kobj)
>  {
>  	struct irq_desc *desc = container_of(kobj, struct irq_desc, kobj);
> +#ifdef CONFIG_IRQ_DOMAIN
> +	struct irq_domain *domain;
> +	unsigned int virq = desc->irq_data.irq;
>  
> -	free_masks(desc);
> -	free_percpu(desc->kstat_irqs);
> -	kfree(desc);
> +	domain = desc->irq_data.domain;
> +	if (domain) {
> +		if (irq_domain_is_hierarchy(domain)) {
> +			irq_domain_free_irqs(virq, 1);
> +		} else {
> +			irq_domain_disassociate(domain, virq);
> +			irq_free_desc(virq);
> +		}
> +	}
> +#endif

This is really a layering violation. While it's smart to reuse the kobj
inside the irq descriptor, you're bolting the refcounting which is
required for handling this irqdomain multi-association case into the irq
descriptor code and invoking stuff from within the irq descriptor code
which absolutely does not belong there at all.

Thereby breaking hierarchical irqdomains completely. Just look at
the following callchain as one example (there are way more):

pci_disable_msi()
  pci_msi_teardown_msi_irqs()
    msi_domain_free_irqs()
      msi_domain->ops->domain_free_irqs()
        __msi_domain_free_irqs()
           irq_domain_free_irqs()
             irq_domain_free_irqs_hierarchy()
               irq_free_descs()
                 free_descs()

Sorry, but it's really not rocket science to find the call chains leading
up to free_desc().

And once you found all of them you'll end up with lots of ugly
constructs like conditional refcounting which is wrong to begin with:

> +		if (get_ref) {
> +			struct irq_desc *desc = irq_to_desc(virq + i);

This is an alarm sign in the first place.

I'm not saying, that reusing the irqdesc::kobj is bad per se, but this
needs way more thought than just moving stuff into the release function
and slapping kobj_get()/put() pairs all over the place.

If at all this needs to be done in small incremental steps and not as a
wholesale rewrite which is pretty much impossible to review.

Thanks,

        tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ