linux-kernel - Issues with irq handling in mfd/intel_soc_pmic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <57d040bf-8032-d60f-2abd-fd9bb95abd71@redhat.com>
Date:   Mon, 27 Feb 2017 13:38:12 +0100
From:   Hans de Goede <hdegoede@...hat.com>
To:     Nilesh Bacchewar <nilesh.bacchewar@...el.com>,
        Bin Gao <bin.gao@...ux.intel.com>
Cc:     Lee Jones <lee.jones@...aro.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Issues with irq handling in mfd/intel_soc_pmic_bxtwc.c

Hi All,

While working on Whiskey Cove PMIC support for Cherrytrail devices
I noticed that the irq handling in the Broxton Whiskey Cove MFD code
seems to be wrong.

There seems to be some misunderstanding here about how the irqchip
stuff should work.

If I understand the hardware correctly, there are 2 levels
of interrupt masks first there is BXTWC_MIRQLVL1 which
creates a number of interrupt groups, like thermal, bcu, etc.

Then per interrupt group there is not only 1 or more
write 1 to clear status registers to indicate the exact
reason of the interrupt but also to possibility to mask
out certain status bits from contributing to raising the
interrupt for that group if not masked in BXTWC_MIRQLVL1.

The current code defines 2 separate interrupt chips
for this (well 3 really, but the 3th one is not relevant
for this discussion).

But instead of chaining these interrupt chips, as they
are in hardware, they are setup as separate chips sharing the
same interrupt.

And all interrupt listeners (callers of request_interrupt
on the irq-chip registered interrupts) only listen to
the 2nd level irq-chip irqs.

Since no one has requested any interrupts on the 1st level
irq chip, this will cause ALL interrupt groups to get masked
there by the interrupt core (unused interrupts are always
masked), causing the 2nd level interrupts to never trigger.

Currently this is being worked around for charging interrupts
by the last hunk of this commit:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/mfd/intel_soc_pmic_bxtwc.c?id=9c6235c8633210cc2da0882e2e9d6ff90aa37503

Which blames the hardware for masking the lvl1 interrupt, but
this really is expected behavior of the interrupt chip since
no-one has requested the BXTWC_CHGR_LVL1_IRQ.

I hope the above properly explains what I believe is wrong
here. If not please ask.

###

Now on to a solution, if you look at almost all the consumers
then they are actually interested in all interrupts in a
group, so it seems that there really is no need for the
bxtwc_regmap_irq_chip_level2

Taking the thermal irq as example we could simple replace:

static struct resource thermal_resources[] = {
         DEFINE_RES_IRQ(BXTWC_THRM0_IRQ),
         DEFINE_RES_IRQ(BXTWC_THRM1_IRQ),
         DEFINE_RES_IRQ(BXTWC_THRM2_IRQ),
};

With:

static struct resource thermal_resources[] = {
         DEFINE_RES_IRQ(BXTWC_THRM_LVL1_IRQ),
};

And make the thermal driver responsible for masking
any status bits it is not interested in and write-1-clearing
BXTWC_THRM0IRQ - BXTWC_THRM2IRQ in its interrupt handler,
where it needs to read them anyway to find out the
exact cause(s) of the interrupt.

Note the driver needs to do a read, check set bits
(and act upon them) and then write back the read value
for these registers. It is important to write back the
read value and not e.g. 0xff to makes sure that no interrupts
are missed due to races, e.g. blindy writing 0xff to ack
may cause some status bits which get set while the interrupt
handler runs to get missed by the driver.

This is how this is done in devices similar like this in
general and I believe that the same should be done for the
bxtwc code.

One special case here seems to be the CHGR interrupt
which gets used by 2 sub-devices. The standard solution
for this would be to simple have both drivers request
the same level1 irq with the SHARED flag, and have them
both only check + write-1-clear the bits they are
interested in. This does mean that both drivers
must be loaded (or none) to avoid an interrupt storm.

Alternatively you could register a 2nd level interrupt
chip chained to the level1 chip (so listening to the
BXTWC_CHGR_LVL1_IRQ not to pmic->irq) and have that
split out things into separate irqs, that avoids the
need to have both drivers successfully loaded to
avoid interrupt storms. But this needs to be chained
to the lvl1 chgr irq and not directly listen to
pmic->irq.

Another option would be to keep the current design and
add a 2nd level irqchip per 1st level irq listening to
the 1st level irq, but that would require 8 additional
irq chips and for all consumers except the chgr consumers
there is no added value in doing that.

Regards,

Hans



p.s.

There also is a bug in the code adding the TMU irqchip:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/mfd/intel_soc_pmic_bxtwc.c?id=957ae5098185e763b5c06be6c3b4b6e98c048712

Since this is a new irqchip it should use a new irq enum
and not add its irqs to the bxtwc_irqs_level2 enum.

Note that irqchip can be removed entirely instead
directly listening to the lvl1 TMU irq, which would
also fix this.