lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 5 Feb 2019 14:20:45 -0500
From:   John David Anglin <dave.anglin@...l.net>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Russell King <linux@....linux.org.uk>,
        Vivien Didelot <vivien.didelot@...oirfairelinux.com>,
        Florian Fainelli <f.fainelli@...il.com>, netdev@...r.kernel.org
Subject: Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering

On 2019-02-04 9:21 p.m., Andrew Lunn wrote:
>> The problem is INTn can go low before the interrupt handler for it is
>> registered and enabled.
>> This can't happen.  The domain is setup immediately after registering
>> the GPIO interrupt.
>> The interrupt can't fire until one of the enables is set.
> These two statement seem to contradict each other?
I don't think so.  In the first, I am referring to the enabling of the
GPIO pin interrupt in the
Armada 3720 CPU.  Specifically, this would be the setting of the enable
in the South Bridge
GPIO2 Interrupt Enable Register (RD0018C00).  In the second, I'm
referring to the enables
in the switch's Global Control Register.  Setting these enables to zero
forces the the switch's
INTn output high (assuming there isn't a short).  This shouldn't cause a
CPU interrupt if the
South Bridge GPIO2 Polarity Control is set correctly at the time.  INTn
goes low after a
switch reset because EEIntEn is set on reset, so clearing the enables
will causes a rising
edge on INTn.
>
>> These are set
>> by mv88e6xxx_g2_irq_setup(),
>> mv88e6xxx_g1_atu_prob_irq_setup() and
>> mv88e6xxx_g1_vtu_prob_irq_setup().  These irqs
>> are setup after mv88e6xxx_g1_irq_setup()/mv88e6xxx_irq_poll_setup() is
>> called.  Thus, the
>> irq domain is setup before the GPIO interrupt can fire.
> At what point is INTn going low, causing you all these problems? I've
> yet to see a real description of the race. Please give us a blow by
> blow of how the race happens. And then explain how your fix actually
> fixes this.
I'm going to withdraw my proposed change.  I had thought that calling
request_threaded_irq()
earlier fixed the issue at boot.  But given your mail, I reverted the
change in my 4.20.6 build
and I wasn't able to reproduce the problem.  Yet, my 4.20.4 build fails
every time doing a power
on boot.

We have the following interrupt status when it fails:

 44:          2          0     GPIO2  23 Edge      d0032004.mdio-mii:01
 48:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
 50:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
 52:          0          1  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
 55:          0          1  mv88e6xxx-g2   1 Edge     
!soc!internal-regs@...00000!mdio@...04!switch0@...dio:11
 56:          0          0  mv88e6xxx-g2   2 Edge     
!soc!internal-regs@...00000!mdio@...04!switch0@...dio:12
 57:          0          0  mv88e6xxx-g2   3 Edge     
!soc!internal-regs@...00000!mdio@...04!switch0@...dio:13
 69:          0          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog

We have the following values in the Global1 registers:

MCLI> sys dumpGlobal1

------------------------------------------------------
Switch Global Status(0x0)                       c881
ATU FID Register(0x1)                           0000
VTU FID Register(0x2)                           0000
VTU SID Register(0x3)                           0000
Switch Global Control Register(0x4)             40a8

The DeviceInt bit is set in the Global Status register.  It would appear
we have missed an edge on GPIO2 23.
Yet, we have handled 2 interrupts on it.  So, this isn't the setup issue
that I thought it was.

>
> Also, i'm not yet convinced this hardware can actually work correctly
> with edge interrupts. You can probably reduce the size of the race
> window, but i don't think you can eliminate it. And if you cannot
> eliminate it, at some point it is going to hit you.
You could be right but I don't want to give up just yet.  I need to go
back and rebuild v4.20.4 and retest.
My hunch is the second hunk of the original patch will fix this.

Dave

-- 
John David Anglin  dave.anglin@...l.net

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ