netdev - Re: [PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

Message-Id: <20150922.164522.1769837358584981735.davem@davemloft.net>
Date:	Tue, 22 Sep 2015 16:45:22 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	dwmw2@...radead.org
Cc:	netdev@...r.kernel.org, romieu@...zoreil.com
Subject: Re: [PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return,
 to survive IRQ storms

From: David Woodhouse <dwmw2@...radead.org>
Date: Mon, 21 Sep 2015 15:01:49 +0100

> From: David Woodhouse <David.Woodhouse@...el.com>
> 
> The TX timeout handling has been observed to trigger RX IRQ storms. And
> since cp_interrupt() just keeps saying that it handled the interrupt,
> the machine then dies. Fix the return value from cp_interrupt(), and
> the offending IRQ gets disabled and the machine survives.
> 
> Signed-off-by: David Woodhouse <David.Woodhouse@...el.com>

Like Francois, I don't like this.

First of all, there are only 3 bits that cp_interrupt() does not handle
explicitly.  For those, if they are set and no other condition was
raised, you should report the event along with the status bits that
are set, and then forcibly clear the interrupt.
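A user-space model of the fallback suggested above may make it concrete. Everything in it is hypothetical: the bit layout (ST_*), the simulated write-1-to-clear register, and the helper names are illustrative assumptions, not the real 8139cp register map or driver code. Returning "handled" after the forced clear is also an assumption; the point is that the status gets acked, so the line deasserts instead of storming.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit layout for illustration; NOT the real 8139cp map. */
enum {
	ST_RX     = 1u << 0,
	ST_TX     = 1u << 1,
	ST_LINK   = 1u << 2,
	ST_PCIERR = 1u << 3,
	/* three sources the handler has no explicit branch for */
	ST_MISC_A = 1u << 4,
	ST_MISC_B = 1u << 5,
	ST_MISC_C = 1u << 6,
};

#define ST_HANDLED (ST_RX | ST_TX | ST_LINK | ST_PCIERR)

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_chip {
	uint16_t intr_status;	/* simulated write-1-to-clear status register */
	unsigned unexpected;	/* how many times the fallback path ran */
};

/* Write-1-to-clear: every bit written as 1 is cleared. */
static void model_ack(struct model_chip *c, uint16_t bits)
{
	c->intr_status &= (uint16_t)~bits;
}

static enum model_irqreturn model_interrupt(struct model_chip *c)
{
	uint16_t status;

	assert(c != NULL);
	status = c->intr_status;
	if (status == 0)
		return MODEL_IRQ_NONE;	/* shared line: not our interrupt */

	if (!(status & ST_HANDLED)) {
		/* Only unhandled sources are pending: report them, then
		 * clear them forcibly so the line deasserts. */
		fprintf(stderr, "unexpected interrupt, status 0x%04x\n",
			(unsigned)status);
		c->unexpected++;
		model_ack(c, status);
		return MODEL_IRQ_HANDLED;
	}

	/* Explicit RX/TX/link/PCI-error handling would go here. */
	model_ack(c, status);
	return MODEL_IRQ_HANDLED;
}
```

A status word containing only ST_MISC_B takes the fallback path: it is logged, cleared, and claimed, so the same event cannot re-fire forever.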

And if we are getting Rx* interrupts with napi_schedule_prep()
returning false, that's a serious problem.  It can mean that the TX
timeout handler's resetting of the chip is either miscoded or is
racing with either NAPI polling or this interrupt handler.
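Why Rx* interrupts with napi_schedule_prep() returning false point to a bug can be shown with a small model. It assumes the usual NAPI pattern: the handler masks RX sources when it schedules the poll, and only poll completion unmasks them. The names below (model_nic, model_racy_reset, the M_* masks) are hypothetical stand-ins, not kernel or driver APIs; model_schedule_prep() only imitates the "fail if already scheduled" behaviour of napi_schedule_prep().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks for illustration only. */
#define M_RX_BITS  0x0001u
#define M_ALL_BITS 0x000Fu

struct model_nic {
	uint16_t intr_mask;	/* sources allowed to raise the IRQ line */
	uint16_t intr_status;	/* pending sources */
	bool napi_sched;	/* stand-in for NAPI's "scheduled" state */
	unsigned rx_while_sched; /* RX IRQs seen while poll already owned RX */
};

/* Imitates napi_schedule_prep(): succeeds only if not yet scheduled. */
static bool model_schedule_prep(struct model_nic *n)
{
	if (n->napi_sched)
		return false;
	n->napi_sched = true;
	return true;
}

/* RX part of the interrupt handler. */
static void model_rx_irq(struct model_nic *n)
{
	if (!(n->intr_status & n->intr_mask & M_RX_BITS))
		return;			/* RX masked or idle */
	if (model_schedule_prep(n))
		n->intr_mask &= (uint16_t)~M_RX_BITS; /* poll owns RX now */
	else
		n->rx_while_sched++;	/* RX unmasked behind the poller */
}

/* End of polling: give up ownership and unmask RX again. */
static void model_poll_done(struct model_nic *n)
{
	assert(n->napi_sched);
	n->napi_sched = false;
	n->intr_mask |= M_RX_BITS;
}

/* A reset path that restores the full mask without checking whether
 * a poll is still in progress: the suspected class of bug. */
static void model_racy_reset(struct model_nic *n)
{
	n->intr_mask = M_ALL_BITS;
}
```

In the correct sequence the second RX interrupt is masked and never reaches the handler; only after model_racy_reset() does the "prep returned false" case appear, which is exactly the symptom that should be chased rather than papered over.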

And if that's the case, your patch causes the chip's IRQ line to be
disabled whenever this race triggers.

This change is even worse, in my opinion, if patch #2 indeed makes
the problem go away.