Message-Id: <20170406.123208.1810854461605308607.davem@davemloft.net>
Date: Thu, 06 Apr 2017 12:32:08 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: felix.manlunas@...ium.com
Cc: netdev@...r.kernel.org, raghu.vatsavayi@...ium.com,
	derek.chickles@...ium.com
Subject: Re: [PATCH net-next] liquidio: fix Octeon core watchdog timeout
	false alarm
From: Felix Manlunas <felix.manlunas@...ium.com>
Date: Tue, 4 Apr 2017 19:26:57 -0700
> Detection of watchdog timeout of Octeon cores is flawed and susceptible to
> false alarms. Refactor by removing the detection code, and in its place,
> leverage existing code that monitors for an indication from the NIC
> firmware that an Octeon core crashed; expand the meaning of the indication
> to "an Octeon core crashed or its watchdog timer expired". Detection of
> watchdog timeout is now delegated to an exception handler in the NIC
> firmware; this is free of false alarms.
>
> Also, if there's an Octeon core crash or watchdog timeout:
>   (1) Disable VF Ethernet links.
>   (2) Decrement the module refcount by an amount equal to the number of
>       active VFs of the NIC whose Octeon core crashed or had a watchdog
>       timeout. The refcount will continue to reflect the active VFs of
>       other liquidio NIC(s) (if present) whose Octeon cores are faultless.
>
> Item (2) is needed to avoid the case of not being able to unload the driver
> because the module refcount is stuck at some non-zero number. There is
> code that, in normal cases, decrements the refcount upon receiving a
> message from the firmware that a VF driver was unloaded. But in
> exceptional cases like an Octeon core crash or watchdog timeout, arrival of
> that particular message from the firmware might be unreliable. The
> normal-case code is changed to leave the refcount alone in the exceptional
> case, to avoid contention (over the refcount) with the liquidio_watchdog
> kernel thread, which will carry out item (2).
>
> Signed-off-by: Felix Manlunas <felix.manlunas@...ium.com>
> Signed-off-by: Derek Chickles <derek.chickles@...ium.com>
Applied, thanks.
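
For readers following the refcount reasoning in the quoted commit message,
here is a minimal userspace sketch of that bookkeeping. It is not the
liquidio driver's code: every name in it (struct nic, vf_loaded,
vf_unloaded_msg, handle_core_fault, module_refcount) is a hypothetical
stand-in, and a plain int models the module refcount that the kernel would
normally manage through try_module_get() and module_put().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one NIC; not taken from the liquidio driver. */
struct nic {
	int active_vfs;     /* VFs of this NIC that hold a module reference */
	bool core_faulted;  /* firmware reported a core crash or watchdog expiry */
	bool vf_links_up;   /* state of the VF Ethernet links */
};

/* Plain int standing in for the module-wide refcount (an assumption). */
static int module_refcount;

/* A VF driver came up on this NIC: take one reference. */
static void vf_loaded(struct nic *n)
{
	n->active_vfs++;
	module_refcount++;
}

/* Normal case: the firmware reports that a VF driver was unloaded. */
static void vf_unloaded_msg(struct nic *n)
{
	/* Exceptional case: leave the refcount to the watchdog path. */
	if (n->core_faulted)
		return;

	if (n->active_vfs > 0) {
		n->active_vfs--;
		module_refcount--;
	}
}

/* Exceptional case: what the watchdog thread does on a reported fault. */
static void handle_core_fault(struct nic *n)
{
	if (n->core_faulted)
		return;                        /* already handled once */

	n->core_faulted = true;
	n->vf_links_up = false;                /* item (1) */
	module_refcount -= n->active_vfs;      /* item (2) */
	assert(module_refcount >= 0);
	n->active_vfs = 0;
}
```

With two VFs on one NIC and one VF on another, a fault on the first NIC
drops the count from 3 to 1. A late "VF unloaded" message for the faulted
NIC then changes nothing, so the count can still reach zero once the
healthy NIC's VF goes away, and the module can be unloaded.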