Open Source and information security mailing list archives
Message-Id: <20170406.123208.1810854461605308607.davem@davemloft.net>
Date:   Thu, 06 Apr 2017 12:32:08 -0700 (PDT)
From:   David Miller <davem@...emloft.net>
To:     felix.manlunas@...ium.com
Cc:     netdev@...r.kernel.org, raghu.vatsavayi@...ium.com,
        derek.chickles@...ium.com
Subject: Re: [PATCH net-next] liquidio: fix Octeon core watchdog timeout false alarm

From: Felix Manlunas <felix.manlunas@...ium.com>
Date: Tue, 4 Apr 2017 19:26:57 -0700

> Detection of watchdog timeout of Octeon cores is flawed and susceptible to
> false alarms.  Refactor by removing the detection code, and in its place,
> leverage existing code that monitors for an indication from the NIC
> firmware that an Octeon core crashed; expand the meaning of the indication
> to "an Octeon core crashed or its watchdog timer expired".  Detection of
> watchdog timeout is now delegated to an exception handler in the NIC
> firmware; this is free of false alarms.
> 
> Also if there's an Octeon core crash or watchdog timeout:
> (1) Disable VF Ethernet links.
> (2) Decrement the module refcount by an amount equal to the number of
>     active VFs of the NIC whose Octeon core crashed or had a watchdog
>     timeout.  The refcount will continue to reflect the active VFs of
>     other liquidio NIC(s) (if present) whose Octeon cores are faultless.
> 
> Item (2) is needed to avoid the case of not being able to unload the driver
> because the module refcount is stuck at some non-zero number.  There is
> code that, in normal cases, decrements the refcount upon receiving a
> message from the firmware that a VF driver was unloaded.  But in
> exceptional cases like an Octeon core crash or watchdog timeout, arrival of
> that particular message from the firmware might be unreliable.  That normal
> case code is changed to not touch the refcount in the exceptional case to
> avoid contention (over the refcount) with the liquidio_watchdog kernel
> thread, which will carry out item (2).
> 
> Signed-off-by: Felix Manlunas <felix.manlunas@...ium.com>
> Signed-off-by: Derek Chickles <derek.chickles@...ium.com>

Applied, thanks.
