lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 26 Mar 2020 09:27:42 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc:     Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
        "open list:NETWORKING DRIVERS" <netdev@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "moderated list:INTEL ETHERNET DRIVERS" 
        <intel-wired-lan@...ts.osuosl.org>,
        "David S. Miller" <davem@...emloft.net>
Subject: Re: [Intel-wired-lan] [PATCH] igb: Use a sperate mutex insead of rtnl_lock()

On Thu, Mar 26, 2020 at 3:39 AM Kai-Heng Feng
<kai.heng.feng@...onical.com> wrote:
>
> Commit 9474933caf21 ("igb: close/suspend race in netif_device_detach")
> fixed race condition between close and power management ops by using
> rtnl_lock().
>
> This fix is a preparation for next patch, to prevent a dead lock under
> rtnl_lock() when calling runtime resume routine.
>
> However, we can't use device_lock() in igb_close() because when module
> is getting removed, the lock is already held for igb_remove(), and
> igb_close() gets called during unregistering the netdev, hence causing a
> deadlock. So let's introduce a new mutex so we don't cause a deadlock
> with driver core or netdev core.
>
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>

So this description doesn't make much sense to me. You describe the
use of the device_lock() in igb_close() but it isn't used there. In
addition it seems like you are arbitrarily moving code that was
wrapped in the rtnl_lock out of it. I'm not entirely sure that is safe
since there are calls within many of these functions that assume the
rtnl_lock is held and changing that is likely going to introduce more
issues.



> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index b46bff8fe056..dc7ed5dd216b 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -288,6 +288,8 @@ static const struct igb_reg_info igb_reg_info_tbl[] = {
>         {}
>  };
>
> +static DEFINE_MUTEX(igb_mutex);
> +
>  /* igb_regdump - register printout routine */
>  static void igb_regdump(struct e1000_hw *hw, struct igb_reg_info *reginfo)
>  {
> @@ -4026,9 +4028,14 @@ static int __igb_close(struct net_device *netdev, bool suspending)
>
>  int igb_close(struct net_device *netdev)
>  {
> +       int err = 0;
> +
> +       mutex_lock(&igb_mutex);
>         if (netif_device_present(netdev) || netdev->dismantle)
> -               return __igb_close(netdev, false);
> -       return 0;
> +               err = __igb_close(netdev, false);
> +       mutex_unlock(&igb_mutex);
> +
> +       return err;
>  }
>

Okay, so I am guessing the problem has something to do with the
addition of the netdev->dismantle test here and the fact that it is
bypassing the present check for the hotplug remove case?

So it looks like nobody ever really reviewed commit 888f22931478
("igb: Free IRQs when device is hotplugged"). What I would recommend
is reverting it and instead we fix the remaining pieces that need to
be addressed in igb so it more closely matches what we have in e1000e
after commit a7023819404a ("e1000e: Use rtnl_lock to prevent race
conditions between net and pci/pm"). From what I can tell the only
pieces that are really missing is to update igb_io_error_detected so
that in addition to igb_down it will call igb_free_irq, and then in
addition we should be wrapping most of the code in that function with
an rtnl_lock since it is detaching a device and making modifications
to it.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ