Date:   Fri, 14 Oct 2022 14:43:47 +0200
From:   Íñigo Huguet <ihuguet@...hat.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     irusskikh@...vell.com, dbogdanov@...vell.com, davem@...emloft.net,
        edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
        netdev@...r.kernel.org, Li Liang <liali@...hat.com>
Subject: Re: [PATCH net] atlantic: fix deadlock at aq_nic_stop

On Fri, Oct 14, 2022 at 2:14 PM Andrew Lunn <andrew@...n.ch> wrote:
>
> > Fix this by trying to acquire rtnl_lock at the beginning of those functions
> > and returning if NIC closing is ongoing. Also do the "linkstate" work in a
> > workqueue instead of in a threaded IRQ, where sleeping or waiting on a
> > mutex for a long time is discouraged.
>
> What happens when the same interrupt fires again, while the work queue
> is still active? The advantage of the threaded interrupt handler is
> that the interrupt will be kept disabled, and should not fire again
> until the threaded interrupt handler exits.

Nothing bad happens: if the work is already queued, it won't be queued
again, and when it runs it will evaluate the latest link state. In the
worst case, the interrupt fires while the work is already running, so the
work is enqueued to run once more and, if the link state has changed, it is
evaluated again. This will rarely happen and it's harmless.
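
To make the queueing behaviour concrete, here is a minimal userspace model
of how I understand the workqueue "pending" bit to behave. It is only a
sketch, not kernel code: model_work, model_queue_work, model_run_work,
model_demo and hw_link_state are hypothetical names made up for
illustration, and the real queue_work() does considerably more.

```c
/* Userspace model of the workqueue "pending" bit. NOT kernel code:
 * every name below is hypothetical and exists only for illustration. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_work {
	atomic_bool pending;  /* true while the item sits in the queue */
	int runs;             /* how many times the handler has run */
	int link_state;       /* link state seen by the last handler run */
};

static int hw_link_state;    /* stands in for the hardware link state */

/* Models queue_work(): returns false if the item was already pending. */
static bool model_queue_work(struct model_work *w)
{
	return !atomic_exchange(&w->pending, true);
}

/* Models the worker: clear "pending" first, then run the handler, so an
 * interrupt arriving during the handler can queue the item again. */
static void model_run_work(struct model_work *w)
{
	atomic_store(&w->pending, false);
	w->link_state = hw_link_state;  /* evaluate the latest link state */
	w->runs++;
}

/* Two interrupts before the work runs result in a single handler run. */
int model_demo(void)
{
	struct model_work w = {0};

	hw_link_state = 1;
	bool first = model_queue_work(&w);   /* IRQ: link went up */
	hw_link_state = 0;
	bool second = model_queue_work(&w);  /* IRQ again, work still queued */
	model_run_work(&w);

	assert(first && !second);
	assert(w.link_state == 0);           /* only the latest state is seen */
	return w.runs;
}
```

In this model the second interrupt is simply absorbed, and the single
handler run sees the most recent link state, which is the behaviour I
described above.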

Also, I haven't checked it, but the following lines suggest that the IRQ is
auto-disabled in the hardware until you enable it again (I didn't rely on
this, anyway):
        self->aq_hw_ops->hw_irq_enable(self->aq_hw,
                                       BIT(self->aq_nic_cfg.link_irq_vec));

Honestly, I was in some doubt about doing this; it would also work with
the threaded IRQ. I'd like to hear more opinions about this, and I can
change it back if that is preferred.

>
> > +static void aq_nic_linkstate_task(struct work_struct *work)
> > +{
> > +     struct aq_nic_s *self = container_of(work, struct aq_nic_s,
> > +                                          linkstate_task);
> > +
> > +#if IS_ENABLED(CONFIG_MACSEC)
> > +     /* avoid deadlock at aq_nic_stop -> cancel_work_sync */
> > +     while (!rtnl_trylock()) {
> > +             if (aq_utils_obj_test(&self->flags, AQ_NIC_FLAG_CLOSING))
> > +                     return;
> > +             msleep(AQ_TASK_RETRY_MS);
> > +     }
> > +#endif
> > +
> >       aq_nic_update_link_status(self);
> >
> > +#if IS_ENABLED(CONFIG_MACSEC)
> > +     rtnl_unlock();
> > +#endif
> > +
>
> If MACSEC is enabled, aq_nic_update_link_status() is called with RTNL
> held. If it is not enabled, RTNL is not held. This sort of
> inconsistency could lead to further locking bugs, since it is not
> obvious. Please try to make this consistent.

This is not new in these patches: that's what was already happening, and I
just moved it to take the lock a bit earlier. In my opinion, this is as
it should be: why acquire a mutex if you don't have anything to
protect with it? And it's worse with rtnl_lock, which is taken by many
processes and can be held for quite a long time...
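
For reference, the trylock-and-bail-out loop from the quoted hunk can be
sketched in userspace roughly as follows. This is an illustration under
stated assumptions, not the driver code: model_rtnl, model_closing,
model_updates, model_sleep_ms and model_linkstate_task are hypothetical
stand-ins for rtnl_lock, AQ_NIC_FLAG_CLOSING and the link update, and the
1 ms retry interval is arbitrary.

```c
/* Userspace sketch of "trylock, bail out if closing". NOT driver code:
 * every name below is hypothetical and exists only for illustration. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER; /* "RTNL" */
static atomic_bool model_closing;  /* stands in for AQ_NIC_FLAG_CLOSING */
static int model_updates;          /* counts completed link updates */

static void model_sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};
	nanosleep(&ts, NULL);
}

/* Returns true if the link update ran, false if it bailed out because
 * the device is closing while someone else holds the lock. */
bool model_linkstate_task(void)
{
	while (pthread_mutex_trylock(&model_rtnl) != 0) {
		/* The closer may hold the lock while waiting for this task
		 * to finish, so blocking here could deadlock. Give up. */
		if (atomic_load(&model_closing))
			return false;
		model_sleep_ms(1);
	}

	model_updates++;  /* the protected link update */

	int rc = pthread_mutex_unlock(&model_rtnl);
	assert(rc == 0);
	(void)rc;
	return true;
}
```

If the lock is held and the closing flag is set, the task returns without
touching the link state; otherwise it retries until it gets the lock.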

>
>          Andrew
>


-- 
Íñigo Huguet
