lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20210518123317.38f6657f@kicinski-fedora-PC1C0HJN>
Date:   Tue, 18 May 2021 12:33:17 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Saeed Mahameed <saeed@...nel.org>
Cc:     eric.dumazet@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH net] mlx5e: add add missing BH locking around
 napi_schdule()

On Tue, 18 May 2021 12:23:54 -0700 Saeed Mahameed wrote:
> On Wed, 2021-05-05 at 13:20 -0700, Jakub Kicinski wrote:
> > It's not correct to call napi_schedule() in pure process
> > context. Because we use __raise_softirq_irqoff() we require
> > callers to be in a context which will eventually lead to
> > softirq handling (hardirq, bh disabled, etc.).
> > 
> > With code as is users will see:
> > 
> >  NOHZ tick-stop error: Non-RCU local softirq work is pending, handler
> > #08!!!
> > 
> > Fixes: a8dd7ac12fc3 ("net/mlx5e: Generalize RQ activation")
> > Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> > ---
> > We may want to patch net-next once it opens to switch
> > from __raise_softirq_irqoff() to raise_softirq_irqoff().
> > The irq_count() check is probably negligable and we'd need
> > to split the hardirq / non-hardirq paths completely to
> > keep the current behaviour. Plus what's hardirq is murky
> > with RT enabled..
> > 
> > Eric WDYT?
> 
> I was waiting for Eric to reply, Anyway i think this patch is correct
> as is, 
> 
> Jakub do you want me to submit to net  via net-mlx5 branch? 

Yes, please. FWIW we had a short exchange with RT folks last Friday,
and it doesn't seem like RT is an issue, so tglx will likely take
care of just adding a lockdep check and maybe a helper for scheduling
from process ctx.

> Another valid solution is that driver will avoid calling
> napi_schedule() altogether from  process context,  we have the
> mechanism of mlx5e_trigger_irq(), which can be utilized here, but needs
> some re-factoring to move the icosq object from the main rx rq to the
> containing channel object.

Yea.. someone on your side would probably need to take care of that
kind of surgery. Apart from that no preference on which fix gets
applied.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ