netdev - Re: [PATCH net-next] net: try to avoid unneeded backlog flush

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <39a00c2183ef5bc349ce1195fdac53be2e26af6d.camel@redhat.com>
Date:   Thu, 10 Sep 2020 18:14:04 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     netdev <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Marcelo Tosatti <mtosatti@...hat.com>
Subject: Re: [PATCH net-next] net: try to avoid unneeded backlog flush

Hi,

On Thu, 2020-09-10 at 16:36 +0200, Eric Dumazet wrote:
> On Thu, Sep 10, 2020 at 4:21 PM Paolo Abeni <pabeni@...hat.com> wrote:
> > flush_all_backlogs() may cause deadlock on systems
> > running processes with FIFO scheduling policy.
> > 
> > The above is critical in -RT scenarios, where user-space
> > specifically ensure no network activity is scheduled on
> > the CPU running the mentioned FIFO process, but still get
> > stuck.
> > 
> > This commit tries to address the problem checking the
> > backlog status on the remote CPUs before scheduling the
> > flush operation. If the backlog is empty, we can skip it.
> 
> If it is not empty, the problem you want to fix is still there ?

Thank you for the very prompt feedback!

Yes, if the backlog is not empty (meaning the user space as failed to
configure the kernel correctly), device removal could still hang. 

The problem currently seen by the RT team is that they really ensure no
network activity is scheduled on the relevant CPU, but still see the
hang.

I think to to expose with a follow-up patch the backlog len for each
CPU - read-only via net-procfs - to the user-space.

> > Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > ---
> >  net/core/dev.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 46 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 152ad3b578de..fdef40bf4b88 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5621,17 +5621,59 @@ static void flush_backlog(struct work_struct *work)
> >         local_bh_enable();
> >  }
> > 
> > +static bool flush_required(int cpu)
> > +{
> > +#if IS_ENABLED(CONFIG_RPS)
> > +       struct softnet_data *sd = &per_cpu(softnet_data, cpu);
> > +       bool do_flush;
> > +
> > +       local_irq_disable();
> > +       rps_lock(sd);
> > +
> > +       /* as insertion into process_queue happens with the rps lock held,
> > +        * process_queue access may race only with dequeue
> > +        */
> > +       do_flush = !skb_queue_empty(&sd->input_pkt_queue) ||
> > +                  !skb_queue_empty_lockless(&sd->process_queue);
> > +       rps_unlock(sd);
> > +       local_irq_enable();
> > +
> > +       return do_flush;
> > +#endif
> > +       /* without RPS we can't safely check input_pkt_queue: during a
> > +        * concurrent remote skb_queue_splice() we can detect as empty both
> > +        * input_pkt_queue and process_queue even if the latter could end-up
> > +        * containing a lot of packets.
> > +        */
> > +       return true;
> > +}
> > +
> >  static void flush_all_backlogs(void)
> >  {
> > +       static cpumask_t flush_cpus  = { CPU_BITS_NONE };
> >         unsigned int cpu;
> > 
> > +       /* since we are under rtnl lock protection we can use static data
> > +        * for the cpumask and avoid allocating on stack the possibly
> > +        * large mask
> > +        */
> > +       ASSERT_RTNL();
> > +
> 
> OK, but you only set bits in this bitmask.
> 
> You probably want to clear it here, not rely on one time CPU_BITS_NONE

Thanks! will do in v2. I'll also need to improve my local tests, as the
above should not have escaped them.

Cheers,

Paolo