lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iLGKbeeBNoDQU9C7nPRCxc6FUsrwn0LfrAKrJiJ14PH+w@mail.gmail.com>
Date:   Wed, 22 Jun 2022 12:36:23 +0200
From:   Eric Dumazet <edumazet@...gle.com>
To:     Florian Westphal <fw@...len.de>
Cc:     Ilya Maximets <i.maximets@....org>,
        netdev <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>, dev@...nvswitch.org,
        LKML <linux-kernel@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net] net: ensure all external references are released in
 deferred skbuffs

On Wed, Jun 22, 2022 at 12:28 PM Florian Westphal <fw@...len.de> wrote:
>
> Eric Dumazet <edumazet@...gle.com> wrote:
> > On Sun, Jun 19, 2022 at 2:39 AM Ilya Maximets <i.maximets@....org> wrote:
> > >
> > > Open vSwitch system test suite is broken due to inability to
> > > load/unload netfilter modules.  kworker thread is getting trapped
> > > in the infinite loop while running a net cleanup inside the
> > > nf_conntrack_cleanup_net_list, because deferred skbuffs are still
> > > holding nfct references and not being freed by their CPU cores.
> > >
> > > In general, the idea that we will have an rx interrupt on every
> > > CPU core at some point in a near future doesn't seem correct.
> > > Devices are getting created and destroyed, interrupts are getting
> > > re-scheduled, CPUs are going online and offline dynamically.
> > > Any of these events may leave packets stuck in defer list for a
> > > long time.  It might be OK, if they are just a piece of memory,
> > > but we can't afford them holding references to any other resources.
> > >
> > > In case of OVS, nfct reference keeps the kernel thread in busy loop
> > > while holding a 'pernet_ops_rwsem' semaphore.  That blocks the
> > > later modprobe request from user space:
> > >
> > >   # ps
> > >    299 root  R  99.3  200:25.89 kworker/u96:4+
> > >
> > >   # journalctl
> > >   INFO: task modprobe:11787 blocked for more than 1228 seconds.
> > >         Not tainted 5.19.0-rc2 #8
> > >   task:modprobe     state:D
> > >   Call Trace:
> > >    <TASK>
> > >    __schedule+0x8aa/0x21d0
> > >    schedule+0xcc/0x200
> > >    rwsem_down_write_slowpath+0x8e4/0x1580
> > >    down_write+0xfc/0x140
> > >    register_pernet_subsys+0x15/0x40
> > >    nf_nat_init+0xb6/0x1000 [nf_nat]
> > >    do_one_initcall+0xbb/0x410
> > >    do_init_module+0x1b4/0x640
> > >    load_module+0x4c1b/0x58d0
> > >    __do_sys_init_module+0x1d7/0x220
> > >    do_syscall_64+0x3a/0x80
> > >    entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > >
> > > At this point OVS testsuite is unresponsive and never recover,
> > > because these skbuffs are never freed.
> > >
> > > Solution is to make sure no external references attached to skb
> > > before pushing it to the defer list.  Using skb_release_head_state()
> > > for that purpose.  The function modified to be re-enterable, as it
> > > will be called again during the defer list flush.
> > >
> > > Another approach that can fix the OVS use-case, is to kick all
> > > cores while waiting for references to be released during the net
> > > cleanup.  But that sounds more like a workaround for a current
> > > issue rather than a proper solution and will not cover possible
> > > issues in other parts of the code.
> > >
> > > Additionally checking for skb_zcopy() while deferring.  This might
> > > not be necessary, as I'm not sure if we can actually have zero copy
> > > packets on this path, but seems worth having for completeness as we
> > > should never defer such packets regardless.
> > >
> > > CC: Eric Dumazet <edumazet@...gle.com>
> > > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
> > > Signed-off-by: Ilya Maximets <i.maximets@....org>
> > > ---
> > >  net/core/skbuff.c | 16 +++++++++++-----
> > >  1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > I do not think this patch is doing the right thing.
> >
> > Packets sitting in TCP receive queues should not hold state that is
> > not relevant for TCP recvmsg().
>
> Agree, but tcp_v4/6_rcv() already call nf_reset_ct(), else it would
> not be possible to remove nf_conntrack module in practice.

Well, existing nf_reset_ct() does not catch all cases, like TCP fastopen ?

Maybe 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
only widened the problem.

>
> I wonder where the deferred skbs are coming from, any and all
> queued skbs need the conntrack state dropped.
>
> I don't mind a new helper that does a combined dst+ct release though.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ