netdev - Re: [PATCH net] ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGx-UoA804Kbg0-Bwvzstu171zHcx1Wm8J8m+Yz8z7k_SMpm4g@mail.gmail.com>
Date:   Wed, 5 Dec 2018 09:13:22 -0800
From:   Peter Oskolkov <posk.devel@...il.com>
To:     jwiesner@...e.com
Cc:     netdev@...r.kernel.org, yoshfuji@...ux-ipv6.org, mkubecek@...e.com
Subject: Re: [PATCH net] ipv4: ipv6: netfilter: Adjust the frag mem limit when
 truesize changes

On Wed, Dec 5, 2018 at 7:57 AM Jiri Wiesner <jwiesner@...e.com> wrote:
>
> The *_frag_reasm() functions are susceptible to miscalculating the byte
> count of packet fragments in case the truesize of a head buffer changes.
> The truesize member may be changed by the call to skb_unclone(), leaving
> the fragment memory limit counter unbalanced even if all fragments are
> processed. This miscalculation goes unnoticed as long as the network
> namespace which holds the counter is not destroyed.
>
> Should an attempt be made to destroy a network namespace that holds an
> unbalanced fragment memory limit counter the cleanup of the namespace
> never finishes. The thread handling the cleanup gets stuck in
> inet_frags_exit_net() waiting for the percpu counter to reach zero. The
> thread is usually in running state with a stacktrace similar to:
>
>  PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
>   #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
>   #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
>   #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
>   #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
>   #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
>  #10 [ffff880621563e38] process_one_work at ffffffff81096f14
>
> It is not possible to create new network namespaces, and processes
> that call unshare() end up being stuck in uninterruptible sleep state
> waiting to acquire the net_mutex.
>
> The bug was observed in the IPv6 netfilter code by Per Sundstrom.
> I thank him for his analysis of the problem. The parts of this patch
> that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.
>
> Signed-off-by: Jiri Wiesner <jwiesner@...e.com>
> Reported-by: Per Sundstrom <per.sundstrom@...qube.se>
> ---

Acked-by: Peter Oskolkov <posk@...gle.com>

>  net/ipv4/ip_fragment.c                  | 7 +++++++
>  net/ipv6/netfilter/nf_conntrack_reasm.c | 8 +++++++-
>  net/ipv6/reassembly.c                   | 8 +++++++-
>  3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index d6ee343fdb86..aa0b22697998 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -515,6 +515,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
>         struct rb_node *rbn;
>         int len;
>         int ihlen;
> +       int delta;
>         int err;
>         u8 ecn;
>
> @@ -556,10 +557,16 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
>         if (len > 65535)
>                 goto out_oversize;
>
> +       delta = - head->truesize;
> +
>         /* Head of list must not be cloned. */
>         if (skb_unclone(head, GFP_ATOMIC))
>                 goto out_nomem;
>
> +       delta += head->truesize;
> +       if (delta)
> +               add_frag_mem_limit(qp->q.net, delta);
> +
>         /* If the first fragment is fragmented itself, we split
>          * it to two chunks: the first with data and paged part
>          * and the second, holding only fragments. */
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
> index d219979c3e52..181da2c40f9a 100644
> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> @@ -341,7 +341,7 @@ static bool
>  nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev,  struct net_device *dev)
>  {
>         struct sk_buff *fp, *head = fq->q.fragments;
> -       int    payload_len;
> +       int    payload_len, delta;
>         u8 ecn;
>
>         inet_frag_kill(&fq->q);
> @@ -363,10 +363,16 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev,  struct net_devic
>                 return false;
>         }
>
> +       delta = - head->truesize;
> +
>         /* Head of list must not be cloned. */
>         if (skb_unclone(head, GFP_ATOMIC))
>                 return false;
>
> +       delta += head->truesize;
> +       if (delta)
> +               add_frag_mem_limit(fq->q.net, delta);
> +
>         /* If the first fragment is fragmented itself, we split
>          * it to two chunks: the first with data and paged part
>          * and the second, holding only fragments. */
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index 5c3c92713096..aa26c45486d9 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -281,7 +281,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
>  {
>         struct net *net = container_of(fq->q.net, struct net, ipv6.frags);
>         struct sk_buff *fp, *head = fq->q.fragments;
> -       int    payload_len;
> +       int    payload_len, delta;
>         unsigned int nhoff;
>         int sum_truesize;
>         u8 ecn;
> @@ -322,10 +322,16 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
>         if (payload_len > IPV6_MAXPLEN)
>                 goto out_oversize;
>
> +       delta = - head->truesize;
> +
>         /* Head of list must not be cloned. */
>         if (skb_unclone(head, GFP_ATOMIC))
>                 goto out_oom;
>
> +       delta += head->truesize;
> +       if (delta)
> +               add_frag_mem_limit(fq->q.net, delta);
> +
>         /* If the first fragment is fragmented itself, we split
>          * it to two chunks: the first with data and paged part
>          * and the second, holding only fragments. */
> --
> 2.16.4
>