[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iLkDwnZdBY8CwkrQwCk2o7EAM9J1sv+uxU1tjKb=VB=Ag@mail.gmail.com>
Date: Tue, 12 Mar 2024 14:48:31 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Florian Westphal <fw@...len.de>
Cc: xingwei lee <xrivendell7@...il.com>, pabeni@...hat.com, davem@...emloft.net,
kuba@...nel.org, linux-hams@...r.kernel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, ralf@...ux-mips.org, syzkaller-bugs@...glegroups.com,
samsun1006219@...il.com
Subject: Re: KASAN: slab-use-after-free Read in ip_finish_output
On Tue, Mar 12, 2024 at 2:21 PM Florian Westphal <fw@...len.de> wrote:
>
> Eric Dumazet <edumazet@...gle.com> wrote:
> > > so skb->sk gets propagated down to __ip_finish_output(), long
> > > after connrack defrag has called skb_orphan().
> > >
> > > No idea yet how to fix it,
> >
> > My plan was to refine "inet: frag: Always orphan skbs inside
> > ip_defrag()" and only do the skb_orphan()
> > for skb added to a frag_list.
> >
> > The head skb would keep a reference to the socket.
>
> I tried to follow this but its beyond my abilities.
>
> Defrag messes with skb->truesize, and I do not know how to
> fix that up safely so later calls to destructor won't underflow sk
> accouting.
>
> Furthermore, depending on delivery order, the skb that gets
> passed to rest of stack might not be the head skb (the one with
> full l4 header and sk reference), its always the last one that arrived.
>
> Existing code skb_morphs() this, see inet_frag_reasm_prepare() and also
> the ->truesize munging (which is fine only because all skbs are
> orphans...).
>
> So in order to not pass already-released sk to inet output somehow
> the skb->sk reference needs to be stolen and moved from one sk
> to another.
>
> No idea how to do this, let alone do regression testing for this.
> see e.g. 48cac18ecf1de82f76259a54402c3adb7839ad01 which added
> unconditional orphaning in ipv6 netfilter defrag.
>
> ATM the only "solution" I see is to completely remove netfilter defrag
> support for outgoing packets.
Thanks for taking a look Florian.
Perhaps not messing with truesize at all would help ?
Something based on this POC :
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 153960663ce4c2389259e0dc2bb4ba0b011dc698..8fd688df4b24dec7b5a1ea365d5b34d81fe6e0e5
100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -94,6 +94,7 @@ struct inet_frag_queue {
struct rb_root rb_fragments;
struct sk_buff *fragments_tail;
struct sk_buff *last_run_head;
+ struct sock *sk;
ktime_t stamp;
int len;
int meat;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 7072fc0783ef56e59c886a2f2516e7db7d10c942..aee706226ef8d9e9514fcf7d60e4d278ff7178fa
100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -297,6 +297,8 @@ void inet_frag_destroy(struct inet_frag_queue *q)
SKB_CONSUMED;
WARN_ON(del_timer(&q->timer) != 0);
+ if (q->sk)
+ sock_put(q->sk);
/* Release all fragment data. */
fqdir = q->fqdir;
f = fqdir->f;
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index a4941f53b523725cd777d213500b8f6918287920..198a4c35cd1232278678a20bf0f2a49c63f548fc
100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -377,6 +377,10 @@ static int ip_frag_queue(struct ipq *qp, struct
sk_buff *skb)
skb->_skb_refdst = 0UL;
err = ip_frag_reasm(qp, skb, prev_tail, dev);
+ if (qp->q.sk) {
+ swap(qp->q.sk, skb->sk);
+ skb->destructor = sock_edemux;
+ }
skb->_skb_refdst = orefdst;
if (err)
inet_frag_kill(&qp->q);
@@ -384,6 +388,7 @@ static int ip_frag_queue(struct ipq *qp, struct
sk_buff *skb)
}
skb_dst_drop(skb);
+ skb_orphan(skb);
return -EINPROGRESS;
insert_error:
@@ -487,7 +492,6 @@ int ip_defrag(struct net *net, struct sk_buff
*skb, u32 user)
struct ipq *qp;
__IP_INC_STATS(net, IPSTATS_MIB_REASMREQDS);
- skb_orphan(skb);
/* Lookup (or create) queue header */
qp = ip_find(net, ip_hdr(skb), user, vif);
@@ -495,7 +499,12 @@ int ip_defrag(struct net *net, struct sk_buff
*skb, u32 user)
int ret;
spin_lock(&qp->q.lock);
+ if (!qp->q.sk) {
+ struct sock *sk = skb->sk;
+ if (sk && refcount_inc_not_zero(&sk->sk_refcnt))
+ qp->q.sk = sk;
+ }
ret = ip_frag_queue(qp, skb);
spin_unlock(&qp->q.lock);
Powered by blists - more mailing lists