[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20181024212219.je7vtvpjovlbdrqa@ast-mbp>
Date: Wed, 24 Oct 2018 14:23:13 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, kafai@...com,
daniel@...earbox.net
Subject: Re: [PATCH net-next 1/3] net/sock: factor out dequeue/peek with
offset code
On Tue, Oct 23, 2018 at 09:28:03AM +0200, Paolo Abeni wrote:
> Hi,
>
> On Mon, 2018-10-22 at 21:49 -0700, Alexei Starovoitov wrote:
> > On Mon, May 15, 2017 at 11:01:42AM +0200, Paolo Abeni wrote:
> > > And update __sk_queue_drop_skb() to work on the specified queue.
> > > This will help the udp protocol to use an additional private
> > > rx queue in a later patch.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > > ---
> > > include/linux/skbuff.h | 7 ++++
> > > include/net/sock.h | 4 +--
> > > net/core/datagram.c | 90 ++++++++++++++++++++++++++++----------------------
> > > 3 files changed, 60 insertions(+), 41 deletions(-)
> > >
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index a098d95..bfc7892 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -3056,6 +3056,13 @@ static inline void skb_frag_list_init(struct sk_buff *skb)
> > >
> > > int __skb_wait_for_more_packets(struct sock *sk, int *err, long *timeo_p,
> > > const struct sk_buff *skb);
> > > +struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
> > > + struct sk_buff_head *queue,
> > > + unsigned int flags,
> > > + void (*destructor)(struct sock *sk,
> > > + struct sk_buff *skb),
> > > + int *peeked, int *off, int *err,
> > > + struct sk_buff **last);
> > > struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned flags,
> > > void (*destructor)(struct sock *sk,
> > > struct sk_buff *skb),
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index 66349e4..49d226f 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -2035,8 +2035,8 @@ void sk_reset_timer(struct sock *sk, struct timer_list *timer,
> > >
> > > void sk_stop_timer(struct sock *sk, struct timer_list *timer);
> > >
> > > -int __sk_queue_drop_skb(struct sock *sk, struct sk_buff *skb,
> > > - unsigned int flags,
> > > +int __sk_queue_drop_skb(struct sock *sk, struct sk_buff_head *sk_queue,
> > > + struct sk_buff *skb, unsigned int flags,
> > > void (*destructor)(struct sock *sk,
> > > struct sk_buff *skb));
> > > int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
> > > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > > index db1866f2..a4592b4 100644
> > > --- a/net/core/datagram.c
> > > +++ b/net/core/datagram.c
> > > @@ -161,6 +161,43 @@ static struct sk_buff *skb_set_peeked(struct sk_buff *skb)
> > > return skb;
> > > }
> > >
> > > +struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
> > > + struct sk_buff_head *queue,
> > > + unsigned int flags,
> > > + void (*destructor)(struct sock *sk,
> > > + struct sk_buff *skb),
> > > + int *peeked, int *off, int *err,
> > > + struct sk_buff **last)
> > > +{
> > > + struct sk_buff *skb;
> > > +
> > > + *last = queue->prev;
> >
> > this refactoring changed the behavior.
> > Now queue->prev is returned as last.
> > Whereas it was *last = queue before.
> >
> > > + skb_queue_walk(queue, skb) {
> >
> > and *last = skb assignment is gone too.
> >
> > Was this intentional ?
>
> Yes.
>
> > Is this the right behavior?
>
> I think so. queue->prev is the last skb in the queue. With the old
> code, __skb_try_recv_datagram(), when returning NULL, used the
> instructions you quoted to overall set 'last' to the last skb in the
> queue. We did not use 'last' elsewhere. So overall this just reduce the
> number of instructions inside the loop. (unless I'm missing something).
Right. On the second glance it does appear to be correct.
> Are you experiencing any specific issues due to the mentioned commit?
yes.
Just like what Baoyou Xie reported https://lore.kernel.org/patchwork/patch/962802/
we're hitting infinite loop in __skb_recv_datagram() on 4.11 kernel.
and different, but also buggy, behavior on the net-next.
In particular __skb_try_recv_datagram() returns immediately,
because skb_queue_empty() is true (sk->sk_receive_queue.next == &sk->sk_receive_queue)
But __skb_wait_for_more_packets() also returns immediately
because if (sk->sk_receive_queue.prev != skb) is also true.
There is a link list corruption in sk_receive_queue.
list->next == list, but list->prev still points to valid skb.
Before your commit we had
*last = queue;
and we had this infinite loop I described above.
After your commit
*last = queue->next;
which assigns buggy pointer into *last, but that accidentally
makes if (sk->sk_receive_queue.prev != skb) to be false
and __skb_wait_for_more_packets() goes into schedule_timeout().
Eventually bad things happen too, but in the different spot.
The corruption is somehow related to netlink_recvmsg() just like in that
Baoyou Xie report.
The typical stack trace is
__skb_wait_for_more_packets+0x64/0x140
? skb_gro_receive+0x310/0x310
__skb_recv_datagram+0x5c/0xa0
skb_recv_datagram+0x31/0x40
netlink_recvmsg+0x51/0x3c0
? sock_write_iter+0xf8/0x110
SYSC_recvfrom+0x116/0x190
We didn't figure out a way to reproduce it yet.
kasan didn't help.
The way netlink socket pushes skbs into sk_receive_queue and drains it
all looks correct. We thought it could be MSG_PEAK related, but
skb->users refcnting also looks correct.
If anyone have any ideas what things to try, I'm all ears.
Powered by blists - more mailing lists