Message-ID: <1315340388.3400.28.camel@edumazet-laptop>
Date: Tue, 06 Sep 2011 22:19:48 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: "Yan, Zheng" <zheng.z.yan@...el.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"sfr@...b.auug.org.au" <sfr@...b.auug.org.au>,
"jirislaby@...il.com" <jirislaby@...il.com>,
"sedat.dilek@...il.com" <sedat.dilek@...il.com>, alex.shi@...el.com
Subject: Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
On Tuesday, 06 September 2011 at 12:59 -0700, Tim Chen wrote:
> On Tue, 2011-09-06 at 21:43 +0200, Eric Dumazet wrote:
> > On Tuesday, 06 September 2011 at 12:33 -0700, Tim Chen wrote:
> >
> > > Yes, I think locking the sendmsg for the entire duration of
> > > unix_stream_sendmsg makes a lot of sense. It simplifies the logic a lot
> > > more. I'll try to cook something up in the next couple of days.
> >
> > That's not really possible: we can't hold a spinlock and call
> > sock_alloc_send_skb() and/or memcpy_fromiovec(), which might sleep.
> >
> > You would need to prepare the full skb list, then :
> > - stick the ref on the last skb of the list.
> >
> > Transfer the whole skb list to other->sk_receive_queue in one go,
> > instead of one after another.
> >
> > Unfortunately, this would break streaming (big send(), and another
> > thread doing the receive)
> >
> > Listen, I am wondering why hackbench even triggers SCM code. This is
> > really odd. We should not have a _single_ pid/cred ref/unref at all.
> >
>
> Hackbench triggers the code because it has a bunch of threads sending
> msgs on UNIX socket.
> >
>
> Well, if the lock socket approach doesn't work, then my original patch
> plus Yan Zheng's fix should still work. I'll try to answer your
> objections below:
>
>
> > I was discussing things as they stand after the proposed patch, not current net-next.
> >
> > This reads :
> >
> > err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, scm_ref);
> >
> > So first skb is sent without ref taken, as mentioned in Changelog ?
> >
>
> No, the first skb is sent *with* a ref taken, as scm_ref is set to true for
> the first skb.
>
> >
> > If the second skb cannot be built, we exit this system call with an already
> > queued skb. The receiver can then access freed memory.
> >
>
> No, we do have the reference set: for the first skb, in unix_scm_to_skb; for the
> second skb (which is the last skb), in scm_sent. Should the second skb alloc fail,
> we'll release the ref in scm_destroy. Otherwise, the receiver will release
> the references when consuming the skb.
>
This is crap. This is not the intent of the code as I read it from the patch,
unless scm_ref really means scm_noref?

I really hate this patch. I mean it.

I read it 10 times, spent 2 hours on it, and still don't understand it.
@@ -1577,6 +1577,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	int sent = 0;
 	struct scm_cookie tmp_scm;
 	bool fds_sent = false;
+	bool scm_ref = true;
 	int max_level;
 	if (NULL == siocb->scm)
@@ -1637,12 +1638,15 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		 */
 		size = min_t(int, size, skb_tailroom(skb));
+		/* pass the scm reference to the very last skb */
HERE: as I read it, scm_ref is set to false on the last skb.
So the comment is wrong.
+		if (sent + size >= len)
+			scm_ref = false;
-		/* Only send the fds and no ref to pid in the first buffer */
-		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
+		/* Only send the fds in the first buffer */
+		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, scm_ref);
 		if (err < 0) {
 			kfree_skb(skb);
-			goto out;
+			goto out_err;
 		}
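
To make the point about the flag concrete, here is a small user-space model
of the loop in the hunk above. It is only a sketch, not kernel code:
model_scm_ref() and its chunk parameter are hypothetical names, with chunk
standing in for the per-skb size limit (skb_tailroom() in the real code). It
records the value of scm_ref that would be passed to unix_scm_to_skb() for
each skb, and says nothing about what the flag is supposed to mean.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model of the sendmsg loop in the hunk above (an assumption-laden
 * sketch, not the kernel implementation).
 *
 * len   - total number of bytes to send
 * chunk - hypothetical stand-in for the per-skb size limit
 * flags - output: flags[i] is the scm_ref value used for the i-th skb
 * max   - capacity of flags[]
 *
 * Returns the number of skbs that would be built (at most max).
 */
static size_t model_scm_ref(size_t len, size_t chunk, bool *flags, size_t max)
{
	size_t sent = 0;
	size_t n = 0;
	bool scm_ref = true;	/* same initial value as in the patch */

	if (chunk == 0)
		return 0;	/* nothing can be sent with a zero-sized chunk */

	while (sent < len && n < max) {
		size_t size = len - sent;

		if (size > chunk)
			size = chunk;

		/* same test as in the patch: true only for the last skb */
		if (sent + size >= len)
			scm_ref = false;

		flags[n++] = scm_ref;
		sent += size;
	}
	return n;
}
```

For a 10-byte send split into 4-byte chunks, this yields three skbs with
scm_ref equal to true, true, false: the flag is cleared on the very last skb,
which is the opposite of what the comment in the patch suggests.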
As I said, we should revert the buggy patch, and rewrite a performance
fix from scratch, with not a single get_pid()/put_pid() in fast path.
read()/write() on AF_UNIX sockets should not use a single
get_pid()/put_pid().
This is a serious regression that we should fix completely, not 50% or even 75%
while adding serious bugs.