Message-ID: <CAAM7YAnqLyK6JWPW_Y8wD=ykqWMn4fPdJ3_7yUUB+TQZWfDJzQ@mail.gmail.com>
Date: Wed, 7 Sep 2011 07:09:17 +0800
From: "Yan, Zheng" <zheng.z.yan@...ux.intel.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>,
"Yan, Zheng" <zheng.z.yan@...el.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"sfr@...b.auug.org.au" <sfr@...b.auug.org.au>,
"jirislaby@...il.com" <jirislaby@...il.com>,
"sedat.dilek@...il.com" <sedat.dilek@...il.com>, alex.shi@...el.com
Subject: Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
On Wed, Sep 7, 2011 at 4:19 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Tuesday, 06 September 2011 at 12:59 -0700, Tim Chen wrote:
>> On Tue, 2011-09-06 at 21:43 +0200, Eric Dumazet wrote:
>> > On Tuesday, 06 September 2011 at 12:33 -0700, Tim Chen wrote:
>> >
>> > > Yes, I think locking the sendmsg for the entire duration of
>> > > unix_stream_sendmsg makes a lot of sense. It simplifies the logic a lot
>> > > more. I'll try to cook something up in the next couple of days.
>> >
>> > That's not really possible; we can't hold a spinlock and call
>> > sock_alloc_send_skb() and/or memcpy_fromiovec(), which might sleep.
>> >
>> > You would need to prepare the full skb list, then:
>> > - stick the ref on the last skb of the list.
>> >
>> > Transfer the whole skb list into other->sk_receive_queue in one go,
>> > instead of one after another.
>> >
>> > Unfortunately, this would break streaming (big send(), and another
>> > thread doing the receive)
>> >
>> > Listen, I am wondering why hackbench even triggers SCM code. This is
>> > really odd. We should not have a _single_ pid/cred ref/unref at all.
>> >
>>
>> Hackbench triggers the code because it has a bunch of threads sending
>> msgs on UNIX sockets.
>> >
>>
>> Well, if the lock socket approach doesn't work, then my original patch
>> plus Yan Zheng's fix should still work. I'll try to answer your
>> objections below:
>>
>>
>> > I was discussing things after the proposed patch, not current net-next.
>> >
>> > This reads :
>> >
>> > err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, scm_ref);
>> >
>> > So first skb is sent without ref taken, as mentioned in Changelog ?
>> >
>>
>> No. The first skb is sent *with* a ref taken, as scm_ref is set to true for
>> the first skb.
>>
>> >
>> > If the second skb cannot be built, we exit this system call with an already
>> > queued skb. The receiver can then access freed memory.
>> >
>>
>> No, we do have a reference set: for the first skb, in unix_scm_to_skb; for the
>> second skb (which is the last skb), in scm_sent. Should the second skb alloc fail,
>> we'll release the ref in scm_destroy. Otherwise, the receiver will release
>> the references when consuming the skb.
>>
>
> This is crap. This is not the intent of the code I read from the patch.
>
> unless scm_ref really means scm_noref ?
>
> I really hate this patch. I mean it.
>
> I read it 10 times, spent 2 hours and still don't understand it.
>
Sorry, scm_ref means "the sender still holds an scm reference". I should add a comment for it.
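
To make the intended meaning concrete, here is a minimal user-space sketch of
the hand-off as I read it. It is not the kernel code: cred_model, skb_model,
attach_cred(), consume_skb_model() and send_model() are hypothetical names
made up for this illustration, and the counter only models the idea of a
pid/cred reference count. The assumption is that the sender enters with one
reference of its own, every skb except the last takes an extra reference, and
the last skb inherits the sender's reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the refcounted pid/cred object. */
struct cred_model { int refs; };

/* Hypothetical stand-in for an skb that owns one reference. */
struct skb_model { struct cred_model *cred; };

/*
 * Attach cred to skb. If the sender keeps its own reference, the skb
 * takes an extra one; otherwise the sender's reference is handed over.
 */
static void attach_cred(struct skb_model *skb, struct cred_model *cred,
                        bool sender_keeps_ref)
{
    skb->cred = cred;
    if (sender_keeps_ref)
        cred->refs++;
}

/* Receiver side: consuming an skb drops the reference it owns. */
static void consume_skb_model(struct skb_model *skb)
{
    assert(skb->cred != NULL);
    skb->cred->refs--;
    skb->cred = NULL;
}

/*
 * Queue len bytes in chunk-sized skbs. The caller enters holding one
 * reference on cred. fail_at is the index of the skb whose allocation
 * fails (negative: no failure). Returns the number of skbs queued.
 */
static size_t send_model(struct cred_model *cred, struct skb_model *queue,
                         size_t len, size_t chunk, long fail_at)
{
    size_t sent = 0, n = 0;
    bool scm_ref = true;    /* the sender still holds a reference */

    while (sent < len) {
        size_t size = (len - sent < chunk) ? len - sent : chunk;

        if ((long)n == fail_at)
            break;          /* simulated allocation failure */

        /* Last skb: pass the sender's reference to it. */
        if (sent + size >= len)
            scm_ref = false;

        attach_cred(&queue[n++], cred, scm_ref);
        sent += size;
    }

    /* Error path: the sender still owns its reference and drops it. */
    if (scm_ref)
        cred->refs--;

    return n;
}
```

In this model the reference count on exit always equals the number of queued
skbs, whether the send completes or stops early, so a receiver never consumes
an skb whose reference was already dropped.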
>
> @@ -1577,6 +1577,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
> int sent = 0;
> struct scm_cookie tmp_scm;
> bool fds_sent = false;
> + bool scm_ref = true;
> int max_level;
>
> if (NULL == siocb->scm)
> @@ -1637,12 +1638,15 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
> */
> size = min_t(int, size, skb_tailroom(skb));
>
> + /* pass the scm reference to the very last skb */
>
> HERE: As I understand it, on the last skb, scm_ref is set to false.
> So the comment is wrong.
>
> + if (sent + size >= len)
> + scm_ref = false;
>
> - /* Only send the fds and no ref to pid in the first buffer */
> - err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
> + /* Only send the fds in the first buffer */
> + err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, scm_ref);
> if (err < 0) {
> kfree_skb(skb);
> - goto out;
> + goto out_err;
> }
>
>
>
> As I said, we should revert the buggy patch, and rewrite a performance
> fix from scratch, with not a single get_pid()/put_pid() in fast path.
>
> read()/write() on AF_UNIX sockets should not use a single
> get_pid()/put_pid().
>
> This is a serious regression we should fix at 100%, not 50% or even 75%,
> adding serious bugs.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>