Message-Id: <D16E20AA-9B19-42CA-A501-026E140F2792@gmail.com>
Date: Sat, 15 Nov 2025 15:25:59 +0800
From: Miao Wang <shankerwangmiao@...il.com>
To: Kuniyuki Iwashima <kuniyu@...gle.com>
Cc: netdev@...r.kernel.org
Subject: Re: [Question] Unexpected SO_PEEK_OFF behavior
Hi,
> On Nov 15, 2025, at 12:33, Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>
> From: Miao Wang <shankerwangmiao@...il.com>
> Date: Sat, 15 Nov 2025 05:03:44 +0800
>> Hi, all
>>
>> I learned from the kernel documentation that SO_PEEK_OFF maintains a peek
>> offset for a socket: recv(MSG_PEEK) returns data starting at that offset. As
>> stated in the manual, suppose the incoming data for a socket is aaaabbbb and
>> the initial SO_PEEK_OFF is 0. Two calls of recv(fd, buf, 4, MSG_PEEK) will
>> return aaaa and bbbb respectively. However, I noticed that when the incoming
>> data arrives in two batches, the second recv() returns all 8 bytes instead
>> of only the 4 bytes after the offset, as shown below:
>>
>> Receiver                         Sender
>> --------                         ------
>>                                  send(fd, "aaaabbbb", 8)
>> recv(fd, buf, 4, MSG_PEEK)
>>   Get "aaaa" in buf
>> recv(fd, buf, 100, MSG_PEEK)
>>   Get "bbbb" in buf
>> ------------------------------------------------
>> recv(fd, buf, 4, MSG_PEEK)
>>                                  send(fd, "aaaa", 4)
>>   Get "aaaa" in buf
>> recv(fd, buf, 100, MSG_PEEK)
>>                                  send(fd, "bbbb", 4)
>>   Get "aaaabbbb" in buf
>>
>>
>> I also noticed that this only happens with unix sockets. Is this the
>> expected behavior? If so, how can one tell whether the data returned from
>> recv(MSG_PEEK) contains data before SO_PEEK_OFF?
>
> Thanks for the report!
>
> It is definitely a bug in the kernel.
>
> If you remove sleep(2) in your program, you will not see
> the weird behaviour.
>
> The problem is that once we peek the last skb (aaaa) and
> sleep (goto again; -> goto redo;), we need to reset @skip.
>
> This should fix the problem:
>
> ---8<---
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index e518116f8171..9e93bebff4ba 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -3000,6 +3000,8 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
>  			}
>
>  			mutex_lock(&u->iolock);
> +
> +			skip = max(sk_peek_offset(sk, flags), 0);
>  			goto redo;
>  unlock:
>  			unix_state_unlock(sk);
> ---8<---
>
> We could move the redo: label out of the loop but I need
> to check the history a bit more (18eceb818dc3, etc).
>
I bisected the relevant code and found that the behavior has been broken since
commit 9f389e35674f ("af_unix: return data from multiple SKBs on recv() with
MSG_PEEK flag"), and that the later fix for it, commit e9193d60d363
("net/unix: fix logic about sk_peek_offset"), did not fully fix it.
Cheers,
Miao Wang