netdev - Re: [Question] Unexpected SO_PEEK

Message-Id: <D16E20AA-9B19-42CA-A501-026E140F2792@gmail.com>
Date: Sat, 15 Nov 2025 15:25:59 +0800
From: Miao Wang <shankerwangmiao@...il.com>
To: Kuniyuki Iwashima <kuniyu@...gle.com>
Cc: netdev@...r.kernel.org
Subject: Re: [Question] Unexpected SO_PEEK_OFF behavior

Hi, 

> On Nov 15, 2025, at 12:33, Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
> 
> From: Miao Wang <shankerwangmiao@...il.com>
> Date: Sat, 15 Nov 2025 05:03:44 +0800
>> Hi, all
>> 
>> I learned from the kernel documentation that SO_PEEK_OFF maintains a peek
>> offset for a socket, and that recv(MSG_PEEK) returns data starting from that
>> offset. As stated in the manual, suppose the incoming data for a socket is
>> aaaabbbb, and the initial SO_PEEK_OFF is 0. Two calls of recv(fd, buf, 4,
>> MSG_PEEK) will return aaaa and bbbb respectively. However, I noticed that when
>> the incoming data is supplied in two batches, the second recv() returns all
>> 8 bytes instead of only the last 4, as shown below:
>> 
>> Receiver                     Sender
>> --------                     ------
>>                             send(fd, "aaaabbbb", 8)
>> recv(fd, buf, 4, MSG_PEEK)
>> Get "aaaa" in buf
>> recv(fd, buf, 100, MSG_PEEK)
>> Get "bbbb" in buf
>> ------------------------------------------------
>> recv(fd, buf, 4, MSG_PEEK)
>>                             send(fd, "aaaa", 4)
>> Get "aaaa" in buf
>> recv(fd, buf, 100, MSG_PEEK)
>>                             send(fd, "bbbb", 4)
>> Get "aaaabbbb" in buf
>> 
>> 
>> I also noticed that this only happens with unix sockets. Is this the
>> expected behavior? If so, how can one tell whether the data returned by
>> recv(MSG_PEEK) contains data from before SO_PEEK_OFF?
> 
> Thanks for the report!
> 
> It is definitely a bug in the kernel.
> 
> If you remove sleep(2) in your program, you will not see
> the weird behaviour.
> 
> The problem is that once we peek the last skb (aaaa) and
> sleep (goto again; -> goto redo;), we need to reset @skip.
> 
> This should fix the problem:
> 
> ---8<---
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index e518116f8171..9e93bebff4ba 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -3000,6 +3000,8 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
>  			}
>  
>  			mutex_lock(&u->iolock);
> +
> +			skip = max(sk_peek_offset(sk, flags), 0);
>  			goto redo;
>  unlock:
>  			unix_state_unlock(sk);
> ---8<---
> 
> We could move the redo: label out of the loop but I need
> to check the history a bit more (18eceb818dc3, etc).
> 

I bisected the relevant code and found that the behavior has been broken
since commit 9f389e35674f ("af_unix: return data from multiple SKBs on recv()
with MSG_PEEK flag"), and that the later fix, commit e9193d60d363 ("net/unix:
fix logic about sk_peek_offset"), did not fully fix it.

Cheers,

Miao Wang