Message-ID: <2dd7aea9-93a1-4fbb-91a8-b7f3acd02a60@oracle.com>
Date: Mon, 9 Sep 2024 17:29:04 -0700
From: Shoaib Rao <rao.shoaib@...cle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Kuniyuki Iwashima <kuniyu@...zon.com>, davem@...emloft.net,
kuba@...nel.org, linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
pabeni@...hat.com,
syzbot+8811381d455e3e9ec788@...kaller.appspotmail.com,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] KASAN: slab-use-after-free Read in
unix_stream_read_actor (2)
On 9/6/2024 10:06 PM, Shoaib Rao wrote:
>
> On 9/6/2024 9:48 AM, Shoaib Rao wrote:
>>
>> On 9/6/2024 5:37 AM, Eric Dumazet wrote:
>>> On Thu, Sep 5, 2024 at 10:48 PM Shoaib Rao <rao.shoaib@...cle.com>
>>> wrote:
>>>>
>>>> On 9/5/2024 1:35 PM, Kuniyuki Iwashima wrote:
>>>>> From: Shoaib Rao <rao.shoaib@...cle.com>
>>>>> Date: Thu, 5 Sep 2024 13:15:18 -0700
>>>>>> On 9/5/2024 12:46 PM, Kuniyuki Iwashima wrote:
>>>>>>> From: Shoaib Rao <rao.shoaib@...cle.com>
>>>>>>> Date: Thu, 5 Sep 2024 00:35:35 -0700
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I am not able to reproduce the issue. I have run the C program
>>>>>>>> at least
>>>>>>>> 100 times in a loop. In the end I do get an EFAULT, not sure if that is
>>>>>>>> intentional or not but no panic. Should I be doing something
>>>>>>>> differently? The kernel version I am using is
>>>>>>>> v6.11-rc6-70-gc763c4339688. Later I can try with the exact version.
>>>>>>> The -EFAULT is the bug, meaning that we were trying to read a
>>>>>>> consumed skb.
>>>>>>>
>>>>>>> But the first bug is in recvfrom() that shouldn't be able to read
>>>>>>> OOB skb
>>>>>>> without MSG_OOB, which doesn't clear unix_sk(sk)->oob_skb, and later
>>>>>>> something bad happens.
>>>>>>>
>>>>>>> socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 4]) = 0
>>>>>>> sendmsg(4, {msg_name=NULL, msg_namelen=0,
>>>>>>> msg_iov=[{iov_base="\333", iov_len=1}], msg_iovlen=1,
>>>>>>> msg_controllen=0, msg_flags=0}, MSG_OOB|MSG_DONTWAIT) = 1
>>>>>>> recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=NULL,
>>>>>>> msg_iovlen=0, msg_controllen=0, msg_flags=MSG_OOB}, MSG_OOB|
>>>>>>> MSG_WAITFORONE) = 1
>>>>>>> sendmsg(4, {msg_name=NULL, msg_namelen=0,
>>>>>>> msg_iov=[{iov_base="\21", iov_len=1}], msg_iovlen=1,
>>>>>>> msg_controllen=0, msg_flags=0}, MSG_OOB|MSG_NOSIGNAL|MSG_MORE) = 1
>>>>>>>> recvfrom(3, "\21", 125, MSG_DONTROUTE|MSG_TRUNC|MSG_DONTWAIT,
>>>>>>>> NULL, NULL) = 1
>>>>>>> recvmsg(3, {msg_namelen=0}, MSG_OOB|MSG_ERRQUEUE) = -1
>>>>>>> EFAULT (Bad address)
>>>>>>>
>>>>>>> I posted a fix officially:
>>>>>>> https://lore.kernel.org/netdev/20240905193240.17565-5-kuniyu@...zon.com/
>>>>>> Thanks that is great. Isn't EFAULT, normally indicative of an issue
>>>>>> with the user provided address of the buffer, not the kernel buffer.
>>>>> Normally, it's used when copy_to_user() or copy_from_user() or
>>>>> something similar failed.
>>>>>
>>>>> But this time, if you turn KASAN off, you'll see the last recvmsg()
>>>>> returns 1-byte garbage instead of -EFAULT, so actually KASAN worked
>>>>> on your host, I guess.
>>>> No it did not work. As soon as KASAN detected read after free it should
>>>> have panicked as it did in the report and I have been running the
>>>> syzbot's C program in a continuous loop. I would like to reproduce the
>>>> issue before we can accept the fix -- If that is alright with you. I
>>>> will try your new test case later and report back. Thanks for the patch
>>>> though.
>>> KASAN does not panic unless you request it.
>>>
>>> Documentation/dev-tools/kasan.rst
>>>
>>> KASAN is affected by the generic ``panic_on_warn`` command line
>>> parameter.
>>> When it is enabled, KASAN panics the kernel after printing a bug report.
>>>
>>> By default, KASAN prints a bug report only for the first invalid
>>> memory access.
>>> With ``kasan_multi_shot``, KASAN prints a report on every invalid
>>> access. This
>>> effectively disables ``panic_on_warn`` for KASAN reports.
>>>
>>> Alternatively, independent of ``panic_on_warn``, the ``kasan.fault=``
>>> boot
>>> parameter can be used to control panic and reporting behaviour:
>>>
>>> - ``kasan.fault=report``, ``=panic``, or ``=panic_on_write`` controls
>>> whether
>>> to only print a KASAN report, panic the kernel, or panic the
>>> kernel on
>>> invalid writes only (default: ``report``). The panic happens even if
>>> ``kasan_multi_shot`` is enabled. Note that when using asynchronous
>>> mode of
>>> Hardware Tag-Based KASAN, ``kasan.fault=panic_on_write`` always
>>> panics on
>>> asynchronously checked accesses (including reads).
>>
>> Hi Eric,
>>
>> Thanks for the update. I forgot to mention that I did set /proc/sys/
>> kernel/panic_on_warn to 1. I ran the program over night in two
>> separate windows, there are no reports and no panic. I first try to
>> reproduce the issue, because if I can not, how can I be sure that I
>> have fixed that bug? I may find another issue and fix it but not the
>> one that I was trying to. Please be assured that I am not done, I
>> continue to investigate the issue.
>>
>> If someone has a way of reproducing the failure please kindly let me
>> know.
>>
>> Kind regards,
>>
>> Shoaib
>>
> I have tried reproducing using the newly added tests but no luck. I will
> keep trying but if there is another occurrence please let me know. I am
> using an AMD system but that should not have any impact.
>
> Shoaib
>
I have spent some more time investigating the issue. The sequence of
packet arrival and consumption definitely points to an issue with OOB
handling, and I will be submitting a patch for that.
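
For anyone who wants to watch the sequence without the syzbot program, the strace quoted earlier in the thread can be sketched as a small user-space helper. This is only a sketch under assumptions: it uses simplified flags rather than the exact syzbot ones, it assumes Linux with AF_UNIX OOB support built in, and the names oob_step, replay_oob_sequence and print_oob_steps are made up for illustration. It records what each call returns and makes no claim about what a given kernel will report.

```c
/* Sketch only: replays the quoted strace sequence with simplified flags.
 * Assumes Linux with AF_UNIX MSG_OOB support. The names oob_step,
 * replay_oob_sequence and print_oob_steps are illustrative, not an API.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define OOB_STEPS 5

struct oob_step {
	const char *what;	/* description of the call */
	ssize_t ret;		/* return value of the call */
	int err;		/* errno if ret < 0, else 0 */
};

static void record(struct oob_step *s, const char *what, ssize_t ret)
{
	s->what = what;
	s->ret = ret;
	s->err = ret < 0 ? errno : 0;
}

/* Returns 0 if the socketpair was created, -1 otherwise. */
int replay_oob_sequence(struct oob_step steps[OOB_STEPS])
{
	int sv[2];
	char c = 0;

	memset(steps, 0, OOB_STEPS * sizeof(steps[0]));
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* First OOB byte is sent and read back with MSG_OOB. */
	record(&steps[0], "send MSG_OOB #1",
	       send(sv[1], "\333", 1, MSG_OOB | MSG_DONTWAIT));
	record(&steps[1], "recv MSG_OOB #1", recv(sv[0], &c, 1, MSG_OOB));

	/* Second OOB byte is sent, then read WITHOUT MSG_OOB. */
	record(&steps[2], "send MSG_OOB #2",
	       send(sv[1], "\21", 1, MSG_OOB | MSG_DONTWAIT));
	record(&steps[3], "recv without MSG_OOB",
	       recv(sv[0], &c, 1, MSG_DONTWAIT));

	/* Final read with MSG_OOB: the call that returned EFAULT above. */
	record(&steps[4], "recv MSG_OOB #2", recv(sv[0], &c, 1, MSG_OOB));

	close(sv[0]);
	close(sv[1]);
	return 0;
}

void print_oob_steps(const struct oob_step steps[OOB_STEPS])
{
	for (int i = 0; i < OOB_STEPS; i++)
		printf("%-22s = %zd%s%s\n", steps[i].what, steps[i].ret,
		       steps[i].err ? " errno: " : "",
		       steps[i].err ? strerror(steps[i].err) : "");
}
```

Calling replay_oob_sequence() and then print_oob_steps() from main() shows the result of each step. The interesting entries are the read without MSG_OOB and the final read with MSG_OOB, which is where the strace in this thread shows 1 and EFAULT respectively.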

KASAN does not report any issue because there is none. While the
handling is incorrect, at no point is freed memory accessed. The EFAULT
error code is returned from __skb_datagram_iter():

	/* This is not really a user copy fault, but rather someone
	 * gave us a bogus length on the skb. We should probably
	 * print a warning here as it may indicate a kernel bug.
	 */
fault:
	iov_iter_revert(to, offset - start_off);
	return -EFAULT;

As the comment says, the issue is that the skb in question has a bogus
length. Due to the bug in handling, the OOB byte has already been read
as a regular byte, but the oob pointer is not cleared. So when a read
with the OOB flag is issued, the code calls __skb_datagram_iter() with
that skb, which has a length of zero. The code detects this and returns
the error. Any doubts can be verified by checking the refcnt on the skb.
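
To make the length check concrete, here is a toy user-space model of the path described above. It is not the kernel code: toy_skb and toy_copy_iter are hypothetical names, the model assumes a linear buffer and a non-negative offset, and it only illustrates how asking for a byte past the end of an already consumed buffer ends in -EFAULT rather than in a read of freed memory.

```c
/* Toy user-space model, NOT the kernel implementation. toy_skb and
 * toy_copy_iter are hypothetical names used only to illustrate the
 * bounds check; offset is assumed to be >= 0.
 */
#include <errno.h>
#include <string.h>

struct toy_skb {
	const unsigned char *data;
	int len;		/* bytes of data held by the buffer */
};

/* Copy len bytes starting at offset into 'to'.
 * If the request runs past the end of the buffer, the leftover length
 * cannot be satisfied and -EFAULT is returned, mirroring the "bogus
 * length" path in the excerpt. Unlike the kernel, this toy does not
 * revert a partial copy.
 */
int toy_copy_iter(const struct toy_skb *skb, int offset,
		  unsigned char *to, int len)
{
	int avail = skb->len - offset;		/* bytes left after offset */
	int copy = avail < len ? avail : len;

	if (copy > 0) {
		memcpy(to, skb->data + offset, (size_t)copy);
		len -= copy;
	}
	if (len == 0)
		return 0;

	/* Caller asked for more than the buffer holds. */
	return -EFAULT;
}
```

With a one-byte buffer, a first call with offset 0 copies the byte and returns 0. A second call with offset 1, standing in for the byte having already been consumed as regular data, has nothing left to copy and returns -EFAULT.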

My conclusion is that the bug reported by syzbot is not caused by the
mishandling of OOB, unless code was added to disregard the skb length
and read a byte.

The error being returned is confusing. The callers should not pass this
error to the application. They should process the error.

Shoaib