Message-ID: <2dd7aea9-93a1-4fbb-91a8-b7f3acd02a60@oracle.com>
Date: Mon, 9 Sep 2024 17:29:04 -0700
From: Shoaib Rao <rao.shoaib@...cle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Kuniyuki Iwashima <kuniyu@...zon.com>, davem@...emloft.net,
        kuba@...nel.org, linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        pabeni@...hat.com,
        syzbot+8811381d455e3e9ec788@...kaller.appspotmail.com,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] KASAN: slab-use-after-free Read in
 unix_stream_read_actor (2)



On 9/6/2024 10:06 PM, Shoaib Rao wrote:
> 
> On 9/6/2024 9:48 AM, Shoaib Rao wrote:
>>
>> On 9/6/2024 5:37 AM, Eric Dumazet wrote:
>>> On Thu, Sep 5, 2024 at 10:48 PM Shoaib Rao <rao.shoaib@...cle.com> 
>>> wrote:
>>>>
>>>> On 9/5/2024 1:35 PM, Kuniyuki Iwashima wrote:
>>>>> From: Shoaib Rao <rao.shoaib@...cle.com>
>>>>> Date: Thu, 5 Sep 2024 13:15:18 -0700
>>>>>> On 9/5/2024 12:46 PM, Kuniyuki Iwashima wrote:
>>>>>>> From: Shoaib Rao <rao.shoaib@...cle.com>
>>>>>>> Date: Thu, 5 Sep 2024 00:35:35 -0700
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I am not able to reproduce the issue. I have run the C program at least
>>>>>>>> 100 times in a loop. In the end I do get an EFAULT, not sure if that is
>>>>>>>> intentional or not but no panic. Should I be doing something
>>>>>>>> differently? The kernel version I am using is
>>>>>>>> v6.11-rc6-70-gc763c4339688. Later I can try with the exact version.
>>>>>>> The -EFAULT is the bug, meaning that we were trying to read a
>>>>>>> consumed skb.
>>>>>>>
>>>>>>> But the first bug is in recvfrom() that shouldn't be able to read 
>>>>>>> OOB skb
>>>>>>> without MSG_OOB, which doesn't clear unix_sk(sk)->oob_skb, and later
>>>>>>> something bad happens.
>>>>>>>
>>>>>>>      socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 4]) = 0
>>>>>>>      sendmsg(4, {msg_name=NULL, msg_namelen=0, 
>>>>>>> msg_iov=[{iov_base="\333", iov_len=1}], msg_iovlen=1, 
>>>>>>> msg_controllen=0, msg_flags=0}, MSG_OOB|MSG_DONTWAIT) = 1
>>>>>>>      recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=NULL, 
>>>>>>> msg_iovlen=0, msg_controllen=0, msg_flags=MSG_OOB}, MSG_OOB| 
>>>>>>> MSG_WAITFORONE) = 1
>>>>>>>      sendmsg(4, {msg_name=NULL, msg_namelen=0, 
>>>>>>> msg_iov=[{iov_base="\21", iov_len=1}], msg_iovlen=1, 
>>>>>>> msg_controllen=0, msg_flags=0}, MSG_OOB|MSG_NOSIGNAL|MSG_MORE) = 1
>>>>>>>> recvfrom(3, "\21", 125, MSG_DONTROUTE|MSG_TRUNC|MSG_DONTWAIT, 
>>>>>>>> NULL, NULL) = 1
>>>>>>>      recvmsg(3, {msg_namelen=0}, MSG_OOB|MSG_ERRQUEUE) = -1 
>>>>>>> EFAULT (Bad address)
>>>>>>>
>>>>>>> I posted a fix officially:
>>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/netdev/20240905193240.17565-5-kuniyu@...zon.com/__;!!ACWV5N9M2RV99hQ!IJeFvLdaXIRN2ABsMFVaKOEjI3oZb2kUr6ld6ZRJCPAVum4vuyyYwUP6_5ZH9mGZiJDn6vrbxBAOqYI$
>>>>>> Thanks that is great. Isn't EFAULT,  normally indicative of an issue
>>>>>> with the user provided address of the buffer, not the kernel buffer.
>>>>> Normally, it's used when copy_to_user() or copy_from_user() or
>>>>> something similar failed.
>>>>>
>>>>> But this time, if you turn KASAN off, you'll see the last recvmsg()
>>>>> returns 1-byte garbage instead of -EFAULT, so actually KASAN worked
>>>>> on your host, I guess.
>>>> No it did not work. As soon as KASAN detected read after free it should
>>>> have panicked as it did in the report and I have been running the
>>>> syzbot's C program in a continuous loop. I would like to reproduce the
>>>> issue before we can accept the fix -- If that is alright with you. I
>>>> will try your new test case later and report back. Thanks for the patch
>>>> though.
>>> KASAN does not panic unless you request it.
>>>
>>> Documentation/dev-tools/kasan.rst
>>>
>>> KASAN is affected by the generic ``panic_on_warn`` command line 
>>> parameter.
>>> When it is enabled, KASAN panics the kernel after printing a bug report.
>>>
>>> By default, KASAN prints a bug report only for the first invalid 
>>> memory access.
>>> With ``kasan_multi_shot``, KASAN prints a report on every invalid 
>>> access. This
>>> effectively disables ``panic_on_warn`` for KASAN reports.
>>>
>>> Alternatively, independent of ``panic_on_warn``, the ``kasan.fault=`` 
>>> boot
>>> parameter can be used to control panic and reporting behaviour:
>>>
>>> - ``kasan.fault=report``, ``=panic``, or ``=panic_on_write`` controls 
>>> whether
>>>    to only print a KASAN report, panic the kernel, or panic the 
>>> kernel on
>>>    invalid writes only (default: ``report``). The panic happens even if
>>>    ``kasan_multi_shot`` is enabled. Note that when using asynchronous 
>>> mode of
>>>    Hardware Tag-Based KASAN, ``kasan.fault=panic_on_write`` always 
>>> panics on
>>>    asynchronously checked accesses (including reads).
>>
>> Hi Eric,
>>
>> Thanks for the update. I forgot to mention that I did set
>> /proc/sys/kernel/panic_on_warn to 1. I ran the program overnight in two 
>> separate windows, there are no reports and no panic. I first try to 
>> reproduce the issue, because if I can not, how can I be sure that I 
>> have fixed that bug? I may find another issue and fix it but not the 
>> one that I was trying to. Please be assured that I am not done, I 
>> continue to investigate the issue.
>>
>> If someone has a way of reproducing the failure please kindly let me 
>> know.
>>
>> Kind regards,
>>
>> Shoaib
>>
> I have tried reproducing using the newly added tests but no luck. I will 
> keep trying but if there is another occurrence please let me know. I am 
> using an AMD system but that should not have any impact.
> 
> Shoaib
> 

I have spent some more time investigating the issue. The sequence of packet 
arrival and consumption definitely points to an issue with OOB handling, 
and I will be submitting a patch for that.
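
For reference, the syscall sequence from the strace output quoted above can be replayed from userspace roughly as follows. This is only a sketch: it uses send()/recv() instead of sendmsg()/recvmsg(), drops the unrelated flags the fuzzer added, and adds MSG_DONTWAIT to every call so that nothing blocks. The names replay_oob_sequence() and struct oob_step are made up for this illustration and are not part of the syzbot reproducer.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Result of one step: return value and errno (0 on success). */
struct oob_step {
    ssize_t ret;
    int err;
};

static void record(struct oob_step *s, ssize_t ret)
{
    s->ret = ret;
    s->err = ret < 0 ? errno : 0;
}

/*
 * Replays the five send/recv calls from the quoted strace on an
 * AF_UNIX stream socketpair. Returns 0 if the sequence was run,
 * -1 if socketpair() itself failed. What each step returns depends
 * on the kernel version and on CONFIG_AF_UNIX_OOB.
 */
static int replay_oob_sequence(struct oob_step steps[5])
{
    int sv[2];
    unsigned char buf[125];
    unsigned char b;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    b = 0xdb;                               /* "\333" in the strace */
    record(&steps[0], send(sv[1], &b, 1, MSG_OOB | MSG_DONTWAIT));

    /* Read the first OOB byte with MSG_OOB. */
    record(&steps[1], recv(sv[0], buf, 1, MSG_OOB | MSG_DONTWAIT));

    b = 0x11;                               /* "\21" in the strace */
    record(&steps[2], send(sv[1], &b, 1, MSG_OOB | MSG_DONTWAIT));

    /* Plain read without MSG_OOB: this is the step under discussion. */
    record(&steps[3], recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT));

    /* Second MSG_OOB read: -EFAULT was seen here in the strace. */
    record(&steps[4], recv(sv[0], buf, 1, MSG_OOB | MSG_DONTWAIT));

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

Printing steps[i].ret and steps[i].err after the call gives a trace that can be compared with the strace output; the values will differ between kernels with and without the fix.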

KASAN does not report any issue because there is none. While the 
handling is incorrect, at no point is freed memory accessed. The EFAULT 
error code is returned from __skb_datagram_iter():

    /* This is not really a user copy fault, but rather someone
     * gave us a bogus length on the skb.  We should probably
     * print a warning here as it may indicate a kernel bug.
     */

fault:
    iov_iter_revert(to, offset - start_off);
    return -EFAULT;

As the comment says, the issue is that the skb in question has a bogus 
length. Due to the bug in handling, the OOB byte has already been read 
as a regular byte, but the oob_skb pointer is not cleared. So when a read 
with the MSG_OOB flag is issued, the code calls __skb_datagram_iter() with 
an skb whose length is zero. The code detects this and returns the 
error. Any doubts can be verified by checking the refcnt on the skb.
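
To make the state I am describing concrete, here is a small userspace toy model. It is not kernel code: struct toy_skb, struct toy_sock and the toy_* functions are invented for this illustration and only mimic the bookkeeping described above (one reference held by the receive queue, one by the oob pointer).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for an skb carrying at most one byte. */
struct toy_skb {
    unsigned char data;
    size_t len;     /* unread bytes left: 1 or 0 */
    int refcnt;     /* > 0 means the buffer is still allocated */
};

/* Toy stand-in for the receiving socket. */
struct toy_sock {
    struct toy_skb *queue;      /* single queued buffer */
    struct toy_skb *oob_skb;    /* pending OOB buffer, if any */
};

/* Queue one OOB byte: the queue and oob_skb each hold a reference. */
static void toy_send_oob(struct toy_sock *sk, struct toy_skb *skb,
                         unsigned char byte)
{
    assert(sk != NULL && skb != NULL);
    skb->data = byte;
    skb->len = 1;
    skb->refcnt = 2;
    sk->queue = skb;
    sk->oob_skb = skb;
}

/*
 * Buggy plain read: consumes the OOB byte as ordinary data and
 * drops the queue reference, but forgets to clear sk->oob_skb.
 */
static int toy_recv_plain_buggy(struct toy_sock *sk, unsigned char *out)
{
    struct toy_skb *skb = sk->queue;

    if (!skb || skb->len == 0)
        return -EAGAIN;
    *out = skb->data;
    skb->len = 0;
    sk->queue = NULL;
    skb->refcnt--;      /* the oob_skb reference keeps it alive */
    return 1;
}

/* MSG_OOB read: a zero-length skb is treated as a bogus length. */
static int toy_recv_oob(struct toy_sock *sk, unsigned char *out)
{
    struct toy_skb *skb = sk->oob_skb;

    if (!skb)
        return -EINVAL;
    if (skb->len < 1)
        return -EFAULT; /* bogus length, not a user copy fault */
    *out = skb->data;
    skb->len = 0;
    sk->oob_skb = NULL;
    skb->refcnt--;
    return 1;
}
```

In this model, toy_send_oob() followed by toy_recv_plain_buggy() leaves oob_skb set with refcnt at 1 and len at 0, so the following toy_recv_oob() returns -EFAULT without touching freed memory.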

My conclusion is that the bug reported by syzbot is not caused by the 
mishandling of OOB, unless code was added to disregard the skb 
length and read a byte.

The error being returned is confusing. Callers should not pass this 
error through to the application; they should handle it themselves.

Shoaib

