lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <58a5f9c2-b179-49d9-b420-67caeff69f8b@oracle.com>
Date: Tue, 10 Sep 2024 13:57:04 -0700
From: Shoaib Rao <rao.shoaib@...cle.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        pabeni@...hat.com,
        syzbot+8811381d455e3e9ec788@...kaller.appspotmail.com,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] KASAN: slab-use-after-free Read in
 unix_stream_read_actor (2)



On 9/10/2024 12:49 PM, Kuniyuki Iwashima wrote:
> From: Shoaib Rao <rao.shoaib@...cle.com>
> Date: Tue, 10 Sep 2024 11:49:20 -0700
>> On 9/10/2024 11:33 AM, Kuniyuki Iwashima wrote:
>>> From: Shoaib Rao <rao.shoaib@...cle.com>
>>> Date: Tue, 10 Sep 2024 11:16:59 -0700
>>>> On 9/10/2024 10:57 AM, Kuniyuki Iwashima wrote:
>>>>> From: Shoaib Rao <rao.shoaib@...cle.com>
>>>>> Date: Tue, 10 Sep 2024 09:55:03 -0700
>>>>>> On 9/9/2024 5:48 PM, Kuniyuki Iwashima wrote:
>>>>>>> From: Shoaib Rao <rao.shoaib@...cle.com>
>>>>>>> Date: Mon, 9 Sep 2024 17:29:04 -0700
>>>>>>>> I have some more time investigating the issue. The sequence of packet
>>>>>>>> arrival and consumption definitely points to an issue with OOB handling
>>>>>>>> and I will be submitting a patch for that.
>>>>>>>
>>>>>>> It seems a bit late.
>>>>>>> My patches were applied few minutes before this mail was sent.
>>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/netdev/172592764315.3964840.16480083161244716649.git-patchwork-notify@kernel.org/__;!!ACWV5N9M2RV99hQ!M806VrqNEGFgGXEoWG85msKAdFPXup7RzHy9Kt4q_HOfpPWsjNHn75KyFK3a3jWvOb9EEQuFGOjpqgk$
>>>>>>>
>>>>>>
>>>>>> That is a subpar fix. I am not sure why the maintainers accepted the fix
>>>>>> when it was clear that I was still looking into the issue.
>>>>>
>>>>> Just because it's not a subpar fix and you were slow and wrong,
>>>>> clining to triggering the KASAN splat without thinking much.
>>>>>
>>>>>
>>>>>> Plus the
>>>>>> claim that it fixes the panic is absolutely wrong.
>>>>>
>>>>> The _root_ cause of the splat is mishandling of OOB in manage_oob()
>>>>> which causes UAF later in another recvmsg().
>>>>>
>>>>> Honestly your patch is rather a subpar fix to me, few points:
>>>>>
>>>>>      1. The change conflicts with net-next as we have already removed
>>>>>         the additional unnecessary refcnt for OOB skb that has caused
>>>>>         so many issue reported by syzkaller
>>>>>
>>>>>      2. Removing OOB skb in queue_oob() relies on the unneeded refcnt
>>>>>         but it's not mentioned; if merge was done wrongly, another UAF
>>>>>         will be introduced in recvmsg()
>>>>>
>>>>>      3. Even the removing logic is completely unnecessary if manage_oob()
>>>>>         is changed
>>>>>
>>>>>      4. The scan_again: label is misplaced; two consecutive empty OOB skbs
>>>>>         never exist at the head of recvq
>>>>>
>>>>>      5. ioctl() is not fixed
>>>>>
>>>>>      6. No test added
>>>>>
>>>>>      7. Fixes: tag is bogus
>>>>>
>>>>>      8. Subject lacks target tree and af_unix prefix
>>>>
>>>> If you want to nit pick, nit pick away, Just because the patch email
>>>> lacks proper formatting does not make the patch technically inferior.
>>>
>>> Ironically you just nit picked 8.
>>>
>>>
>>
>> I have no idea what you mean. I am more worried about technical
>> correctness than formatting -- That does not mean formatting is not
>> necessary.
> 
> I started pointing out technical stuff and ended with nit-pick because
> "I am more worried about technical correctness", but you started nit
> picking from the last point.  That's unfortunate.
> 
> 
>>
>>>> My
>>>> fix is a proper fix not a hack. The change in queue_oob is sufficient to
>>>> fix all issues including SIOCATMARK. The fix in manage_oob is just for
>>>> correctness.
>>>
>>> Then, it should be WARN_ON_ONCE() not to confuse future readers.
>>>
>>>
>>>> In your fix I specifically did not like the change made to
>>>> fix SIOCATMARK.
>>>
>>> I don't like that part too, but it's needed to avoid the additional refcnt
>>> that is much worse as syzbot has been demonstrating.
>>>
>>
>> syzbot has nothing to do with doing a proper fix.
> 
> You don't understand my point.  syzbot has been finding many real issues
> that were caused by poor handling of the additional refcount.
> 
> Also, removing it discovered another bug in manage_oob().  That's a enough
> reason to explain why we should remove the unnecessary refcnt.
> 
> 
>> One has to understand
>> the code though to do the fix at the proper location.
> 
> I'm not saying that the patch is correct if it silences syzbot.
> 
> Actually, I said KASAN is handy but you need not rely on it.
> 
> Rather it's you who argued the splat was needed even without trying
> to understand the code.
> 
> I really don't understand why you are saying this to me now.
> 
> 
>>
>>>
>>>>
>>>> What is most worrying is claim to fixing a panic when it can not even
>>>> happen with the bug.
>>>
>>> It's only on your setup.  syzbot and I were able to trigger that with
>>> the bug.
>>>
>>
>> Really, what is so special about my setup that kasan does not like? Can
>> you point me to the exact location where the access is made?
> 
> I don't know, it's your job.
> 
>>
>> I am at least glad that you have backed off your assertion that my
>> change does not fix the ioctl.
> 
> Okay, I was wrong about that, and what about other points, fragile
> refcnt, non-WARN_ON_ONCE(), misplaced label, no test, bogus tag ?
> 
> 
>> I am sure if I keep pressing you, you
>> will back off the panic claim as well.
> 
> I also don't understand what you are saying and why you still can't
> correlate the splat and the sequences of syscalls in the repro.
> 
> 
>> You yourself admitted you did not
>> know why kasan was not panicing, Has anyone else hit the same panic?
>>
>> If you can pin point the exact location where the illegal access is
>> made, please do so and I will accept that I am wrong, other than that I
>> am not interested in this constant back and forth with no technical
>> details just fluff.
> 
> Please read my changelog (and mails) carefully that pin-point the
> exact location and reason where/why the illegal access happens.

This is the explanation in the email:

 >
 > No it did not work. As soon as KASAN detected read after free it should
 > have paniced as it did in the report and I have been running the
 > syzbot's C program in a continuous loop. I would like to reproduce the
 > issue before we can accept the fix -- If that is alright with you. I
 > will try your new test case later and report back. Thanks for the patch
 > though.

The commit Message:

syzbot reported use-after-free in unix_stream_recv_urg(). [0]

The scenario is

   1. send(MSG_OOB)
   2. recv(MSG_OOB)
      -> The consumed OOB remains in recv queue
   3. send(MSG_OOB)
   4. recv()
      -> manage_oob() returns the next skb of the consumed OOB
      -> This is also OOB, but unix_sk(sk)->oob_skb is not cleared
   5. recv(MSG_OOB)
      -> unix_sk(sk)->oob_skb is used but already freed

The recent commit 8594d9b85c07 ("af_unix: Don't call skb_get() for OOB
skb.") uncovered the issue.

If the OOB skb is consumed and the next skb is peeked in manage_oob(),
we still need to check if the skb is OOB.

Let's do so by falling back to the following checks in manage_oob()
and add the test case in selftest.

Note that we need to add a similar check for SIOCATMARK.

And this

* [PATCH v1 net-next] af_unix: Don't call skb_get() for OOB skb.
@ 2024-08-16 23:39 Kuniyuki Iwashima
   2024-08-20 23:00 ` patchwork-bot+netdevbpf
   0 siblings, 1 reply; 2+ messages in thread
From: Kuniyuki Iwashima @ 2024-08-16 23:39 UTC (permalink / raw)
   To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
   Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Since introduced, OOB skb holds an additional reference count with no
special reason and caused many issues.

Also, kfree_skb() and consume_skb() are used to decrement the count,
which is confusing.

Let's drop the unnecessary skb_get() in queue_oob() and corresponding
kfree_skb(), consume_skb(), and skb_unref().

Now unix_sk(sk)->oob_skb is just a pointer to skb in the receive queue,
so special handing is no longer needed in GC.


And more:

The splat is handy, you may want to check the returned value of recvfrom()
with KASAN on and off and then focus on the root cause.  When UAF happens,
the real bug always happens before that.

FWIW, I was able to see the splat on my setup and it disappeared with
my patch.  Also, syzbot has already tested the equivalent change.
https://urldefense.com/v3/__https://lore.kernel.org/netdev/00000000000064fbcb06215a7bbc@google.com/__;!!ACWV5N9M2RV99hQ!MiZQCdyNumVmB3YeKfZxVbJFPsouXDCN7ie5DBfjBiurJib1D7k4bO_Jg1cAie9KIkAgXezFUaGOFhg$

None of this points to where the skb is "dereferenced" The test case 
added will never panic the kernel.

In the organizations that I have worked with, just because a change 
stops a panic does not mean the issue is fixed. You have to explicitly 
show where and how. That is what i am asking, Yes there is a bug, but it 
will not cause the panic, so if you are just high and might engiineer, 
show where and how?
> 
> This will be the last mail from me in this thread.  I don't want to
> waste time on someone who doesn't read mails.
Yes please don't if you do not have anything technical to say, all your 
comments are "smart comments". This email thread would end if you could 
just say, here is line XXXX where the skb is de referenced, but you have 
not because you have no idea.

BTTW Your comment about holding the extra refcnt without any reason just 
shows that you DO NOT understand the code. And the confusion to refcnts 
has been caused by your changes, I am concerned your changes have broken 
the code.

I am willing to accept that I may be wrong but only if I am shown the 
location of de-reference. Please do not respond if you can not point to 
the exact location.

Shoaib


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ