netdev - Re: [PATCH] net/tls: Fix slab-use-after-free in tls_encrypt

Message-ID:
 <VI1P193MB0752F2E6DA346CDB2BCDBED899D6A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM>
Date: Tue, 17 Oct 2023 19:49:15 +0800
From: Juntong Deng <juntong.deng@...look.com>
To: Paolo Abeni <pabeni@...hat.com>, borisp@...dia.com,
 john.fastabend@...il.com, kuba@...nel.org, davem@...emloft.net,
 edumazet@...gle.com
Cc: netdev@...r.kernel.org, linux-kernel-mentees@...ts.linuxfoundation.org,
 linux-kernel@...r.kernel.org,
 syzbot+29c22ea2d6b2c5fd2eae@...kaller.appspotmail.com
Subject: Re: [PATCH] net/tls: Fix slab-use-after-free in tls_encrypt_done

On 2023/10/17 18:31, Paolo Abeni wrote:
> On Thu, 2023-10-12 at 19:02 +0800, Juntong Deng wrote:
> >> In the current implementation, ctx->async_wait.completion is completed
> >> while ctx->encrypt_compl_lock is still held (after spin_lock_bh but
> >> before spin_unlock_bh). This lets tls_sw_release_resources_tx continue
> >> executing and return to tls_sk_proto_cleanup, then to
> >> tls_sk_proto_close, which then enters tls_sw_free_ctx_tx to kfree the
> >> entire struct tls_context (including ctx->encrypt_compl_lock).
>>
> >> Since ctx->encrypt_compl_lock has been freed, the subsequent
> >> spin_unlock_bh results in a slab-use-after-free. On SMP, holding
> >> spin_lock_bh does not stop tls_sw_release_resources_tx from running on
> >> another CPU, and once it is woken up it never takes
> >> ctx->encrypt_compl_lock again, so nothing prevents the sequence
> >> described above.
>>
> >> The fix is to move complete(&ctx->async_wait.completion) after
> >> spin_unlock_bh, so the waiter is released only after the lock has been
> >> dropped. Since complete is only executed when pending is 0, which means
> >> this is the last record, a race cannot cause duplicate completes.
>>
>> Reported-by: syzbot+29c22ea2d6b2c5fd2eae@...kaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=29c22ea2d6b2c5fd2eae
>> Signed-off-by: Juntong Deng <juntong.deng@...look.com>
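The "only the last record calls complete()" argument can be illustrated with a small userspace model. It is a sketch, not kernel code: `struct demo_tx_ctx`, `demo_encrypt_done()` and `demo_count_completions()` are hypothetical names, and a C11 atomic counter stands in for the lock-protected `pending` check described in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TX context; only the in-flight count is modeled. */
struct demo_tx_ctx {
	atomic_int pending; /* records submitted but not yet encrypted */
};

/*
 * Models one completion callback. Returns true only for the call that
 * drops the count from 1 to 0, i.e. the one that would call complete().
 */
static bool demo_encrypt_done(struct demo_tx_ctx *ctx)
{
	int before = atomic_fetch_sub(&ctx->pending, 1);

	assert(before > 0); /* more completions than submissions would be a bug */
	return before == 1;
}

/* Runs n callbacks against a context with n pending records; counts completions. */
int demo_count_completions(int n)
{
	struct demo_tx_ctx ctx;
	int completions = 0;

	atomic_init(&ctx.pending, n);
	for (int i = 0; i < n; i++)
		if (demo_encrypt_done(&ctx))
			completions++;
	return completions;
}
```

Because `atomic_fetch_sub()` returns the value before the decrement, exactly one caller observes 1, so `demo_count_completions(n)` is 1 for any positive `n`. The model only covers duplicate completions; it says nothing about what the callback does after completing.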
> 
> Have you tested this patch vs the syzbot reproducer?
> 
> I think the following race is still present:
> 
> CPU0                            CPU1
> tls_sw_release_resources_tx     tls_encrypt_done
> spin_lock_bh
> spin_unlock_bh
>                                  spin_lock_bh
>                                  spin_unlock_bh
>                                  complete
> 
> wait
> // ...
> tls_sk_proto_close
> 
>                                  test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)
>                                  // UaF
> 
> regardless of 'complete()' being invoked before or after the
> 'spin_unlock_bh()'.
> 
> Paolo
> 
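Paolo's point, that any access to the context after `complete()` is unsafe however `complete()` is ordered against `spin_unlock_bh()`, can be replayed deterministically. This is a hypothetical model, not kernel code: `struct demo_ctx` and the `demo_*` helpers are illustrative, and the model assumes the worst case where the waiter frees the context the instant `complete()` runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the TX context; nothing here is kernel API. */
struct demo_ctx {
	bool freed; /* set once the waiter may have run tls_sk_proto_close() */
	bool uaf;   /* set if the callback touched ctx after that point */
};

/* Every access to ctx (lock, unlock, counter, tx_bitmask) goes through here. */
static void demo_touch(struct demo_ctx *ctx)
{
	if (ctx->freed)
		ctx->uaf = true;
}

/* Worst case on SMP: the woken waiter frees the context immediately. */
static void demo_complete(struct demo_ctx *ctx)
{
	assert(!ctx->freed); /* complete() must run at most once */
	ctx->freed = true;
}

/*
 * Replays the tail of the completion callback for the last pending record.
 * complete_after_unlock selects the v1 ordering; ready selects the
 * "schedule the transmission" path that runs after the locked section.
 * Returns true if any access lands after the context was freed.
 */
bool demo_last_record_uaf(bool complete_after_unlock, bool ready)
{
	struct demo_ctx ctx = { .freed = false, .uaf = false };

	demo_touch(&ctx);        /* spin_lock_bh(&ctx->encrypt_compl_lock) */
	demo_touch(&ctx);        /* pending drops to 0 */
	if (!complete_after_unlock)
		demo_complete(&ctx); /* original: complete() under the lock */
	demo_touch(&ctx);        /* spin_unlock_bh(&ctx->encrypt_compl_lock) */
	if (complete_after_unlock)
		demo_complete(&ctx); /* v1 patch: complete() after the unlock */
	if (ready)
		demo_touch(&ctx);    /* test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask) */
	return ctx.uaf;
}
```

Only `demo_last_record_uaf(true, false)` returns false: moving `complete()` after the unlock removes the late unlock, but whenever `ready` is true the callback still touches `tx_bitmask` afterwards.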

Yes, I think you are right.

My previous reasoning was that test_and_set_bit() is only called if
'ready' is true, 'ready' is only true for the first record, and
complete() is only called when processing the last record.

I had simply assumed that the first record could never also be the
last record, so test_and_set_bit() would never be called in the same
invocation that called complete().

Your reply made me think it through more carefully, and the case with
only one record is possible: that record is both the first and the last.
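The flawed assumption can be checked with a tiny model of which actions each callback invocation performs. The names are hypothetical, and the model simply encodes the assumptions stated above: records finish in order, `ready` holds only for the first record, and `complete()` runs only for the last.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what one callback invocation does with ctx. */
struct demo_actions {
	bool schedules_tx; /* 'ready': test_and_set_bit(BIT_TX_SCHEDULED, ...) */
	bool completes;    /* last record: complete(&ctx->async_wait.completion) */
};

/* Assumed rules: 'ready' only for index 0, complete() only for index total - 1. */
static struct demo_actions demo_actions_for(int index, int total)
{
	assert(index >= 0 && index < total);
	return (struct demo_actions){
		.schedules_tx = (index == 0),
		.completes = (index == total - 1),
	};
}

/* True if a single invocation both completes and then touches ctx again. */
bool demo_some_record_does_both(int total)
{
	assert(total > 0);
	for (int i = 0; i < total; i++) {
		struct demo_actions a = demo_actions_for(i, total);

		if (a.schedules_tx && a.completes)
			return true;
	}
	return false;
}
```

`demo_some_record_does_both(1)` returns true, while any larger count returns false under these assumptions; the single-record case is exactly the one the earlier reasoning missed.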

I will send a v2 patch to fix this problem.

Thanks.