netdev - Re: [PATCH net-next] net/tls: fix race condition causing kernel panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c7a03544-89ec-696c-fa71-4a46e99d1e66@chelsio.com>
Date:   Fri, 22 May 2020 02:02:10 +0530
From:   Vinay Kumar Yadav <vinay.yadav@...lsio.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
        secdev@...lsio.com
Subject: Re: [PATCH net-next] net/tls: fix race condition causing kernel panic

Jakub

On 5/22/2020 12:26 AM, Jakub Kicinski wrote:
> On Thu, 21 May 2020 14:28:27 +0530 Vinay Kumar Yadav wrote:
>> Considering the lock in fix ("pending" is local variable), when writer reads
>> pending == 0 [pending = atomic_read(&ctx->encrypt_pending); --> from tls_sw_sendmsg()],
>> that means encrypt complete() [from tls_encrypt_done()] is already called.
> Please indulge me with full sentences. I can't parse this.

Here, I am explaining that your scenario is covered in this fix.

# writer code. spin_lock_bh(&ctx->encrypt_compl_lock);
pending = atomic_read(&ctx->encrypt_pending);
spin_unlock_bh(&ctx->encrypt_compl_lock);
if (pending)
  			crypto_wait_req(-EINPROGRESS, &ctx->async_wait);

# Completion code.

spin_lock_bh(&ctx->encrypt_compl_lock);
  	pending = atomic_dec_return(&ctx->encrypt_pending);
  
  	if (!pending && READ_ONCE(ctx->async_notify))
  		complete(&ctx->async_wait.completion);
spin_unlock_bh(&ctx->encrypt_compl_lock); # "pending" is local variable.
  
Your scenario:

     # 1. writer queues first record on CPU0
     # 2. encrypt completes on CPU1

  	    pending = atomic_dec_return(&ctx->decrypt_pending);
	    # pending is 0
  
     # IRQ comes and CPU1 goes off to do something else with spin lock held
     # writer proceeds to encrypt next record on CPU0
     # writer is done, enters wait

	    smp_store_mb(ctx->async_notify, true);

     # Now CPU1 is back from the interrupt, does the check

  	    if (!pending && READ_ONCE(ctx->async_notify))
  		   complete(&ctx->async_wait.completion);

     # and it completes the wait, even though the atomic decrypt_pending was
     #   bumped back to 1

Explanation:

When writer reads pending == 0,
that means completion is already called complete().
its okay writer to  initialize completion. When writer reads pending == 1,
that means writer is going to wait for completion.

This way, writer is not going to proceed to encrypt next record on CPU0 without complete().

>
>> and if pending == 1 [pending = atomic_read(&ctx->encrypt_pending); --> from tls_sw_sendmsg()],
>> that means writer is going to wait for atomic_dec_return(&ctx->decrypt_pending) and
>> complete() [from tls_encrypt_done()]  to be called atomically.
>>
>> This way, writer is not going to proceed to encrypt next record on CPU0 without complete().