lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 22 May 2020 02:02:10 +0530
From:   Vinay Kumar Yadav <vinay.yadav@...lsio.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
        secdev@...lsio.com
Subject: Re: [PATCH net-next] net/tls: fix race condition causing kernel panic

Jakub

On 5/22/2020 12:26 AM, Jakub Kicinski wrote:
> On Thu, 21 May 2020 14:28:27 +0530 Vinay Kumar Yadav wrote:
>> Considering the lock in fix ("pending" is local variable), when writer reads
>> pending == 0 [pending = atomic_read(&ctx->encrypt_pending); --> from tls_sw_sendmsg()],
>> that means encrypt complete() [from tls_encrypt_done()] is already called.
> Please indulge me with full sentences. I can't parse this.

Here, I am explaining that your scenario is covered in this fix.

# writer code. spin_lock_bh(&ctx->encrypt_compl_lock);
pending = atomic_read(&ctx->encrypt_pending);
spin_unlock_bh(&ctx->encrypt_compl_lock);
if (pending)
  			crypto_wait_req(-EINPROGRESS, &ctx->async_wait);

# Completion code.

spin_lock_bh(&ctx->encrypt_compl_lock);
  	pending = atomic_dec_return(&ctx->encrypt_pending);
  
  	if (!pending && READ_ONCE(ctx->async_notify))
  		complete(&ctx->async_wait.completion);
spin_unlock_bh(&ctx->encrypt_compl_lock); # "pending" is local variable.
  
Your scenario:

     # 1. writer queues first record on CPU0
     # 2. encrypt completes on CPU1

  	    pending = atomic_dec_return(&ctx->decrypt_pending);
	    # pending is 0
  
     # IRQ comes and CPU1 goes off to do something else with spin lock held
     # writer proceeds to encrypt next record on CPU0
     # writer is done, enters wait

	    smp_store_mb(ctx->async_notify, true);

     # Now CPU1 is back from the interrupt, does the check

  	    if (!pending && READ_ONCE(ctx->async_notify))
  		   complete(&ctx->async_wait.completion);

     # and it completes the wait, even though the atomic decrypt_pending was
     #   bumped back to 1

Explanation:

When writer reads pending == 0,
that means completion is already called complete().
its okay writer to  initialize completion. When writer reads pending == 1,
that means writer is going to wait for completion.

This way, writer is not going to proceed to encrypt next record on CPU0 without complete().

>
>> and if pending == 1 [pending = atomic_read(&ctx->encrypt_pending); --> from tls_sw_sendmsg()],
>> that means writer is going to wait for atomic_dec_return(&ctx->decrypt_pending) and
>> complete() [from tls_encrypt_done()]  to be called atomically.
>>
>> This way, writer is not going to proceed to encrypt next record on CPU0 without complete().

Powered by blists - more mailing lists