netdev - Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush


Message-ID: <5d166d2deacfe_10452ad82c16e5c0a5@john-XPS-13-9370.notmuch>
Date:   Fri, 28 Jun 2019 12:40:29 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        John Fastabend <john.fastabend@...il.com>
Cc:     daniel@...earbox.io, ast@...nel.org, netdev@...r.kernel.org,
        edumazet@...gle.com, bpf@...r.kernel.org
Subject: Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and
 flush_sync

Jakub Kicinski wrote:
> On Fri, 28 Jun 2019 07:12:07 -0700, John Fastabend wrote:
> > Yeah seems possible although never seen in my testing. So I'll
> > move the test_bit() inside the lock and do a ctx check to ensure
> > still have the reference.
> > 
> >   CPU 0 (free)           CPU 1 (wq)
> > 
> >   lock(sk)
> >                          lock(sk)
> >   set_bit()
> >   cancel_work()
> >   release
> >                          ctx = tls_get_ctx(sk)
> >                          unlikely(!ctx) <- we may have free'd 
> >                          test_bit()
> >                          ...
> >                          release()
> > 
> > or
> > 
> >   CPU 0 (free)           CPU 1 (wq)
> > 
> >                          lock(sk)
> >   lock(sk)
> >                          ctx = tls_get_ctx(sk)
> >                          unlikely(!ctx)
> >                          test_bit()
> >                          ...
> >                          release()
> >   set_bit()
> >   cancel_work()
> >   release
> 
> Hmm... perhaps it's cleanest to stop the work from scheduling before we
> proceed?
> 
> close():
> 	while (!test_and_set(SHED))
> 		flush();
> 
> 	lock(sk);
> 	...
> 
> We just need to move init work, no?
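
Below is a minimal single-threaded model of the close() ordering quoted above. It reads the loop as "keep flushing while pending work still owns the SCHED bit, and stop once close() manages to set the bit itself". BIT_SCHED, struct tx_ctx and the *_sim helpers are hypothetical stand-ins, not the kernel's tls code. flush_tx() simply runs the queued item rather than waiting on a real workqueue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BIT_SCHED 0 /* hypothetical bit number ("SHED" in the mail) */

struct tx_ctx {
    atomic_ulong bits; /* BIT_SCHED set => queued work, or close(), owns it */
    bool queued;       /* a work item sits on the (simulated) workqueue */
    int runs;          /* how many times the handler did real work */
};

/* Atomically set bit nr; return its previous value. */
static bool test_and_set_bit_sim(int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Atomically clear bit nr; return its previous value. */
static bool test_and_clear_bit_sim(int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

/* Data path: queue the work only if nobody owns BIT_SCHED yet. */
static bool schedule_tx(struct tx_ctx *c)
{
    if (test_and_set_bit_sim(BIT_SCHED, &c->bits))
        return false; /* already scheduled, or close() has claimed the bit */
    c->queued = true;
    return true;
}

/* Work handler: release the bit, then do the (simulated) transmit. */
static void tx_work(struct tx_ctx *c)
{
    if (!test_and_clear_bit_sim(BIT_SCHED, &c->bits))
        return;
    c->runs++;
}

/* Single-threaded stand-in for flush_work(): run whatever is queued. */
static void flush_tx(struct tx_ctx *c)
{
    if (c->queued) {
        c->queued = false;
        tx_work(c);
    }
}

/* close(): flush while pending work owns the bit; exit once we own it. */
static void close_stop_tx(struct tx_ctx *c)
{
    while (test_and_set_bit_sim(BIT_SCHED, &c->bits))
        flush_tx(c);
    /* lock(sk) and the rest of the teardown would follow here */
}
```

Once close() owns the bit, the data path's test-and-set fails. No new work can therefore be queued while the rest of the teardown runs under lock(sk).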

The sock lock is already held when entering the unhash() path, so we
need to handle this case as well:

CPU 0 (free)          CPU 1 (wq)

lock(sk)              ctx = tls_get_ctx(sk) <- needs a NULL check
sk_prot->unhash()
  set_bit()
  cancel_work()
  ...
  kfree(ctx)
unlock(sk)

But using cancel_work() and doing an unlikely(!ctx) check should be
sufficient to handle the wq. What I'm not sure how to solve yet is
that, in patch 2 of this series, unhash() still calls strp_done()
with the sock lock held. Maybe we need to do a deferred release
like the sockmap side does?
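
The sketch below models the ordering in the diagram above. It assumes two things: the wq handler reads the context only after taking the socket lock, and unhash() clears that pointer before freeing the context. sock_sim, tls_ctx_sim, BIT_TX_CLOSING and the lock helpers are hypothetical userspace stand-ins for struct sock, the tls context, the closing bit and lock_sock()/release_sock(). Because both sides serialize on the lock, each critical section runs whole. The handler therefore sees either a valid context, the closing bit, or NULL, and never a freed pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define BIT_TX_CLOSING 0 /* hypothetical "closing" bit */

struct tls_ctx_sim { unsigned long bits; bool work_pending; int sent; };
struct sock_sim { bool locked; struct tls_ctx_sim *ctx; };

static void lock_sock_sim(struct sock_sim *sk) { assert(!sk->locked); sk->locked = true; }
static void release_sock_sim(struct sock_sim *sk) { assert(sk->locked); sk->locked = false; }

enum wq_result { WQ_NO_CTX, WQ_CLOSING, WQ_SENT };

/* CPU 1 (wq): reading ctx and testing the bit both happen under the lock. */
static enum wq_result tx_work_sim(struct sock_sim *sk)
{
    enum wq_result r;

    lock_sock_sim(sk);
    struct tls_ctx_sim *ctx = sk->ctx; /* stands in for tls_get_ctx(sk) */
    if (!ctx) {
        r = WQ_NO_CTX;                 /* unhash() already freed it */
    } else if (ctx->bits & (1UL << BIT_TX_CLOSING)) {
        r = WQ_CLOSING;                /* teardown started, do nothing */
    } else {
        ctx->sent++;                   /* normal transmit work */
        r = WQ_SENT;
    }
    release_sock_sim(sk);
    return r;
}

/* CPU 0 (free): unhash() is entered with the lock already held. */
static void unhash_sim(struct sock_sim *sk)
{
    struct tls_ctx_sim *ctx = sk->ctx;

    assert(sk->locked);
    ctx->bits |= 1UL << BIT_TX_CLOSING; /* set_bit() */
    ctx->work_pending = false;          /* cancel_work(): does not wait */
    sk->ctx = NULL;                     /* hide ctx before freeing it */
    free(ctx);                          /* kfree(ctx) */
}
```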

Dropping the lock and then grabbing it again doesn't seem right
to me: based on the comment in tcp_abort(), we could potentially
"race with userspace socket closes such as tcp_close". IIRC, one
of the TLS splats from syzbot looked like this may have happened.

For now I'm considering adding a strp_cancel() op. Since we are
closing the socket and tearing it down, we can probably be OK with
throwing out the strparser results.
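
This model shows why strp_done() under the sock lock is a problem and what a strp_cancel() might look like. As far as I understand strparser, strp_done() waits for the parser work with cancel_work_sync(). That work takes the socket lock through the default lock callback, so waiting for it while unhash() holds the lock cannot finish. The model below is hypothetical: sk_model, strp_model and strp_cancel_model are not kernel APIs. It reports the deadlock instead of hanging. The cancel variant marks the parser stopped and drops undelivered records without waiting. It deliberately ignores freeing the parser memory, which is the part that would still need a deferred release.

```c
#include <assert.h>
#include <stdbool.h>

struct sk_model { bool locked; };

struct strp_model {
    struct sk_model *sk;
    bool work_pending; /* handler queued/running; it will need sk's lock */
    bool stopped;      /* set by teardown; handler checks it under the lock */
    int queued_msgs;   /* records parsed but not yet delivered */
    int delivered;
};

/* Handler: like strparser's work, it runs under the socket lock. */
static void strp_work_model(struct strp_model *s)
{
    assert(!s->sk->locked); /* a real handler would block here instead */
    s->sk->locked = true;
    if (!s->stopped) {
        s->delivered += s->queued_msgs;
        s->queued_msgs = 0;
    }
    s->work_pending = false;
    s->sk->locked = false;
}

/* strp_done()-style teardown waits for the handler. Called with the lock
 * held and work pending, that wait cannot finish: report it, don't hang. */
static bool strp_done_model(struct strp_model *s)
{
    if (s->work_pending && s->sk->locked)
        return false;       /* cancel_work_sync() vs. lock_sock() */
    if (s->work_pending)
        strp_work_model(s); /* "wait" = let the handler finish */
    s->stopped = true;
    return true;
}

/* Hypothetical strp_cancel(): mark stopped and drop results, no waiting. */
static int strp_cancel_model(struct strp_model *s)
{
    int dropped = s->queued_msgs;

    s->stopped = true;
    s->queued_msgs = 0;
    return dropped;
}
```

The handler that was already pending still runs after unlock(sk), but it sees the stopped flag under the lock and delivers nothing.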

> 
> FWIW I never tested his async crypto stuff, I wonder if there is a way
> to convince normal CPU crypto to pretend to be async?