Message-ID: <5d166d2deacfe_10452ad82c16e5c0a5@john-XPS-13-9370.notmuch>
Date: Fri, 28 Jun 2019 12:40:29 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>,
John Fastabend <john.fastabend@...il.com>
Cc: daniel@...earbox.io, ast@...nel.org, netdev@...r.kernel.org,
edumazet@...gle.com, bpf@...r.kernel.org
Subject: Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and
flush_sync
Jakub Kicinski wrote:
> On Fri, 28 Jun 2019 07:12:07 -0700, John Fastabend wrote:
> > Yeah seems possible although never seen in my testing. So I'll
> > move the test_bit() inside the lock and do a ctx check to ensure
> > still have the reference.
> >
> >   CPU 0 (free)          CPU 1 (wq)
> >
> >   lock(sk)
> >                         lock(sk)
> >   set_bit()
> >   cancel_work()
> >   release
> >                         ctx = tls_get_ctx(sk)
> >                         unlikely(!ctx) <- we may have free'd
> >                         test_bit()
> >                         ...
> >                         release()
> >
> > or
> >
> >   CPU 0 (free)          CPU 1 (wq)
> >
> >                         lock(sk)
> >   lock(sk)
> >                         ctx = tls_get_ctx(sk)
> >                         unlikely(!ctx)
> >                         test_bit()
> >                         ...
> >                         release()
> >   set_bit()
> >   cancel_work()
> >   release
>
> Hmm... perhaps it's cleanest to stop the work from scheduling before we
> proceed?
>
> close():
> while (!test_and_set(SHED))
> flush();
>
> lock(sk);
> ...
>
> We just need to move init work, no?
The lock is already held when entering the unhash() side, so we need
to handle this case as well:
  CPU 0 (free)              CPU 1 (wq)

  lock(sk)                  ctx = tls_get_ctx(sk) <- need to check for NULL ptr
  sk_prot->unhash()
    set_bit()
    cancel_work()
  ...
  kfree(ctx)
  unlock(sk)
But using cancel and doing an unlikely(!ctx) check should be
sufficient to handle the wq. What I'm not sure how to solve now is
that in patch 2 of this series unhash is still calling strp_done
with the sock lock held. Maybe we need to do a deferred release
like the sockmap side?
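To make the wq side concrete, here is a rough userspace sketch of the
"take the lock, re-read ctx, check for NULL, then test the bit" pattern.
It is not kernel code. struct sock, struct ctx, CTX_CLOSING, lock_sk(),
unlock_sk(), wq_handler() and teardown() are all hypothetical stand-ins,
the lock is modelled as a plain flag, and the two CPU orderings are
replayed sequentially rather than run on real threads. It also assumes
the free side clears the pointer that tls_get_ctx(sk) would return,
which is what makes the NULL check meaningful.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the objects discussed in this thread. */
enum { CTX_CLOSING = 1u << 0 };      /* models the bit set by set_bit() */

struct ctx {
	unsigned int flags;
	int tx_done;                 /* counts work actually performed */
};

struct sock {
	bool locked;                 /* models the sock lock being held */
	struct ctx *ctx;             /* models what tls_get_ctx(sk) returns */
};

static void lock_sk(struct sock *sk)   { sk->locked = true; }
static void unlock_sk(struct sock *sk) { sk->locked = false; }

/* wq side: returns true only if it was safe to touch the context. */
static bool wq_handler(struct sock *sk)
{
	struct ctx *ctx;
	bool ran = false;

	lock_sk(sk);
	assert(sk->locked);
	ctx = sk->ctx;                          /* ctx = tls_get_ctx(sk) */
	if (ctx != NULL && !(ctx->flags & CTX_CLOSING)) {
		ctx->tx_done++;                 /* the "..." in the diagrams */
		ran = true;
	}
	unlock_sk(sk);
	return ran;
}

/* free side: mark closing, then free and clear the pointer. */
static void teardown(struct sock *sk)
{
	lock_sk(sk);
	if (sk->ctx != NULL) {
		sk->ctx->flags |= CTX_CLOSING;  /* set_bit() */
		/* cancel_work() would go here in the real code */
		free(sk->ctx);                  /* kfree(ctx) */
		sk->ctx = NULL;                 /* assumption: pointer is cleared */
	}
	unlock_sk(sk);
}
```

Calling wq_handler() before teardown() does the work; calling it after
teardown() sees a NULL ctx and backs out without touching freed memory.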
Trying to drop the lock and then grab it again doesn't seem right
to me. Based on the comment in tcp_abort, it seems we could
potentially "race with userspace socket closes such as tcp_close".
IIRC one of the tls splats from syzbot looked like something of
this sort may have happened.
For now I'm considering adding a strp_cancel() op. Seeing as
we are closing the socket and tearing down, we can probably
be OK with throwing out strp results.
>
> FWIW I never tested his async crypto stuff, I wonder if there is a way
> to convince normal CPU crypto to pretend to be async?