Message-ID: <5cdb92cc4bed5_12292afa9bb1c5b8d5@john-XPS-13-9360.notmuch>
Date: Tue, 14 May 2019 21:17:16 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org,
bpf@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>
Subject: Re: [bpf PATCH v4 1/4] bpf: tls, implement unhash to avoid transition
out of ESTABLISHED
Jakub Kicinski wrote:
> On Tue, 14 May 2019 15:34:55 -0700, John Fastabend wrote:
> > John Fastabend wrote:
> > > Jakub Kicinski wrote:
> > > > On Thu, 09 May 2019 21:57:49 -0700, John Fastabend wrote:
> > > > > @@ -2042,12 +2060,14 @@ void tls_sw_free_resources_tx(struct sock *sk)
> > > > > if (atomic_read(&ctx->encrypt_pending))
> > > > > crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
> > > > >
> > > > > - release_sock(sk);
> > > > > + if (locked)
> > > > > + release_sock(sk);
> > > > > cancel_delayed_work_sync(&ctx->tx_work.work);
> > > >
> > > > So in the splat I got (on a slightly hacked up kernel) it seemed like
> > > > unhash may be called in atomic context:
> > > >
> > > > [ 783.232150] tls_sk_proto_unhash+0x72/0x110 [tls]
> > > > [ 783.237497] tcp_set_state+0x484/0x640
> > > > [ 783.241776] ? __sk_mem_reduce_allocated+0x72/0x4a0
> > > > [ 783.247317] ? tcp_recv_timestamp+0x5c0/0x5c0
> > > > [ 783.252265] ? tcp_write_queue_purge+0xa6a/0x1180
> > > > [ 783.257614] tcp_done+0xac/0x260
> > > > [ 783.261309] tcp_reset+0xbe/0x350
> > > > [ 783.265101] tcp_validate_incoming+0xd9d/0x1530
> > > >
> > > > I may have been unclear off-list, I only tested the patch no longer
> > > > crashes the offload :(
> > > >
> > >
> > > Yep, I misread and thought it was resolved here as well. OK I'll dig into
> > > it. I'm not seeing it from selftests but I guess that means we are missing
> > > a testcase. :( yet another version I guess.
> > >
> >
> > Seems we need to call release_sock in the unhash case as well. Will
> > send a new patch shortly.
>
> My reading of the stack trace was that unhash gets called from
> tcp_reset(), IOW from soft IRQ, so we can't cancel_delayed_work_sync()
> in tls_sw_free_resources_tx(), no?
Well, the tcp_close() path has the lock held and can also call unhash().
Either way, dropping the sock lock in the middle of the block seems a bit
suspect to me. I think we can defer the free until after the sock is
released; this is how it was solved on the sockmap side.
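
A rough userspace sketch of the "defer the free until after the lock is
released" idea follows. It is only an analogy, not the actual patch: every
name in it (fake_sock, fake_ctx, fake_unhash and so on) is hypothetical, and
a pthread mutex stands in for the sock lock. It does not address the soft IRQ
case Jakub raised, where the cleanup could not sleep at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the tls code. */
struct fake_ctx {
	int pending_work;	/* stands in for tx work that must be cancelled */
};

struct fake_sock {
	pthread_mutex_t lock;	/* stands in for the sock lock */
	struct fake_ctx *ctx;	/* per-socket context, NULL once detached */
};

static void fake_sock_init(struct fake_sock *sk)
{
	pthread_mutex_init(&sk->lock, NULL);
	sk->ctx = malloc(sizeof(*sk->ctx));
	assert(sk->ctx != NULL);
	sk->ctx->pending_work = 1;
}

/* Stands in for cleanup that may block, so it runs without the lock held. */
static void fake_ctx_free(struct fake_ctx *ctx)
{
	ctx->pending_work = 0;
	free(ctx);
}

/*
 * Detach the context while holding the lock, then free it only after the
 * lock has been dropped. Returns 1 if a context was freed, 0 otherwise.
 */
static int fake_unhash(struct fake_sock *sk)
{
	struct fake_ctx *ctx;

	pthread_mutex_lock(&sk->lock);
	ctx = sk->ctx;		/* take ownership under the lock */
	sk->ctx = NULL;		/* later callers see no context */
	pthread_mutex_unlock(&sk->lock);

	if (!ctx)
		return 0;

	fake_ctx_free(ctx);	/* deferred: lock is no longer held */
	return 1;
}
```

The point of the shape is that the lock is never dropped and retaken in the
middle of the teardown; the blocking part simply moves past the unlock.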