[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5d24b55e8b868_3b162ae67af425b43e@john-XPS-13-9370.notmuch>
Date: Tue, 09 Jul 2019 08:40:14 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>,
John Fastabend <john.fastabend@...il.com>
Cc: ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org,
edumazet@...gle.com, bpf@...r.kernel.org
Subject: Re: [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
Jakub Kicinski wrote:
> On Mon, 08 Jul 2019 19:13:29 +0000, John Fastabend wrote:
> > Resolve a series of splats discovered by syzbot and an unhash
> > TLS issue noted by Eric Dumazet.
> >
> > The main issues revolved around interaction between TLS and
> > sockmap tear down. TLS and sockmap could both reset sk->prot
> > ops creating a condition where a close or unhash op could be
> > called forever. A rare race condition resulting from a missing
> > rcu sync operation was causing a use after free. Then on the
> > TLS side dropping the sock lock and re-acquiring it during the
> > close op could hang. Finally, sockmap must be deployed before
> > tls for current stack assumptions to be met. This is enforced
> > now. A feature series can enable it.
> >
> > To fix this first refactor TLS code so the lock is held for the
> > entire teardown operation. Then add an unhash callback to ensure
> > TLS can not transition from ESTABLISHED to LISTEN state. This
> > transition is a similar bug to the one found and fixed previously
> > in sockmap. Then apply three fixes to sockmap to fix up races
> > on tear down around map free and close. Finally, if sockmap
> > is destroyed before TLS we add a new ULP op update to inform
> > the TLS stack it should not call sockmap ops. This last one
> > appears to be the most commonly found issue from syzbot.
>
> Looks like strparser is not done'd for offload?
Right so if rx_conf != TLS_SW then the hardware needs to do
the strparser functionality.
>
> About patch 6 - I was recently wondering about the "impossible" syzbot
> report where context is not freed and my conclusion was that there
> can be someone sitting at lock_sock() in tcp_close() already by the
> time we start installing the ULP, so TLS's close will never get called.
> The entire replacing of callbacks business is really shaky :(
Well replacing callbacks is the ULP model. The race we are fixing in
patch 6 is sockmap being free'd which removes psock and resets proto ops
with tcp_close() path.
I don't think there is another race like you describe because tcp_set_ulp
is called from do_tcp_setsockopt which holds the lock and tcp state is
checked to ensure its ESTABLISHED. A closing sock wont be in ESTABLISHED
state so any setup will be aborted. Before patch 1 though I definately
saw this race because we dropped the lock mid-close.
With this series I've been running those syzbot programs over night
without issue on 4 cores. Also selftests pass in ./net/tls and ./bpf/
so I think its stable and resolves many of the issues syzbot has been
stomping around.
>
> Perhaps I'm rumbling, I will take a close look after I get some sleep :)
Yes please do ;)
Powered by blists - more mailing lists