[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190923172811.1f620803@cakuba.netronome.com>
Date: Mon, 23 Sep 2019 17:28:11 -0700
From: Jakub Kicinski <jakub.kicinski@...ronome.com>
To: Pooja Trivedi <poojatrivedi@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, daniel@...earbox.net,
john.fastabend@...il.com, davejwatson@...com, aviadye@...lanox.com,
borisp@...lanox.com, Pooja Trivedi <pooja.trivedi@...ckpath.com>,
Mallesham Jatharakonda <mallesh537@...il.com>
Subject: Re: [PATCH V2 net 1/1] net/tls(TLS_SW): Fix list_del double free
caused by a race condition in tls_tx_records
On Sat, 21 Sep 2019 23:19:20 -0400, Pooja Trivedi wrote:
> On Wed, Sep 18, 2019 at 5:45 PM Jakub Kicinski wrote:
> > On Wed, 18 Sep 2019 17:37:44 -0400, Pooja Trivedi wrote:
> > > Hi Jakub,
> > >
> > > I have explained one potential way for the race to happen in my
> > > original message to the netdev mailing list here:
> > > https://marc.info/?l=linux-netdev&m=156805120229554&w=2
> > >
> > > Here is the part out of there that's relevant to your question:
> > >
> > > -----------------------------------------
> > >
> > > One potential way for race condition to appear:
> > >
> > > When under tcp memory pressure, Thread 1 takes the following code path:
> > > do_sendfile ---> ... ---> .... ---> tls_sw_sendpage --->
> > > tls_sw_do_sendpage ---> tls_tx_records ---> tls_push_sg --->
> > > do_tcp_sendpages ---> sk_stream_wait_memory ---> sk_wait_event
> >
> > Ugh, so do_tcp_sendpages() can also release the lock :/
> >
> > Since the problem occurs in tls_sw_do_sendpage() and
> > tls_sw_do_sendmsg() as well, should we perhaps fix it at that level?
>
> That won't do because tls_tx_records also gets called when completion
> callbacks schedule delayed work. That was the code path that caused
> the crash for my test. Cavium's nitrox crypto offload driver calling
> tls_encrypt_done, which calls schedule_delayed_work. Delayed work that
> was scheduled would then be processed by tx_work_handler.
> Notice in my previous reply,
> "Thread 2 code path:
> tx_work_handler ---> tls_tx_records"
>
> "Thread 2 code path:
> tx_work_handler ---> tls_tx_records"
Right, the work handler would obviously also have to obey the exclusion
mechanism of choice.
Having said that this really does feel like we are trying to lock code,
not data here :(
Powered by blists - more mailing lists