Message-ID: <CAOrEds=zEh5R_4G1UuT-Ee3LT-ZiTV=1JNWb_4a=5Mb4coFEVg@mail.gmail.com>
Date: Tue, 24 Sep 2019 12:48:26 -0400
From: Pooja Trivedi <poojatrivedi@...il.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, daniel@...earbox.net,
john.fastabend@...il.com, davejwatson@...com, aviadye@...lanox.com,
borisp@...lanox.com, Pooja Trivedi <pooja.trivedi@...ckpath.com>,
Mallesham Jatharakonda <mallesh537@...il.com>
Subject: Re: [PATCH V2 net 1/1] net/tls(TLS_SW): Fix list_del double free
caused by a race condition in tls_tx_records
On Mon, Sep 23, 2019 at 8:28 PM Jakub Kicinski
<jakub.kicinski@...ronome.com> wrote:
>
> On Sat, 21 Sep 2019 23:19:20 -0400, Pooja Trivedi wrote:
> > On Wed, Sep 18, 2019 at 5:45 PM Jakub Kicinski wrote:
> > > On Wed, 18 Sep 2019 17:37:44 -0400, Pooja Trivedi wrote:
> > > > Hi Jakub,
> > > >
> > > > I have explained one potential way for the race to happen in my
> > > > original message to the netdev mailing list here:
> > > > https://marc.info/?l=linux-netdev&m=156805120229554&w=2
> > > >
> > > > Here is the part out of there that's relevant to your question:
> > > >
> > > > -----------------------------------------
> > > >
> > > > One potential way for race condition to appear:
> > > >
> > > > When under tcp memory pressure, Thread 1 takes the following code path:
> > > > do_sendfile ---> ... ---> .... ---> tls_sw_sendpage --->
> > > > tls_sw_do_sendpage ---> tls_tx_records ---> tls_push_sg --->
> > > > do_tcp_sendpages ---> sk_stream_wait_memory ---> sk_wait_event
> > >
> > > Ugh, so do_tcp_sendpages() can also release the lock :/
> > >
> > > Since the problem occurs in tls_sw_do_sendpage() and
> > > tls_sw_do_sendmsg() as well, should we perhaps fix it at that level?
> >
> > That won't do, because tls_tx_records is also called when completion
> > callbacks schedule delayed work. That was the code path that caused
> > the crash in my test: Cavium's nitrox crypto offload driver calls
> > tls_encrypt_done, which calls schedule_delayed_work, and the
> > scheduled work is then processed by tx_work_handler.
> > Notice in my previous reply,
> > "Thread 2 code path:
> > tx_work_handler ---> tls_tx_records"
>
> Right, the work handler would obviously also have to obey the exclusion
> mechanism of choice.
>
> Having said that this really does feel like we are trying to lock code,
> not data here :(
I agree with you, and that is exactly the thought process I went
through. So what are some other options?
1) A lock member inside ctx to protect tx_list
We are load testing kTLS offload with nitrox, and performance was
quite adversely affected by this approach. It can be explored further,
but the original design of using the socket lock didn't follow this
model either.
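For illustration, here is a rough userspace sketch of option 1. A
pthread mutex stands in for the hypothetical dedicated lock (in the
kernel it would presumably be a spinlock_t next to tx_list in struct
tls_sw_context_tx); the struct and function names below are toy
stand-ins, not the real kernel types:

```c
#include <pthread.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel structures; this is a userspace
 * sketch of the idea, not kernel code. */
struct tls_rec {
    struct tls_rec *next;
};

struct tls_sw_ctx {
    pthread_mutex_t tx_lock;    /* hypothetical dedicated tx_list lock */
    struct tls_rec *tx_list;
};

/* tls_tx_records-like walk: each record is unlinked under the lock,
 * so two concurrent walkers can never "list_del" the same record
 * twice, even if transmission sleeps. Returns how many records this
 * walker popped. */
static int tx_records(struct tls_sw_ctx *ctx)
{
    int popped = 0;

    for (;;) {
        pthread_mutex_lock(&ctx->tx_lock);
        struct tls_rec *rec = ctx->tx_list;
        if (!rec) {
            pthread_mutex_unlock(&ctx->tx_lock);
            return popped;
        }
        ctx->tx_list = rec->next;       /* "list_del" under the lock */
        pthread_mutex_unlock(&ctx->tx_lock);

        /* Transmit outside the lock; in the kernel this is where
         * do_tcp_sendpages -> sk_stream_wait_memory may sleep. */
        free(rec);
        popped++;
    }
}
```

The cost is the extra lock/unlock pair per record on the hot path,
which matches the throughput hit we measured.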
2) Allow tagging of individual records inside tx_list to indicate
whether they have been 'processed'
This approach would likely protect the data without compromising
performance. It would allow Thread 2 to proceed with the TX portion of
tls_tx_records while Thread 1 sleeps waiting for memory. Careful
cleanup and backtracking would be needed after the sleeping thread
wakes up, to keep tx_list and record transmission in a consistent
state.
The approach has several problems, however: (a) it could cause
out-of-order record TX; (b) if Thread 1 is waiting for memory, Thread
2 most likely will be too; (c) again, the socket lock wasn't designed
to follow this model to begin with.
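A minimal sketch of option 2, again as a userspace toy rather than
kernel code (the names in_flight, walk_tagged, etc. are invented for
illustration; the real version would have to set and test the flag
under the socket lock before it is released, and undo the claim on
transmit failure):

```c
#include <stddef.h>

/* Option 2 sketch: a per-record flag instead of a tx_list lock.
 * A walker claims a record by setting in_flight; any concurrent
 * walker skips claimed records and moves on. Toy code: names and
 * fields are hypothetical. */
struct tagged_rec {
    struct tagged_rec *next;
    int in_flight;    /* claimed by some walker */
    int tx_count;     /* how many times "transmitted" */
};

static void walk_tagged(struct tagged_rec *head)
{
    for (struct tagged_rec *r = head; r; r = r->next) {
        if (r->in_flight)
            continue;          /* another walker owns this record */
        r->in_flight = 1;      /* claim before (possibly) sleeping */
        r->tx_count++;         /* transmit; may sleep in real code */
    }
}
```

Because a second walker skips claimed records instead of stopping,
records can go out in a different order than they sit on the list,
which is problem (a) above.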
Given that the socket lock was essentially working as a code protector
-- as an exclusion mechanism allowing only a single writer through
tls_tx_records at a time -- what other clean ways do we have to fix
the race without a significant refactor of the design and code?
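To make the failure mode concrete, here is a deterministic userspace
toy (not kernel code) of the double list_del. The sleep_hook callback
stands in for the point where sk_stream_wait_memory releases the
socket lock and a second walker gets in; counting unlinks instead of
actually freeing keeps the demonstration well-defined:

```c
#include <stddef.h>

/* Toy model of the race: all names here are invented for the demo. */
struct demo_rec {
    struct demo_rec *next;
    int del_count;    /* how many times this record was "list_del"ed */
};

struct demo_ctx {
    struct demo_rec *head;
};

/* Buggy walk mirroring the tls_tx_records pattern: grab the head
 * record, then "sleep" (socket lock released inside
 * do_tcp_sendpages), then unlink -- by which time another walker
 * may already have unlinked the same record. */
static void buggy_walk(struct demo_ctx *c, void (*sleep_hook)(struct demo_ctx *))
{
    struct demo_rec *rec = c->head;

    if (!rec)
        return;
    if (sleep_hook)
        sleep_hook(c);        /* lock dropped here; second walker runs */

    rec->del_count++;         /* "list_del": double-counted if raced */
    if (c->head == rec)
        c->head = rec->next;
}

/* The second walker (e.g. tx_work_handler) runs to completion while
 * the first walker sleeps. */
static void second_walker(struct demo_ctx *c)
{
    buggy_walk(c, NULL);
}
```

With two records on the list, the first record ends up unlinked twice
(del_count == 2), which is exactly the double list_del / double free
the patch is chasing.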