netdev - Re: [PATCH net] net/tls(TLS_SW): Fix integrity issue with non-blocking sw KTLS request

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOrEds=e62EnDiB5b-5Btukp83OASVaVgBG28GkxSBw1F8sLSQ@mail.gmail.com>
Date:   Wed, 20 May 2020 15:56:56 -0400
From:   Pooja Trivedi <poojatrivedi@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     borisp@...lanox.com, aviadye@...lanox.com,
        john.fastabend@...il.com, daniel@...earbox.net,
        davem@...emloft.net, vakul.garg@....com, netdev@...r.kernel.org,
        linux-crypto@...r.kernel.org,
        mallesham.jatharkonda@...convergence.com, josh.tway@...ckpath.com,
        Pooja Trivedi <pooja.trivedi@...ckpath.com>
Subject: Re: [PATCH net] net/tls(TLS_SW): Fix integrity issue with
 non-blocking sw KTLS request

On Tue, May 19, 2020 at 5:43 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Tue, 19 May 2020 13:21:56 -0400 Pooja Trivedi wrote:
> > On Mon, May 18, 2020 at 6:50 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > On Sun, 17 May 2020 16:26:36 +0000 Pooja Trivedi wrote:
> > > > In pure sw ktls(AES-NI), -EAGAIN from tcp layer (do_tcp_sendpages for
> > > > encrypted record) gets treated as error, subtracts the offset, and
> > > > returns to application. Because of this, application sends data from
> > > > subtracted offset, which leads to data integrity issue. Since record is
> > > > already encrypted, ktls module marks it as partially sent and pushes the
> > > > packet to tcp layer in the following iterations (either from bottom half
> > > > or when pushing next chunk). So returning success in case of EAGAIN
> > > > will fix the issue.
> > > >
> > > > Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
> > > > Signed-off-by: Pooja Trivedi <pooja.trivedi@...ckpath.com>
> > > > Reviewed-by: Mallesham Jatharkonda <mallesham.jatharkonda@...convergence.com>
> > > > Reviewed-by: Josh Tway <josh.tway@...ckpath.com>
> > >
> > > This looks reasonable, I think. Next time user space calls if no new
> > > buffer space was made available it will get a -EAGAIN, right?
> >
> > Yes, this fix should only affect encrypted record. Plain text calls from
> > user space should be unaffected.
>
> AFAICS if TCP layer is full next call from user space should hit
> sk_stream_wait_memory() immediately and if it has MSG_DONTWAIT set
> exit with EAGAIN. Which I believe to be correct behavior.
>

The flow is tls_sw_sendmsg/tls_sw_do_sendpage --> bpf_exec_tx_verdict -->
tls_push_record --> tls_tx_records --> tls_push_sg --> do_tcp_sendpages

do_tcp_sendpages() sends partial record, 'retry:' label is exercised wherein
do_tcp_sendpages gets called again and returns -EAGAIN.
tls_push_sg sets partially_sent_record/partially_sent_offset and
returns -EAGAIN. -EAGAIN bubbles up to bpf_exec_tx_verdict.
In bpf_exec_tx_verdict, the following code causes 'copied' variable to
get updated to a negative value and returns -EAGAIN.

                err = tls_push_record(sk, flags, record_type);
                if (err && err != -EINPROGRESS) {
                        *copied -= sk_msg_free(sk, msg);
                        tls_free_open_rec(sk);
                }
                return err;

-EAGAIN returned by bpf_exec_tx_verdict causes
tls_sw_sendmsg/tls_sw_do_sendpage to 'continue' in the while loop and
call sk_stream_wait_memory(). sk_stream_wait_memory returns -EAGAIN
also and control reaches the 'send_end:' label. The following return
statement causes a negative 'copied' variable value to be returned to the
user space.

        return copied ? copied : ret;

User space applies this negative value as offset for the next send, causing
part of the record that was already sent to be pushed again.

Hope this clarifies it.


>
> > > Two questions - is there any particular application or use case that
> > > runs into this?
> >
> > We are running into this case when we hit our kTLS-enabled homegrown
> > webserver with a 'pipeline' test tool, also homegrown. The issue basically
> > happens whenever the send buffer on the server gets full and TCP layer
> > returns EAGAIN when attempting to TX the encrypted record. In fact, we
> > are also able to reproduce the issue by using a simple wget with a large
> > file, if/when sndbuf fills up.
>
> I see just a coincidence, then, no worries.
>
> > > Seems a bit surprising to see a patch from Vadim and
> > > you guys come at the same time.
> >
> > Not familiar with Vadim or her/his patch. Could you please point me to it?
>
> http://patchwork.ozlabs.org/project/netdev/patch/20200517014451.954F05026DE@novek.ru/
>

Ah, looks like Vadim ran into the exact issue!

>
> > > Could you also add test for this bug?
> > > In tools/testing/selftests/net/tls.c
> > >
> >
> > Sure, yes. Let me look into this.
>
> Thanks!