[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMj1kXE_PjQT6+A9a0Y=ZfbOr_H+umYSqHuRrM6AT_gFJxxP1w@mail.gmail.com>
Date: Fri, 30 Jun 2023 17:16:06 +0200
From: Ard Biesheuvel <ardb@...nel.org>
To: Alexander Potapenko <glider@...gle.com>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>, Boris Pismenny <borisp@...dia.com>,
John Fastabend <john.fastabend@...il.com>, Jakub Kicinski <kuba@...nel.org>, herbert@...dor.apana.org.au,
linux-crypto@...r.kernel.org, syzkaller-bugs@...glegroups.com,
syzbot <syzbot+828dfc12440b4f6f305d@...kaller.appspotmail.com>,
Eric Biggers <ebiggers@...nel.org>, Aviad Yehezkel <aviadye@...dia.com>,
Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH] net: tls: enable __GFP_ZERO upon tls_init()
On Fri, 30 Jun 2023 at 13:55, Alexander Potapenko <glider@...gle.com> wrote:
>
> On Fri, Jun 30, 2023 at 1:49 PM Ard Biesheuvel <ardb@...nel.org> wrote:
> >
> > On Fri, 30 Jun 2023 at 13:38, Alexander Potapenko <glider@...gle.com> wrote:
> > >
> > > On Fri, Jun 30, 2023 at 12:18 PM Ard Biesheuvel <ardb@...nel.org> wrote:
> > > >
> > > > On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko <glider@...gle.com> wrote:
> > > > >
> > > > > On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@...nel.org> wrote:
> > > > > >
> > > > > > On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa
> > > > > > <penguin-kernel@...ove.sakura.ne.jp> wrote:
> > > > > > >
> > > > > > > On 2023/06/30 18:36, Ard Biesheuvel wrote:
> > > > > > > > Why are you sending this now?
> > > > > > >
> > > > > > > Just because this is currently top crasher and I can reproduce locally.
> > > > > > >
> > > > > > > > Do you have a reproducer for this issue?
> > > > > > >
> > > > > > > Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works.
> > > > > > >
> > > > > >
> > > > > > Could you please share your kernel config and the resulting kernel log
> > > > > > when running the reproducer? I'll try to reproduce locally as well,
> > > > > > and see if I can figure out what is going on in the crypto layer
> > > > >
> > > > > The config together with the repro is available at
> > > > > https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the
> > > > > latest row of the "Crashes" table that contains a C repro.
> > > >
> > > > Could you explain why that bug contains ~50 reports that seem entirely
> > > > unrelated?
> > >
> > > These are some unfortunate effects of syzbot trying to deduplicate
> > > bugs. There's a tradeoff between reporting every single crash
> > > separately and grouping together those that have e.g. the same origin.
> > > Applying this algorithm transitively results in bigger clusters
> > > containing unwanted reports.
> > > We'll look closer.
> > >
> > > > AIUI, this actual issue has not been reproduced since
> > > > 2020??
> > >
> > > Oh, sorry, I misread the table and misinformed you. The topmost row of
> > > the table is indeed the _oldest_ one.
> > > Another manifestation of the bug was on 2023/05/23
> > > (https://syzkaller.appspot.com/text?tag=CrashReport&x=146f66b1280000)
> > >
> >
> > That one has nothing to do with networking, so I don't see how this
> > patch would affect it.
>
> I definitely have to be more attentive.
> You are right that this bug report is also unrelated. Yet it is still
> fine to use the build artifacts corresponding to it (which is what I
> did).
> I'll investigate why so many reports got clustered into this one.
>
>
>
> > OK, thanks for the instructions.
> >
> > Out of curiosity - does the stack trace you cut off here include the
> > BPF routine mentioned in the report?
>
> It does:
>
> [ 151.522472][ T5865] =====================================================
> [ 151.523843][ T5865] BUG: KMSAN: uninit-value in aes_encrypt+0x15cc/0x1db0
> [ 151.525120][ T5865] aes_encrypt+0x15cc/0x1db0
> [ 151.526113][ T5865] aesti_encrypt+0x7d/0xf0
> [ 151.527057][ T5865] crypto_cipher_encrypt_one+0x112/0x200
> [ 151.528224][ T5865] crypto_cbcmac_digest_update+0x301/0x4b0
> [ 151.529459][ T5865] shash_ahash_finup+0x66e/0xc00
> [ 151.530541][ T5865] shash_async_finup+0x7f/0xc0
> [ 151.531542][ T5865] crypto_ahash_finup+0x1b8/0x3e0
> [ 151.532583][ T5865] crypto_ccm_auth+0x1269/0x1350
> [ 151.533606][ T5865] crypto_ccm_encrypt+0x1c9/0x7a0
> [ 151.534650][ T5865] crypto_aead_encrypt+0xe0/0x150
> [ 151.535695][ T5865] tls_push_record+0x3bf3/0x4ec0
> [ 151.539491][ T5865] bpf_exec_tx_verdict+0x46e/0x21d0
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> [ 151.540597][ T5865] tls_sw_do_sendpage+0x1150/0x1ad0
OK, so after poking around a little bit, I have managed to confirm
that the problem is in the TLS layer, and I am a bit out of my depth
debugging that.
With the debugging code below applied, KMSAN triggers on an
uninit-memory in the input scatterlist provided by the TLS layer into
the CCM code.
[ 148.375852][ T2424] =====================================================
[ 148.377269][ T2424] BUG: KMSAN: uninit-value in tls_push_record+0x2d9f/0x3eb0
[ 148.378623][ T2424] tls_push_record+0x2d9f/0x3eb0
[ 148.379559][ T2424] bpf_exec_tx_verdict+0x5ba/0x2530
[ 148.380534][ T2424] tls_sw_do_sendpage+0x169c/0x1f80
[ 148.381519][ T2424] tls_sw_sendpage+0x247/0x2b0
...
[ 148.411559][ T2424]
[ 148.412108][ T2424] Bytes 0-15 of 16 are uninitialized
[ 148.413379][ T2424] Memory access of size 16 starts at ffff8880157889c7
Note that this is the *input* scatterlist containing the AAD
(additional authenticated data) and the crypto input, and so there is
definitely a bug here that shouldn't be papered over by zero'ing the
allocation.
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -543,6 +543,21 @@ static int tls_do_encryption(struct sock *sk,
list_add_tail((struct list_head *)&rec->list, &ctx->tx_list);
atomic_inc(&ctx->encrypt_pending);
+ {
+ int len = aead_req->assoclen + aead_req->cryptlen;
+ struct sg_mapping_iter miter;
+
+ sg_miter_start(&miter, rec->sg_aead_in,
+ sg_nents(rec->sg_aead_in),
+ SG_MITER_TO_SG | SG_MITER_ATOMIC);
+
+ while (len > 0 && sg_miter_next(&miter)) {
+ kmsan_check_memory(miter.addr, min(len,
(int)miter.length));
+ len -= miter.length;
+ }
+ sg_miter_stop(&miter);
+ }
+
The reason that this cascades all the way down to the AES cipher code
appears to be that sbox substitution involves array indexing, which is
one of the actions KMSAN qualifies as 'use' of uninit data.
Powered by blists - more mailing lists