netdev - [RFC] xfrm: netdevice unregistration during decryption


Message-ID: <9fb4925ea87677df44c75c435efc329f@codeaurora.org>
Date:	Tue, 08 Mar 2016 19:16:23 -0700
From:	subashab@...eaurora.org
To:	Eric Dumazet <eric.dumazet@...il.com>,
	Steffen Klassert <steffen.klassert@...unet.com>,
	Herbert Xu <herbert@...dor.apana.org.au>
Cc:	netdev@...r.kernel.org
Subject: [RFC] xfrm: netdevice unregistration during decryption

I am observing a crash originating from the XFRM framework on a 3.18 ARM64
kernel.

get_rps_cpu() tries to dereference skb->dev, but judging from the poison
pattern in the dump below, the device has already been freed.
The following is the crash call stack:

  55428.227024:   <2> [<ffffffc000af58ec>] get_rps_cpu+0x94/0x2f0
  55428.227027:   <2> [<ffffffc000af5f94>] netif_rx_internal+0x140/0x1cc
  55428.227030:   <2> [<ffffffc000af6094>] netif_rx+0x74/0x94
  55428.227035:   <2> [<ffffffc000bc0b6c>] xfrm_input+0x754/0x7d0
  55428.227038:   <2> [<ffffffc000bc0bf8>] xfrm_input_resume+0x10/0x1c
  55428.227044:   <2> [<ffffffc000ba6eb8>] esp_input_done+0x20/0x30
  55428.227056:   <2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
  55428.227060:   <2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
  55428.227064:   <2> [<ffffffc0000bb40c>] kthread+0xe0/0xec

-013|get_rps_cpu(
     |    dev = 0xFFFFFFC08B688000,
     |    skb = 0xFFFFFFC0C76AAC00 -> (
     |      dev = 0xFFFFFFC08B688000 -> (
     |        name = "......................................................
     |        name_hlist = (next = 0xAAAAAAAAAAAAAAAA, pprev = 0xAAAAAAAAAAA

The following sequence of events was observed:

1. Encrypted packet in receive path from netdevice queued to network stack

2. Encrypted packet queued for decryption (asynchronous)

static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
...
          aead_request_set_callback(req, 0, esp_input_done, skb);

3. Netdevice brought down and freed

4. Packet is decrypted and returned through callback in esp_input_done.

5. Packet is queued again for processing in the network stack using netif_rx.

The device appears to have been freed and, as a result, the dereference of
skb->dev in get_rps_cpu() leads to an unhandled page fault exception.

Would it make sense to detect the device going away using a netdev
notifier callback, and to free the affected packets when the asynchronous
callback returns instead of passing them to netif_rx?
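
To make the notifier idea concrete, here is a minimal single-threaded
userspace model of it. It is only a sketch under stated assumptions, not
kernel code: struct model_dev, struct model_pkt and every model_*()
function are hypothetical stand-ins for net_device, sk_buff and the
xfrm/esp paths. In the kernel the unregister hook would presumably hang
off a NETDEV_UNREGISTER notifier, and the in-flight list would need
proper locking, which this model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the kernel's net_device / sk_buff. */
struct model_dev {
    bool registered;
};

struct model_pkt {
    struct model_dev *dev;   /* plays the role of skb->dev */
    struct model_pkt *next;  /* link in the in-flight list */
};

/* Packets handed to asynchronous decryption and not yet completed. */
static struct model_pkt *inflight;

/* Step 2: queue a packet for async decryption and remember it. */
static struct model_pkt *model_submit(struct model_dev *dev)
{
    struct model_pkt *pkt = malloc(sizeof(*pkt));

    if (!pkt)
        return NULL;
    pkt->dev = dev;
    pkt->next = inflight;
    inflight = pkt;
    return pkt;
}

/* Step 3: notifier-like hook, run before the device memory is freed.
 * It detaches every in-flight packet that still points at the device. */
static void model_dev_unregister(struct model_dev *dev)
{
    struct model_pkt *p;

    for (p = inflight; p; p = p->next)
        if (p->dev == dev)
            p->dev = NULL;
    dev->registered = false;
}

/* Steps 4-5: completion callback. Returns true if the packet would be
 * re-injected into the stack, false if it is dropped because its
 * device went away while decryption was pending. */
static bool model_decrypt_done(struct model_pkt *pkt)
{
    struct model_pkt **pp = &inflight;
    bool deliver;

    while (*pp && *pp != pkt)
        pp = &(*pp)->next;
    if (*pp)
        *pp = pkt->next;         /* unlink from the in-flight list */

    deliver = (pkt->dev != NULL); /* never touch a detached device */
    free(pkt);
    return deliver;
}

/* Runs the five-step sequence. Returns 1 if the packet was delivered,
 * 0 if it was dropped, -1 on allocation failure. */
static int model_run_scenario(bool unregister_midway)
{
    struct model_dev *dev = malloc(sizeof(*dev));
    struct model_pkt *pkt;
    int delivered;

    if (!dev)
        return -1;
    dev->registered = true;

    pkt = model_submit(dev);
    if (!pkt) {
        free(dev);
        return -1;
    }

    if (unregister_midway) {
        model_dev_unregister(dev);
        free(dev);               /* device memory is gone from here on */
        dev = NULL;
    }

    delivered = model_decrypt_done(pkt) ? 1 : 0;
    free(dev);                   /* no-op if already freed above */
    return delivered;
}
```

With unregister_midway set, the completion path sees a NULL device and
drops the packet instead of dereferencing freed memory; without it, the
packet is delivered as usual.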

Additionally, since the callback runs from a worker thread, would it be
better to use netif_rx_ni instead of netif_rx?

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 85d1d47..f791128 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -351,7 +351,7 @@ resume:

         if (decaps) {
                 skb_dst_drop(skb);
-               netif_rx(skb);
+               netif_rx_ni(skb);