netdev - mlx5 bug in error path of mlx5e_open

Message-ID: <20161101154402.38eb730c@redhat.com>
Date:   Tue, 1 Nov 2016 15:44:02 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Saeed Mahameed <saeedm@...lanox.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        Tariq Toukan <ttoukan.linux@...il.com>,
        Eran Ben Elisha <eranbe@...lanox.com>
Cc:     brouer@...hat.com,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: mlx5 bug in error path of mlx5e_open_channel()


In the mlx5 driver, mlx5e_open_channel() does not handle its error
path correctly. (Tested by forcing mlx5e_create_rq() to fail with
-ENOMEM, which propagates to mlx5e_open_rq().)

At first this seemed related to commit b5503b994ed5 ("net/mlx5e: XDP TX
forwarding support"), because a failed call to mlx5e_open_rq() always
calls mlx5e_close_sq(&c->xdp_sq) on "xdp_sq", even though no XDP program
is attached.

I tried fixing it like this:

@@ -1504,24 +1533,38 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 
        c->xdp = !!priv->xdp_prog;
        err = mlx5e_open_rq(c, &cparam->rq, &c->rq);
-       if (err)
-               goto err_close_xdp_sq;
+       if (err) {
+               if (c->xdp)
+                       goto err_close_xdp_sq;
+               else
+                       goto err_close_sqs;
+       }
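
To make the intent of the change above easier to follow, here is a minimal
user-space sketch of the same unwind pattern: a resource that was only
opened conditionally must only be closed conditionally. This is not the
driver code. The struct and all toy_* names are hypothetical stand-ins, and
the asserts in the close helpers simply model "closing something that was
never opened is a bug".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a channel; tracks which parts are open. */
struct toy_channel {
	bool sqs_open;
	bool xdp;         /* mirrors c->xdp = !!priv->xdp_prog */
	bool xdp_sq_open;
	bool rq_open;
};

static int toy_open_sqs(struct toy_channel *c)
{
	c->sqs_open = true;
	return 0;
}

static void toy_close_sqs(struct toy_channel *c)
{
	assert(c->sqs_open);      /* must have been opened */
	c->sqs_open = false;
}

static int toy_open_xdp_sq(struct toy_channel *c)
{
	c->xdp_sq_open = true;
	return 0;
}

static void toy_close_xdp_sq(struct toy_channel *c)
{
	assert(c->xdp_sq_open);   /* closing an unopened SQ is the bug */
	c->xdp_sq_open = false;
}

/* fail != 0 simulates the forced -ENOMEM from the test setup. */
static int toy_open_rq(struct toy_channel *c, bool fail)
{
	if (fail)
		return -ENOMEM;
	c->rq_open = true;
	return 0;
}

static int toy_open_channel(struct toy_channel *c, bool xdp_attached,
			    bool fail_rq)
{
	int err;

	*c = (struct toy_channel){0};

	err = toy_open_sqs(c);
	if (err)
		return err;

	c->xdp = xdp_attached;
	if (c->xdp) {
		err = toy_open_xdp_sq(c);
		if (err)
			goto err_close_sqs;
	}

	err = toy_open_rq(c, fail_rq);
	if (err)
		goto err_close_xdp_sq;

	return 0;

err_close_xdp_sq:
	/* Only undo the XDP SQ if it was actually opened. */
	if (c->xdp)
		toy_close_xdp_sq(c);
err_close_sqs:
	toy_close_sqs(c);
	return err;
}
```

The sketch puts the c->xdp check at the label instead of at the goto site
as in the diff; both have the same effect, and which one reads better is a
matter of taste.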

The fix does remove one problem, but the error path still causes the
kernel to crash.  This time it seems related to correctly disabling
NAPI polling before disabling the queues.
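
As a rough illustration of why that ordering matters, the user-space toy
below models a poller that calls through a per-queue function pointer. It
is only a sketch under assumptions: every toy_* name is hypothetical, the
"poll_enabled" flag stands in for what napi_disable() would guarantee in
the kernel, and I have not confirmed that this is the exact mechanism
behind the crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical receive queue with an RX handler callback. */
struct toy_rq {
	int (*handle_rx)(struct toy_rq *rq);
	int handled;
};

/* Hypothetical channel; poll_enabled models the NAPI enabled state. */
struct toy_channel {
	bool poll_enabled;
	struct toy_rq rq;
};

static int toy_handle_rx(struct toy_rq *rq)
{
	assert(rq != NULL);
	rq->handled++;
	return 1;                 /* one unit of work done */
}

static void toy_channel_open(struct toy_channel *c)
{
	c->rq.handle_rx = toy_handle_rx;
	c->rq.handled = 0;
	c->poll_enabled = true;
}

/*
 * Returns the work done, 0 if polling is disabled, or -1 where real
 * kernel code would have jumped through a NULL pointer.
 */
static int toy_poll(struct toy_channel *c)
{
	if (!c->poll_enabled)
		return 0;
	if (c->rq.handle_rx == NULL)
		return -1;
	return c->rq.handle_rx(&c->rq);
}

static void toy_close_rq(struct toy_channel *c)
{
	c->rq.handle_rx = NULL;
}

/* Unsafe unwind: queue state is torn down while polling is still on. */
static void toy_unwind_unsafe(struct toy_channel *c)
{
	toy_close_rq(c);
}

/* Safe unwind: stop polling first, then tear down the queue. */
static void toy_unwind_safe(struct toy_channel *c)
{
	c->poll_enabled = false;
	toy_close_rq(c);
}
```

With the unsafe order, a poll that arrives after the queue is closed hits
the NULL handler (the -1 case). With the safe order, the poll is a no-op.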

Now it fails with a different error:

 XXX: call mlx5e_close_sqs(c)
 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [<          (null)>]           (null)
 PGD 401e00067
 PUD 40746e067 PMD 0
 Oops: 0010 [#1] PREEMPT SMP
 Modules linked in: mlx5_core coretemp kvm_intel kvm irqbypass intel_cstate mxm_wmi i2c_i801 i2c_smbus]
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc3-page_pool04+ #124
 Hardware name: To Be Filled By O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
 task: ffffffff81c0c4c0 task.stack: ffffffff81c00000
 RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
 RSP: 0018:ffff88041fa03e70  EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff880401ecc000 RCX: 0000000000000005
 RDX: 0000000000000000 RSI: ffff880401c38000 RDI: ffff880401ecc000
 RBP: ffff88041fa03e88 R08: 0000000000000001 R09: ffff8803ea6a7230
 R10: 0000000000000000 R11: 0000000000000040 R12: ffff880401c38000
 R13: ffff880401ecf148 R14: 0000000000000040 R15: ffff880401ecc000
 FS:  0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000040c468000 CR4: 00000000001406f0
 Stack:
  ffffffffa02e62e0 0000000000000000 0000000000000001 ffff88041fa03ed0
  ffffffffa02e84c2 0000ffff00000000 ffffffff00000040 ffff880401ecf148
  0000000000000040 0000000000000000 000000000000012c 0000000000000000
 Call Trace:
  <IRQ> [  428.032595]  [<ffffffffa02e62e0>] ? mlx5e_post_rx_wqes+0x80/0xc0 [mlx5_core]
  [<ffffffffa02e84c2>] mlx5e_napi_poll+0xf2/0x530 [mlx5_core]
  [<ffffffff8160e50c>] net_rx_action+0x1fc/0x350
  [<ffffffff8172aff8>] __do_softirq+0xc8/0x2c6
  [<ffffffff8106728e>] irq_exit+0xbe/0xd0
  [<ffffffff8172ad44>] do_IRQ+0x54/0xd0
  [<ffffffff8172937f>] common_interrupt+0x7f/0x7f
  <EOI> [  428.075157]  [<ffffffff817285d0>] ? _raw_spin_unlock_irq+0x10/0x20
  [<ffffffff81086db8>] ? finish_task_switch+0x78/0x200
  [<ffffffff81722dfa>] __schedule+0x17a/0x670
  [<ffffffff8172332d>] schedule+0x3d/0x90
  [<ffffffff817236a5>] schedule_preempt_disabled+0x15/0x20
  [<ffffffff810a560c>] cpu_startup_entry+0x12c/0x1c0
  [<ffffffff8171c274>] rest_init+0x84/0x90
  [<ffffffff81d95f14>] start_kernel+0x3fe/0x40b
  [<ffffffff81d9528f>] x86_64_start_reservations+0x2a/0x2c
  [<ffffffff81d953f9>] x86_64_start_kernel+0x168/0x176
 Code:  Bad RIP value.
 RIP  [<          (null)>]           (null)
  RSP <ffff88041fa03e70>
 CR2: 0000000000000000
 ---[ end trace a871278f0d0523ac ]---

Could you please look at fixing your driver?


Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer