Message-ID: <538AF2CB.20603@mellanox.com>
Date:	Sun, 1 Jun 2014 12:30:51 +0300
From:	Or Gerlitz <ogerlitz@...lanox.com>
To:	Wei Yang <weiyang@...ux.vnet.ibm.com>, <davem@...emloft.net>
CC:	<netdev@...r.kernel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
	Amir Vadai <amirv@...lanox.com>,
	Jack Morgenstein <jackm@....mellanox.co.il>
Subject: Re: [PATCH 3.14-stable] net/mlx4_core: Preserve pci_dev_data after
 __mlx4_remove_one()

On 01/06/2014 10:38, Wei Yang wrote:
> David, the following are the backports of this patch to the 3.14, 3.10, 3.4 and 3.2 stable trees.


Wait,

I recently noticed that on 3.15-rcX, if the host is rebooted while the
mlx4_core driver is loaded in SR-IOV mode, we crash as shown below.
Looking at this now, I think there's a chance it is related to your
upstream change befdf89 "net/mlx4_core: Preserve pci_dev_data after
__mlx4_remove_one()".

Or.


[  152.121286] mlx4_core 0000:06:00.0: Received reset from slave:2
[  152.128031] mlx4_core 0000:06:00.0: Have more references for index 0,no need to modify mac table
[  152.209248] mlx4_core 0000:06:00.0: Received reset from slave:1
[  152.215889] mlx4_core 0000:06:00.0: Have more references for index 0,no need to modify mac table
[  152.216305] sd 1:0:1:0: [sdd] Synchronizing SCSI cache
[  152.221714] sd 1:0:0:0: [sdc] Synchronizing SCSI cache
[  152.227108] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[  152.232494] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  152.271991] mlx4_en 0000:06:00.0: removed PHC
[  152.281611] mlx4_core 0000:06:00.0: Have more references for index 0,no need to modify mac table
[  152.318395] mlx4_core 0000:06:00.0: Disabling SR-IOV
[  152.323513] BUG: unable to handle kernel NULL pointer dereference at 0000000000000378
[  152.331523] IP: [<ffffffffa01668e0>] __mlx4_remove_one+0x20/0x370 [mlx4_core]
[  152.338778] PGD 0
[  152.340908] Oops: 0000 [#1] PREEMPT SMP
[  152.345058] Modules linked in: netconsole nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 nfs lockd autofs4 8021q sunrpc cpufreq_ondemand bridge stp llc ext4 jbd2 crc16 raid0 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm dm_mod ixgbevf microcode pcspkr joydev i2c_i801 sg ehci_pci ehci_hcd mlx4_ib mlx4_en ioatdma ib_sa ib_mad ib_core ib_addr vxlan ipv6 mlx4_core ixgbe mdio igb dca ptp pps_core hwmon button ext3 jbd sd_mod ata_piix libata scsi_mod uhci_hcd
[  152.392161] CPU: 8 PID: 4557 Comm: reboot Not tainted 3.15.0-rc6+ #149
[  152.398760] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c       08/03/2012
[  152.405954] task: ffff880331fca490 ti: ffff8800bb1a6000 task.ti: ffff8800bb1a6000
[  152.413507] RIP: 0010:[<ffffffffa01668e0>] [<ffffffffa01668e0>] __mlx4_remove_one+0x20/0x370 [mlx4_core]
[  152.423220] RSP: 0018:ffff8800bb1a7b98  EFLAGS: 00010286
[  152.428598] RAX: 0000000000000000 RBX: ffff880630a78098 RCX: 0000000000000000
[  152.435793] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff880630a78098
[  152.442987] RBP: ffff8800bb1a7bc8 R08: 0000000000000000 R09: ffffffff81584556
[  152.450181] R10: ffffea000cc42e18 R11: ffffffff811ab129 R12: ffff880630a78000
[  152.457374] R13: ffff880630a78000 R14: 0000000000000000 R15: ffff8800bb1a7cc8
[  152.464568] FS:  00007f60f21f6700(0000) GS:ffff88063fc80000(0000) knlGS:0000000000000000
[  152.472731] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  152.478538] CR2: 0000000000000378 CR3: 00000000bcdf0000 CR4: 00000000000007e0
[  152.485734] Stack:
[  152.487805]  0000000000000000 ffff880630a78098 ffff880630a78000 0000000000000000
[  152.495534]  0000000000000000 ffff8800bb1a7cc8 ffff8800bb1a7bf8 ffffffffa0166c91
[  152.503264]  ffff880630a78098 ffff880630a78098 ffffffffa0181640 ffff880630a78000
[  152.510978] Call Trace:
[  152.513492]  [<ffffffffa0166c91>] mlx4_remove_one+0x31/0x60 [mlx4_core]
[  152.520172]  [<ffffffff81231da1>] pci_device_remove+0x41/0xc0
[  152.525987]  [<ffffffff812ef30a>] __device_release_driver+0x7a/0xe0
[  152.532320]  [<ffffffff812ef468>] device_release_driver+0x28/0x40
[  152.538475]  [<ffffffff8122bd6c>] pci_stop_bus_device+0x9c/0xb0
[  152.544461]  [<ffffffff8122bfa1>] pci_stop_and_remove_bus_device+0x11/0x20
[  152.551399]  [<ffffffff8124576d>] virtfn_remove.clone.0+0xdd/0x140
[  152.557645]  [<ffffffff812ed30e>] ? dev_warn+0x4e/0x50
[  152.562841]  [<ffffffff8124582f>] pci_disable_sriov+0x5f/0xf0
[  152.568655]  [<ffffffffa0166bf4>] __mlx4_remove_one+0x334/0x370 [mlx4_core]
[  152.575685]  [<ffffffffa0166c91>] mlx4_remove_one+0x31/0x60 [mlx4_core]
[  152.582364]  [<ffffffff81231b1c>] pci_device_shutdown+0x3c/0x90
[  152.588343]  [<ffffffff812ed105>] device_shutdown+0x15/0x180
[  152.594065]  [<ffffffff81085891>] kernel_restart_prepare+0x31/0x40
[  152.600304]  [<ffffffff81085a51>] kernel_restart+0x11/0x60
[  152.605851]  [<ffffffff81085c60>] SyS_reboot+0x1b0/0x200
[  152.611226]  [<ffffffff81159c83>] ? mntput_no_expire+0x33/0x180
[  152.617204]  [<ffffffff81159dec>] ? mntput+0x1c/0x30
[  152.622232]  [<ffffffff8113c804>] ? __fput+0x144/0x1f0
[  152.627432]  [<ffffffff8113c949>] ? ____fput+0x9/0x10
[  152.632545]  [<ffffffff8107d07c>] ? task_work_run+0x8c/0xe0
[  152.638180]  [<ffffffff81002a64>] ? do_notify_resume+0x74/0x80
[  152.644075]  [<ffffffff810cd6f6>] ? __audit_syscall_exit+0x236/0x2e0
[  152.650490]  [<ffffffff81476d72>] ? int_signal+0x12/0x17
[  152.655869]  [<ffffffff81476ab9>] system_call_fastpath+0x16/0x1b
[  152.661935] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 49 89 fd 48 8d bf 98 00 00 00 41 54 53 48 83 ec 08 e8 60 86 18 e1 <8b> 90 78 03 00 00 48 89 c3 85 d2 0f 85 30 02 00 00 f6 40 08 04
[  152.684806] RIP  [<ffffffffa01668e0>] __mlx4_remove_one+0x20/0x370 [mlx4_core]
[  152.692170]  RSP <ffff8800bb1a7b98>
[  152.695723] CR2: 0000000000000378
[  152.699163] ---[ end trace 9c36c3b85b765771 ]---
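
For anyone reading the oops above, here is a rough sketch (not part of the
original report) that cross-checks the faulting instruction against CR2. It
takes the byte marked <8b> on the Code: line, decodes it as a 32-bit
"mov r32, [base+disp32]", and adds the displacement to the RAX value from the
register dump. The helper names (faulting_bytes, decode_mov_disp32) are made up
for illustration, and the decoder handles only this one instruction form. The
bytes just before the marker (e8 ...) are a call, so a zero RAX would fit that
call having returned NULL, though the log alone does not say which function it
was.

```python
# Values copied by hand from the oops in this mail.
CODE_LINE = (
    "66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 "
    "41 56 41 55 49 89 fd 48 8d bf 98 00 00 00 41 54 53 48 83 ec 08 e8 60 86 "
    "18 e1 <8b> 90 78 03 00 00 48 89 c3 85 d2 0f 85 30 02 00 00 f6 40 08 04"
)
RAX = 0x0000000000000000
CR2 = 0x0000000000000378

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx> marker, i.e. at the faulting RIP."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mov_disp32(insn):
    """Decode opcode 8b with ModRM mod=10 (disp32) and no SIB byte.

    Returns (reg, rm, disp). Anything else raises ValueError, since this is
    only meant to cover the single instruction seen in this oops.
    """
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b10 or rm == 0b100:
        raise ValueError("not a plain 'mov r32, [base+disp32]'")
    disp = int.from_bytes(insn[2:6], "little", signed=True)
    return reg, rm, disp


insn = faulting_bytes(CODE_LINE)
reg, rm, disp = decode_mov_disp32(insn)
fault_addr = RAX + disp  # base register is RAX here (rm == 0)
print(f"mov {REG32[reg]}, [{REG64[rm]}+{disp:#x}] -> {fault_addr:#x}, CR2 {CR2:#x}")
```

If the computed address equals CR2, the fault is a read of a field at that
offset from a NULL base pointer, which matches the "NULL pointer dereference at
0000000000000378" line.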



>
> On 3.14, only this patch is backported.
> On 3.10, a previous related one "pass pci_device_id.driver_data to
>           __mlx4_init_one during reset" is backported too.
> On 3.4,  "pass pci_device_id.driver_data to __mlx4_init_one during reset" is
>           not backported, since the slot_reset handler is not present.
> 	 Another one, "Stash PCI ID driver_data in mlx4_priv structure",
> 	 is backported to make this patch valid on this version.
> On 3.2,  The same as 3.4.
>
> All versions compile successfully. 3.14 and 3.10 are verified, while 3.4
> and 3.2 are not.
>
> I am not sure how to combine them all into one big patch set, so I am sending
> them separately. Each version is contained in one patch set. If there is a
> better way for you to merge them, please let me know.
>
> At last, Happy Children's Day for all :-)
>
