Date:	Wed, 20 Aug 2014 13:31:51 +0300
From:	Or Gerlitz <ogerlitz@...lanox.com>
To:	Bart Van Assche <bvanassche@....org>
CC:	<netdev@...r.kernel.org>, linux-rdma <linux-rdma@...r.kernel.org>,
	"Saeed Mahameed" <saeedm@...lanox.com>,
	Tal Alon <talal@...lanox.com>,
	"Yevgeny Petrilin" <yevgenyp@...lanox.com>
Subject: Re: 3.17-rc1 oops during network interface configuration

On 18/08/2014 15:18, Bart Van Assche wrote:
> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The
> following call trace is triggered during boot on a system on which kernel
> 3.16 runs fine:

Yep, I see it on my systems too.

I narrowed this down a bit: it happens only when the port link type is IB
(these nodes have ConnectX) and IPoIB gets loaded.

I reverted all the IPoIB changes since 3.16 (see the log below), except for
the trivial commit c835a67, and the crash still occurs.

I guess this needs to go through systematic bisection.

Or.

> net.git]# git log --oneline --no-merges v3.16.. drivers/infiniband/ulp/ipoib/
> 8a118a4 Revert "IB/ipoib: Use P_Key change event instead of P_Key polling mechanism"
> 90e6f39 Revert "IB/ipoib: Avoid flushing the workqueue from worker context"
> 030ade7 Revert "IB/ipoib: Avoid multicast join attempts with invalid P_key"
> 97ba2ff Revert "IPoIB: Remove unnecessary test for NULL before debugfs_remove()"
> e42fa20 IPoIB: Remove unnecessary test for NULL before debugfs_remove()
> dd57c93 IB/ipoib: Avoid multicast join attempts with invalid P_key
> 4eae374 IB/ipoib: Avoid flushing the workqueue from worker context
> db84f88 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
> c835a67 net: set name_assign_type in alloc_netdev()


> BUG: unable to handle kernel paging request at ffff88090000007e
> IP: __dev_queue_xmit+0x519
> Call Trace:
> ? __dev_queue_xmit+0x49
> dev_queue_xmit+0x10
> neigh_connected_output
> ? ip_finish_output
> ip_finish_output
> ? ip_finish_output
> ? netif_rx_ni
> ip_mc_output
> ip_local_out_sk
> ip_send_skb
> udp_send_skb
> udp_sendmsg
> ? ip_reply_glue_bits
> ? __lock_is_held
> inet_sendmsg
> ? inet_sendmsg
> sock_sendmsg
> ? might_fault
> ? might_fault
> ? move_addr_to_kernel.part.38
> SYSC_sendto
> ? sysret_check
> ? trace_hardirqs_on_caller
> ? trace_hardirqs_on_thunk
> SyS_sendto
> system_call_fastpath
>
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> drm_kms_helper: panic occurred, switching back to text console
>
> A screenshot of this kernel oops can be found here:
> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>
> gdb translates the crash address into the following (not sure this makes sense
> since offset 0x519 is past the end of __dev_queue_xmit()):
>
> (gdb) list *(__dev_queue_xmit+0x519)
> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
> 5163    {
> 5164            struct netdev_adjacent *iter;
> 5165
> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
> 5168                                              &iter->dev->adj_list.lower);
> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
> 5170                                              &iter->dev->adj_list.lower);
> 5171            }
>
> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>
> (gdb) list *(__dev_queue_xmit+0x49)
> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
> 70       * The various preempt_count add/sub methods
> 71       */
> 72
> 73      static __always_inline void __preempt_count_add(int val)
> 74      {
> 75              raw_cpu_add_4(__preempt_count, val);
> 76      }
> 77
> 78      static __always_inline void __preempt_count_sub(int val)
> 79      {
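
As a sanity check on the two gdb lookups quoted above, here is a minimal
sketch that subtracts each offset from the address gdb printed. The addresses
and offsets are copied from the quoted output; the helper name symbol_base is
made up for illustration. If both subtractions give the same value, gdb
resolved both expressions against the same start of __dev_queue_xmit in that
vmlinux.

```python
# Addresses and offsets copied from the two gdb "list" outputs quoted above.
FAULT_ADDR = 0xffffffff8136bc89  # gdb's answer for __dev_queue_xmit+0x519
FAULT_OFF = 0x519
EARLY_ADDR = 0xffffffff8136b7b9  # gdb's answer for __dev_queue_xmit+0x49
EARLY_OFF = 0x49


def symbol_base(addr: int, offset: int) -> int:
    """Return the symbol start implied by an address and its symbol+offset form.

    Hypothetical helper, not part of gdb or the kernel tooling.
    """
    return addr - offset


base_from_fault = symbol_base(FAULT_ADDR, FAULT_OFF)
base_from_early = symbol_base(EARLY_ADDR, EARLY_OFF)

print(hex(base_from_fault))
print(hex(base_from_early))
print(base_from_fault == base_from_early)
```

This only shows that the two lookups are self-consistent. It does not explain
why +0x519 lands in netdev_adjacent_rename_links rather than inside
__dev_queue_xmit.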
