Date:	Tue, 9 Sep 2014 15:30:51 -0400
From:	Chuck Lever <chuck.lever@...cle.com>
To:	netdev@...r.kernel.org, LKML Kernel <linux-kernel@...r.kernel.org>
Cc:	linux-rdma <linux-rdma@...r.kernel.org>,
	Bart Van Assche <bvanassche@....org>,
	Or Gerlitz <ogerlitz@...lanox.com>,
	Saeed Mahameed <saeedm@...lanox.com>,
	Tal Alon <talal@...lanox.com>,
	Yevgeny Petrilin <yevgenyp@...lanox.com>, _govind@....com
Subject: Re: 3.17-rc1 oops during network interface configuration


On Aug 20, 2014, at 6:31 AM, Or Gerlitz <ogerlitz@...lanox.com> wrote:

> On 18/08/2014 15:18, Bart Van Assche wrote:
>> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The
>> following call trace is triggered during boot on a system on which kernel
>> 3.16 runs fine:
> 
> Yep, I see it on my systems too.
> 
> I narrowed this down a bit to happen only when the port link type (these nodes have ConnectX) is IB and IPoIB gets to load.
> 
> I reverted (below) all the IPoIB changes since 3.16 (except for the trivial commit c835a67) and the crash still exists.
> 
> I guess this needs to go through systematic bisection.

This crash happens when booting v3.17-rcN on any of my IB-enabled
systems. I have both ConnectX-2 and mthca systems, and all of them
are affected.

I bisected this to:

commit e0f31d8498676fda36289603a054d0d490aa2679
Author:     Govindarajulu Varadarajan <_govind@....com>
AuthorDate: Mon Jun 23 16:07:58 2014 +0530
Commit:     David S. Miller <davem@...emloft.net>
CommitDate: Mon Jun 23 14:32:19 2014 -0700

    flow_keys: Record IP layer protocol in skb_flow_dissect()

    skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
    give any information about IPv4 or IPv6.
    This patch adds new member, n_proto, to struct flow_keys. Which records the
    IP layer type. i.e IPv4 or IPv6.
    This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
    Adding new member to flow_keys increases the struct size by around 4 bytes.
    This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
    qdisc_cb_private_validate()
    So increase data size by 4

    Signed-off-by: Govindarajulu Varadarajan <_govind@....com>
    Signed-off-by: David S. Miller <davem@...emloft.net>


This commit includes a hunk that increases the size of struct qdisc_skb_cb
by 4 bytes (data[] grows from 20 to 24 bytes):

> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 624f985..a3cfb8e 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>         unsigned int            pkt_len;
>         u16                     slave_dev_queue_mapping;
>         u16                     _pad;
> -       unsigned char           data[20];
> +       unsigned char           data[24];
>  };
>  
>  static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)



IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:

> struct ipoib_cb {
>         struct qdisc_skb_cb     qdisc_cb;
>         u8                      hwaddr[INFINIBAND_ALEN];
> };

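IPoIB keeps this in the sk_buff::cb field, which is exactly 48 bytes.
INFINIBAND_ALEN is 20, so before commit e0f31d84 struct ipoib_cb was
28 + 20 = 48 bytes and just fit. After that commit, the size of
struct ipoib_cb on x86_64 becomes 32 + 20 = 52 bytes.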

Thus IPoIB overruns sk_buff::cb by 4 bytes, and trashes the
sk_buff::_skb_refdst field that follows it, which contains a pointer.
By the time we get into __dev_queue_xmit() and try to use the result
of skb_dst(), that pointer is garbage, and we oops.
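
To make the arithmetic concrete, here is a minimal userspace sketch,
not kernel code. All of the mock_* names, MOCK_SKB_CB_SIZE, and the two
helper functions are made up for illustration; the field layouts are
copied from the structures quoted above, and struct mock_skb assumes
only that _skb_refdst immediately follows cb[] as described. It checks
that the 3.16 layout fits in 48 bytes and shows what a 52-byte copy
does to the neighboring pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_SKB_CB_SIZE 48   /* sizeof(skb->cb), per the text above */
#define INFINIBAND_ALEN  20   /* IPoIB hardware address length */

/* qdisc_skb_cb as it was before commit e0f31d84: data[20] */
struct mock_qdisc_skb_cb_old {
        unsigned int  pkt_len;
        uint16_t      slave_dev_queue_mapping;
        uint16_t      _pad;
        unsigned char data[20];
};

/* qdisc_skb_cb after commit e0f31d84: data[24] */
struct mock_qdisc_skb_cb_new {
        unsigned int  pkt_len;
        uint16_t      slave_dev_queue_mapping;
        uint16_t      _pad;
        unsigned char data[24];
};

struct mock_ipoib_cb_old {
        struct mock_qdisc_skb_cb_old qdisc_cb;
        uint8_t                      hwaddr[INFINIBAND_ALEN];
};

struct mock_ipoib_cb_new {
        struct mock_qdisc_skb_cb_new qdisc_cb;
        uint8_t                      hwaddr[INFINIBAND_ALEN];
};

/* Hypothetical stand-in for the two sk_buff fields involved. */
struct mock_skb {
        char      cb[MOCK_SKB_CB_SIZE];
        uintptr_t _skb_refdst;
};

/* The 3.16 layout fits; the same check on the new layout would fail. */
static_assert(sizeof(struct mock_ipoib_cb_old) <= MOCK_SKB_CB_SIZE,
              "3.16 ipoib_cb fits in cb[]");

/* Number of bytes by which a private cb struct of size sz overruns cb[]. */
static size_t cb_overrun(size_t sz)
{
        return sz > MOCK_SKB_CB_SIZE ? sz - MOCK_SKB_CB_SIZE : 0;
}

/*
 * Write a new-layout ipoib_cb at the start of the mock skb, the way a
 * driver would through a cast of skb->cb, and report whether the
 * pointer stored after cb[] still holds its original value.
 */
static int refdst_survives_new_cb(void)
{
        struct mock_skb skb;
        struct mock_ipoib_cb_new cb;
        const uintptr_t dst = (uintptr_t)&skb;  /* any valid pointer value */

        memset(&cb, 0xff, sizeof(cb));
        memset(&skb, 0, sizeof(skb));
        skb._skb_refdst = dst;
        memcpy(&skb, &cb, sizeof(cb));          /* 52 bytes into a 48-byte area */
        return skb._skb_refdst == dst;
}
```

With these definitions, cb_overrun(sizeof(struct mock_ipoib_cb_new))
comes out as 4 and refdst_survives_new_cb() returns 0. A compile-time
check of the same shape placed in IPoIB would have turned this oops
into a build failure, though that is only a suggestion and not
something the tree has today as far as I know.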

Obviously, cb[] could be increased to 56 bytes to accommodate struct
ipoib_cb. I tried this, and it is effective in preventing the oops on
one of my systems.

But I suspect there is a historical reason, one I’m not aware of, that
it has remained 48 bytes for years.


> Or.
> 
>> net.git]# git log --oneline --no-merges v3.16.. drivers/infiniband/ulp/ipoib/
>> 8a118a4 Revert "IB/ipoib: Use P_Key change event instead of P_Key polling mechanism"
>> 90e6f39 Revert "IB/ipoib: Avoid flushing the workqueue from worker context"
>> 030ade7 Revert "IB/ipoib: Avoid multicast join attempts with invalid P_key"
>> 97ba2ff Revert "IPoIB: Remove unnecessary test for NULL before debugfs_remove()"
>> e42fa20 IPoIB: Remove unnecessary test for NULL before debugfs_remove()
>> dd57c93 IB/ipoib: Avoid multicast join attempts with invalid P_key
>> 4eae374 IB/ipoib: Avoid flushing the workqueue from worker context
>> db84f88 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
>> c835a67 net: set name_assign_type in alloc_netdev()
> 
> 
>> BUG: unable to handle kernel paging request at ffff88090000007e
>> IP: __dev_queue_xmit+0x519
>> Call Trace:
>> ? __dev_queue_xmit+0x49
>> dev_queue_xmit+0x10
>> neigh_connected_output
>> ? ip_finish_output
>> ip_finish_output
>> ? ip_finish_output
>> ? netif_rx_ni
>> ip_mc_output
>> ip_local_out_sk
>> ip_send_skb
>> udp_send_skb
>> udp_sendmsg
>> ? ip_reply_glue_bits
>> ? __lock_is_held
>> inet_sendmsg
>> ? inet_sendmsg
>> sock_sendmsg
>> ? might_fault
>> ? might_fault
>> ? move_addr_to_kernel.part.38
>> SYSC_sendto
>> ? sysret_check
>> ? trace_hardirqs_on_caller
>> ? trace_hardirqs_on_thunk
>> SyS_sendto
>> system_call_fastpath
>> 
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>> drm_kms_helper: panic occurred, switching back to text console
>> 
>> A screenshot of this kernel oops can be found here:
>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>> 
>> gdb translates the crash address into the following (not sure this makes sense
>> since offset 0x519 is past the end of __dev_queue_xmit()):
>> 
>> (gdb) list *(__dev_queue_xmit+0x519)
>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
>> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
>> 5163    {
>> 5164            struct netdev_adjacent *iter;
>> 5165
>> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
>> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
>> 5168                                              &iter->dev->adj_list.lower);
>> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
>> 5170                                              &iter->dev->adj_list.lower);
>> 5171            }
>> 
>> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>> 
>> (gdb) list *(__dev_queue_xmit+0x49)
>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
>> 70       * The various preempt_count add/sub methods
>> 71       */
>> 72
>> 73      static __always_inline void __preempt_count_add(int val)
>> 74      {
>> 75              raw_cpu_add_4(__preempt_count, val);
>> 76      }
>> 77
>> 78      static __always_inline void __preempt_count_sub(int val)
>> 79      {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



