Date:	Wed, 10 Sep 2014 10:42:41 +0300
From:	Or Gerlitz <ogerlitz@...lanox.com>
To:	Chuck Lever <chuck.lever@...cle.com>,
	Govindarajulu Varadarajan <_govind@....com>,
	"David S. Miller" <davem@...emloft.net>
CC:	<netdev@...r.kernel.org>,
	LKML Kernel <linux-kernel@...r.kernel.org>,
	linux-rdma <linux-rdma@...r.kernel.org>,
	Bart Van Assche <bvanassche@....org>,
	Saeed Mahameed <saeedm@...lanox.com>,
	Tal Alon <talal@...lanox.com>,
	Yevgeny Petrilin <yevgenyp@...lanox.com>,
	Roland Dreier <roland@...nel.org>
Subject: Re: 3.17-rc1 oops during network interface configuration


On 9/9/2014 10:30 PM, Chuck Lever wrote:
> This crash happens when booting v3.17-rcN on any of my IB-enabled
> systems. I have both ConnectX-2 and mthca systems, all are affected.
>
> I bisected this to:
>
> commit e0f31d8498676fda36289603a054d0d490aa2679
> Author:     Govindarajulu Varadarajan <_govind@....com>
> AuthorDate: Mon Jun 23 16:07:58 2014 +0530
> Commit:     David S. Miller <davem@...emloft.net>
> CommitDate: Mon Jun 23 14:32:19 2014 -0700
>
>      flow_keys: Record IP layer protocol in skb_flow_dissect()
>
>      skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
>      give any information about IPv4 or IPv6.
>      This patch adds new member, n_proto, to struct flow_keys. Which records the
>      IP layer type. i.e IPv4 or IPv6.
>      This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>      Adding new member to flow_keys increases the struct size by around 4 bytes.
>      This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
>      qdisc_cb_private_validate()
>      So increase data size by 4
>
>      Signed-off-by: Govindarajulu Varadarajan <_govind@....com>
>      Signed-off-by: David S. Miller <davem@...emloft.net>
>
>
> This commit includes a hunk that increases the size of struct qdisc_skb_cb
> by at least 4 bytes:
>
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 624f985..a3cfb8e 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>>          unsigned int            pkt_len;
>>          u16                     slave_dev_queue_mapping;
>>          u16                     _pad;
>> -       unsigned char           data[20];
>> +       unsigned char           data[24];
>>   };
>>   
>>   static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
>
>
> IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:
>
>> struct ipoib_cb {
>>          struct qdisc_skb_cb     qdisc_cb;
>>          u8                      hwaddr[INFINIBAND_ALEN];
>> };
> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
> 52 bytes.
>
> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
> field, which contains a pointer. By the time we get into
> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer
> is garbage, and we oops.
>
> Obviously, cb[] could be increased to 56 bytes to accommodate struct
> ipoib_cb. I tried this, and it is effective in preventing the oops on
> one of my systems.
>
> But I suspect there is an historical reason I’m not aware of that it
> has remained 48 bytes for years.
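
The size arithmetic quoted above can be reproduced in userspace. This is
only a sketch, not kernel code: the struct names with the _before/_after
suffixes, SKB_CB_SIZE and cb_overrun() are made up for illustration,
INFINIBAND_ALEN is assumed to be 20, and unsigned int is assumed to be
4 bytes wide as on x86_64.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

#define SKB_CB_SIZE      48  /* size of sk_buff::cb as stated in the report */
#define INFINIBAND_ALEN  20  /* assumed IPoIB hardware address length */

static_assert(sizeof(unsigned int) == 4,
	      "sketch assumes a 4-byte unsigned int");

/* Userspace stand-ins for qdisc_skb_cb and ipoib_cb, for a given data[] size. */
#define DECLARE_LAYOUT(tag, data_len)                     \
	struct qdisc_skb_cb_##tag {                       \
		unsigned int  pkt_len;                    \
		uint16_t      slave_dev_queue_mapping;    \
		uint16_t      _pad;                       \
		unsigned char data[data_len];             \
	};                                                \
	struct ipoib_cb_##tag {                           \
		struct qdisc_skb_cb_##tag qdisc_cb;       \
		uint8_t hwaddr[INFINIBAND_ALEN];          \
	}

DECLARE_LAYOUT(before, 20);  /* data[20], prior to commit e0f31d84 */
DECLARE_LAYOUT(after, 24);   /* data[24], after commit e0f31d84 */

/* Number of bytes by which a control block of this size overruns cb[]. */
static inline size_t cb_overrun(size_t cb_struct_size)
{
	return cb_struct_size > SKB_CB_SIZE ? cb_struct_size - SKB_CB_SIZE : 0;
}
```

Under these assumptions, cb_overrun(sizeof(struct ipoib_cb_after)) gives
the 4 bytes that spill past cb[] into whatever field follows it.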

Hi Chuck, thanks for bisecting this. Indeed, as of kernel 3.2 commit
936d7de ("IPoIB: Stop lying about hard_header_len and use skb->cb to
stash LL addresses") we are using the skb->cb field to work properly
under GRO and to avoid another historical quirk we had there. So I
think we can definitely consider commit e0f31d849 to have introduced a
severe regression. Govindarajulu, Dave, what's your thinking here? Any
quick idea on how to fix it?

Also, I thought we already had the mechanics in the kernel to catch
such overflows, e.g. commit a0417fa3a18a ("net: Make qdisc_skb_cb upper
size bound explicit.")?
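
For illustration, a compile-time guard of that kind could look roughly
like the userspace sketch below. It is not the check added by
a0417fa3a18a; the _sketch struct names and the CB_FITS() macro are
hypothetical, and INFINIBAND_ALEN is assumed to be 20.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

#define INFINIBAND_ALEN 20   /* assumed IPoIB hardware address length */

/* Userspace stand-in; only the size of cb[] matters for the check. */
struct sk_buff_sketch {
	char cb[48];
};

struct qdisc_skb_cb_sketch {  /* layout with data[20], before e0f31d84 */
	unsigned int  pkt_len;
	uint16_t      slave_dev_queue_mapping;
	uint16_t      _pad;
	unsigned char data[20];
};

struct ipoib_cb_sketch {
	struct qdisc_skb_cb_sketch qdisc_cb;
	uint8_t hwaddr[INFINIBAND_ALEN];
};

/* Evaluates to 1 if a control block of type T fits in sk_buff::cb. */
#define CB_FITS(T) (sizeof(T) <= sizeof(((struct sk_buff_sketch *)0)->cb))

/*
 * Hypothetical guard: the build stops if the struct outgrows cb[].
 * Changing data[20] to data[24] above should make this assertion fail.
 */
static_assert(CB_FITS(struct ipoib_cb_sketch),
	      "ipoib_cb does not fit in sk_buff::cb");
```

A check like this only helps if every user that embeds qdisc_skb_cb in
its own cb struct carries one.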

Or.

>>> BUG: unable to handle kernel paging request at ffff88090000007e
>>> IP: __dev_queue_xmit+0x519
>>> Call Trace:
>>> ? __dev_queue_xmit+0x49
>>> dev_queue_xmit+0x10
>>> neigh_connected_output
>>> ? ip_finish_output
>>> ip_finish_output
>>> ? ip_finish_output
>>> ? netif_rx_ni
>>> ip_mc_output
>>> ip_local_out_sk
>>> ip_send_skb
>>> udp_send_skb
>>> udp_sendmsg
>>> ? ip_reply_glue_bits
>>> ? __lock_is_held
>>> inet_sendmsg
>>> ? inet_sendmsg
>>> sock_sendmsg
>>> ? might_fault
>>> ? might_fault
>>> ? move_addr_to_kernel.part.38
>>> SYSC_sendto
>>> ? sysret_check
>>> ? trace_hardirqs_on_caller
>>> ? trace_hardirqs_on_thunk
>>> SyS_sendto
>>> system_call_fastpath
>>>
>>> Kernel panic - not syncing: Fatal exception in interrupt
>>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>>> drm_kms_helper: panic occurred, switching back to text console
>>>
>>> A screenshot of this kernel oops can be found here:
>>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>>>
>>> gdb translates the crash address into the following (not sure this makes sense
>>> since offset 0x519 is past the end of __dev_queue_xmit()):
>>>
>>> (gdb) list *(__dev_queue_xmit+0x519)
>>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
>>> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
>>> 5163    {
>>> 5164            struct netdev_adjacent *iter;
>>> 5165
>>> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
>>> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
>>> 5168                                              &iter->dev->adj_list.lower);
>>> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
>>> 5170                                              &iter->dev->adj_list.lower);
>>> 5171            }
>>>
>>> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>>>
>>> (gdb) list *(__dev_queue_xmit+0x49)
>>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
>>> 70       * The various preempt_count add/sub methods
>>> 71       */
>>> 72
>>> 73      static __always_inline void __preempt_count_add(int val)
>>> 74      {
>>> 75              raw_cpu_add_4(__preempt_count, val);
>>> 76      }
>>> 77
>>> 78      static __always_inline void __preempt_count_sub(int val)
>>> 79      {
