lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <C5551D9AAB213A418B7FD5E4A6F30A0702F8FB96@ORSMSX106.amr.corp.intel.com>
Date:	Thu, 16 Feb 2012 19:55:25 +0000
From:	"Rose, Gregory V" <gregory.v.rose@...el.com>
To:	Ben Hutchings <bhutchings@...arflare.com>,
	netdev <netdev@...r.kernel.org>
CC:	Shradha Shah <sshah@...arflare.com>
Subject: RE: SR-IOV setup race between PCI and rtnetlink

> -----Original Message-----
> From: netdev-owner@...r.kernel.org [mailto:netdev-owner@...r.kernel.org]
> On Behalf Of Ben Hutchings
> Sent: Thursday, February 16, 2012 10:23 AM
> To: netdev
> Cc: Shradha Shah
> Subject: SR-IOV setup race between PCI and rtnetlink
> 
> A customer hit this WARNING from rtnetlink:
> 
> ------------[ cut here ]------------
> WARNING: at net/core/rtnetlink.c:1568 rtmsg_ifinfo+0x25a/0x260() (Not
> tainted)
> Hardware name: ProLiant DL380 G7
> Modules linked in: bonding ipv6 dm_mirror dm_region_hash dm_log
> power_meter
> hpilo hpwdt bnx2 onload(U) sfc_char(U) sfc_resource(U) sfc_affinity(U)
> sfc_tune(U) sfc(U) mdio sg microcode serio_raw iTCO_wdt
> iTCO_vendor_support
> i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa
> radeon
> ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_multipath dm_mod
> [last
> unloaded: scsi_wait_scan]
> Pid: 1927, comm: ifup-eth Not tainted 2.6.32-131.6.1.el6.x86_64 #1
> Call Trace:
> [<ffffffff810670f7>] ? warn_slowpath_common+0x87/0xc0
> [<ffffffff8106714a>] ? warn_slowpath_null+0x1a/0x20
> [<ffffffff8142da9a>] ? rtmsg_ifinfo+0x25a/0x260
> [<ffffffff8108a928>] ? synchronize_sched+0x58/0x60
> [<ffffffff8108a8b0>] ? wakeme_after_rcu+0x0/0x20
> [<ffffffff8141c42e>] ? netdev_set_master+0x6e/0xd0
> [<ffffffffa041191f>] ? bond_enslave+0x22f/0xd00 [bonding]
> [<ffffffff814da624>] ? printk+0x41/0x45
> [<ffffffffa041afd7>] ? bonding_store_slaves+0x2a7/0x420 [bonding]
> [<ffffffff81336900>] ? dev_attr_store+0x20/0x30
> [<ffffffff811e4d95>] ? sysfs_write_file+0xe5/0x170
> [<ffffffff81172748>] ? vfs_write+0xb8/0x1a0
> [<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
> [<ffffffff81173181>] ? sys_write+0x51/0x90
> [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b
> ---[ end trace 7676a5a34ad7b8ee ]---
> ------------[ cut here ]------------
> 
> This was seen with OpenOnload on RHEL 6.1, but I believe the same issue
> exists with the changes I just submitted against net-next.
> 
> The WARNING is produced by:
> 		/* -EMSGSIZE implies BUG in if_nlmsg_size() */
> 		WARN_ON(err == -EMSGSIZE);
> 
> rtnl_vfinfo_size(), rtnl_fill_ifinfo(), etc. use dev_num_vf() to get the
> number of VFs that will be included in the message, i.e. they ask the
> PCI device and not the net device.
> 
> The number of VFs is changed by pci_enable_sriov(), which obviously does
> not acquire the RTNL lock.  Further, it is unsafe for its callers to
> hold the RTNL lock, because it may synchronously bind the new VFs to
> drivers that themselves acquire the RTNL in their probe functions.  So
> the number of VFs may change between the time at which the message size
> is calculated and the time at which it is built.
> 
> Now rtnl_fill_ifinfo() will stop trying to add VF information as soon as
> ndo_get_vf_config() returns an error.  If the driver implementation
> ensures that it returns errors until after pci_enable_sriov() returns
> and it has reacquired the RTNL lock, the message doesn't actually get
> any bigger and the WARNING won't be hit.
> 
> However we really need the VFs to be configurable immediately, so that
> the VF driver can communicate with the PF driver (sfc).  I could add an
> separate flag to keep the RTNL interface disabled while the inter-driver
> interface is enabled, but that doesn't seem like the right thing to do.
> 
> Perhaps there should be a net device op to return the number of VFs,
> which the net driver must then only change while holding the RTNL lock?
> RTNL would then use that instead of dev_num_vf().

Intel drivers aren't subject to this problem at this time because the number of VFs are configured on driver load and don't change until you unload the driver and then reload it with a new max_vfs parameter.  However, if/when I get some patches done that allow the user to set the number of VFs via ethtool instead of just the module parameter that will change and this change would be very helpful in that circumstance also.

Sounds like a good idea to me.

- Greg

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ