Message-ID: <151954917.20121011120004@eikelenboom.it>
Date:	Thu, 11 Oct 2012 12:00:04 +0200
From:	Sander Eikelenboom <linux@...elenboom.it>
To:	Ian Campbell <Ian.Campbell@...rix.com>
CC:	xen-devel <xen-devel@...ts.xen.org>,
	Konrad Rzeszutek Wilk <konrad@...nel.org>,
	Eric Dumazet <eric.dumazet@...il.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Eric Dumazet <edumazet@...gle.com>
Subject: Re: [Xen-devel] compound skb frag pages appearing in start_xmit


Thursday, October 11, 2012, 10:02:26 AM, you wrote:

> On Wed, 2012-10-10 at 15:49 +0100, Sander Eikelenboom wrote:
>> Wednesday, October 10, 2012, 3:09:58 PM, you wrote:
>> 
>> > On Wed, 2012-10-10 at 11:13 +0100, Ian Campbell wrote:
>> >> I haven't tackled netfront yet. 
>> 
>> > I seem to be totally unable to reproduce the equivalent issue on the
>> > netfront xmit side, even though it seems like the loop in
>> > xennet_make_frags ought to be obviously susceptible to it.
>> 
>> > Konrad, Sander, are either of you able to repro, e.g. with:
>> 
>> 
>> Hmrrrmm i don't see any traces, only strange behaviour ..
>> 
>> - i can connect to guests by ssh, but it's sluggish, and sometimes stops working

> I saw something like this (ssh sluggish) even with dom0 itself. I'm
> trying to see if I can characterise it enough to reliably bisect it.

> I already switched out xen-unstable for 4.2-testing but that didn't make
> any difference.



>> - The guests seem to keep trying to connect to netback:
>> 
>> [  658.276719] xen_bridge: port 2(vif40.0) entered forwarding state
>> [  658.282258] xen_bridge: port 2(vif40.0) entered forwarding state
>> [  663.945964] xen_bridge: port 7(vif39.0) entered forwarding state
>> [  669.674277] xen_bridge: port 2(vif40.0) entered disabled state
>> [  669.680290] device vif40.0 left promiscuous mode
>> [  669.685464] xen_bridge: port 2(vif40.0) entered disabled state
>> [  672.857222] device vif41.0 entered promiscuous mode
>> [  673.166254] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
>> [  673.176368] xen_bridge: port 2(vif41.0) entered forwarding state
>> [  673.182042] xen_bridge: port 2(vif41.0) entered forwarding state
>> [  674.439725] xen_bridge: port 7(vif39.0) entered disabled state
>> [  674.445708] device vif39.0 left promiscuous mode
>> [  674.450955] xen_bridge: port 7(vif39.0) entered disabled state
>> [  677.726040] device vif42.0 entered promiscuous mode
>> [  678.053381] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
>> [  678.062804] xen_bridge: port 7(vif42.0) entered forwarding state
>> [  678.068433] xen_bridge: port 7(vif42.0) entered forwarding state
>> [  688.224736] xen_bridge: port 2(vif41.0) entered forwarding state
>> [  693.080557] xen_bridge: port 7(vif42.0) entered forwarding state
>> [  700.786276] xen_bridge: port 7(vif42.0) entered disabled state
>> [  700.792484] device vif42.0 left promiscuous mode
>> [  700.802409] xen_bridge: port 7(vif42.0) entered disabled state
>> [  704.133606] device vif43.0 entered promiscuous mode
>> [  704.460160] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
>> [  704.469800] xen_bridge: port 7(vif43.0) entered forwarding state
>> [  704.475303] xen_bridge: port 7(vif43.0) entered forwarding state
>> [  719.493788] xen_bridge: port 7(vif43.0) entered forwarding state
>> [  726.302456] xen_bridge: port 7(vif43.0) entered disabled state
>> [  726.308898] device vif43.0 left promiscuous mode
>> [  726.314029] xen_bridge: port 7(vif43.0) entered disabled state
>> 
>> All the guests are already up, but this keeps on going and going and going ....

> The domain number seems to be climbing, are you sure something isn't
> (crashing and) restarting?

Probably due to the BUG_ON from the patch below; I changed it into a WARN_ON.
I do seem to hit it, but so far only in one of the guests, and it triggers quite irregularly.

[   34.298549] ------------[ cut here ]------------
[   34.298567] WARNING: at drivers/net/xen-netfront.c:465 xennet_start_xmit+0x7fe/0x860()
[   34.298574] Modules linked in:
[   34.298597] Pid: 1580, comm: sshd Not tainted 3.6.0pre-rc1-20121011 #1
[   34.298603] Call Trace:
[   34.298611]  [<ffffffff810664ea>] warn_slowpath_common+0x7a/0xb0
[   34.298617]  [<ffffffff81066535>] warn_slowpath_null+0x15/0x20
[   34.298623]  [<ffffffff8146d89e>] xennet_start_xmit+0x7fe/0x860
[   34.298631]  [<ffffffff8161f349>] dev_hard_start_xmit+0x209/0x460
[   34.298637]  [<ffffffff8163b036>] sch_direct_xmit+0xf6/0x290
[   34.298643]  [<ffffffff8161f746>] dev_queue_xmit+0x1a6/0x5a0
[   34.298649]  [<ffffffff8161f5a0>] ? dev_hard_start_xmit+0x460/0x460
[   34.298656]  [<ffffffff810aa8e5>] ? trace_softirqs_off+0x85/0x1b0
[   34.298663]  [<ffffffff816b9536>] ip_finish_output+0x226/0x530
[   34.298668]  [<ffffffff816b93dd>] ? ip_finish_output+0xcd/0x530
[   34.298674]  [<ffffffff816b9899>] ip_output+0x59/0xe0
[   34.298680]  [<ffffffff816b83b8>] ip_local_out+0x28/0x90
[   34.298687]  [<ffffffff816b896f>] ip_queue_xmit+0x17f/0x4a0
[   34.298692]  [<ffffffff816b87f0>] ? ip_send_unicast_reply+0x340/0x340
[   34.298699]  [<ffffffff810a0ba7>] ? getnstimeofday+0x47/0xe0
[   34.298705]  [<ffffffff8160f4c9>] ? __skb_clone+0x29/0x120
[   34.298711]  [<ffffffff816cea20>] tcp_transmit_skb+0x400/0x8d0
[   34.298717]  [<ffffffff816d19fa>] tcp_write_xmit+0x21a/0xa50
[   34.298723]  [<ffffffff816d225b>] tcp_push_one+0x2b/0x40
[   34.298728]  [<ffffffff816c2dec>] tcp_sendmsg+0x8dc/0xe20
[   34.298735]  [<ffffffff816e8f19>] inet_sendmsg+0xa9/0x100
[   34.298740]  [<ffffffff816e8e70>] ? inet_autobind+0x70/0x70
[   34.298746]  [<ffffffff810b0f88>] ? lock_acquire+0xd8/0x100
[   34.298753]  [<ffffffff8160630d>] sock_aio_write+0x12d/0x140
[   34.298762]  [<ffffffff811435b2>] do_sync_write+0xa2/0xe0
[   34.298768]  [<ffffffff810ad22d>] ? trace_hardirqs_on+0xd/0x10
[   34.298774]  [<ffffffff811441d4>] vfs_write+0x174/0x190
[   34.298779]  [<ffffffff811442fa>] sys_write+0x5a/0xa0
[   34.298786]  [<ffffffff812b33de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   34.298792]  [<ffffffff817491cc>] cstar_dispatch+0x7/0x26
[   34.298797] ---[ end trace 2e28eec93b7a8b74 ]---


Complete dmesg from guest attached.



>> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> > index b06ef81..8a3f770 100644
>> > --- a/drivers/net/xen-netfront.c
>> > +++ b/drivers/net/xen-netfront.c
>> > @@ -462,6 +462,8 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>> >                 ref = gnttab_claim_grant_reference(&np->gref_tx_head);
>> >                 BUG_ON((signed short)ref < 0);
>> >  
>> > +               BUG_ON(PageCompound(skb_frag_page(frag)));
>> > +
>> >                 mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
>> >                 gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
>> >                                                 mfn, GNTMAP_readonly);
>> 
>> > My repro for netback was just to netcat a wodge of data from dom0->domU
>> > but going the other way doesn't seem to trigger.
>> 
>> 
>> 



View attachment "dmesg-netfront.txt" of type "text/plain" (177798 bytes)
