Message-ID: <20141001193914.GL17706@oracle.com>
Date: Wed, 1 Oct 2014 15:39:14 -0400
From: Sowmini Varadhan <sowmini.varadhan@...cle.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: davem@...emloft.net, raghuram.kothakota@...cle.com,
netdev@...r.kernel.org
Subject: Re: [PATCH net-next 1/2] sunvnet: Process Rx data packets in a BH
handler
On (10/01/14 12:09), Eric Dumazet wrote:
> > -
> > + /* BH context cannot call netif_receive_skb */
> > + netif_rx_ni(skb);
>
> Really ? What about the standard and less expensive netif_receive_skb ?
I can't use netif_receive_skb in this case:
the TCP retransmit timers run in softirq context. They can preempt here
and deadlock on the socket locks. E.g.,
tcp_write_timer+0xc/0xa0 <-- wants sk_lock
call_timer_fn+0x24/0x120
run_timer_softirq+0x214/0x2a0
__do_softirq+0xb8/0x200
do_softirq+0x8c/0xc0
local_bh_enable+0xac/0xc0
ip_finish_output+0x254/0x4a0
ip_output+0xc4/0xe0
ip_local_out+0x2c/0x40
ip_queue_xmit+0x140/0x3c0
tcp_transmit_skb+0x448/0x740
tcp_write_xmit+0x220/0x480
__tcp_push_pending_frames+0x38/0x100
tcp_rcv_established+0x214/0x780
tcp_v4_do_rcv+0x154/0x300
tcp_v4_rcv+0x6cc/0xa60 <-- takes sk_lock
:
netif_receive_skb
Ideally I would have liked to use netif_receive_skb (it boosts perf),
but I had to back off for this reason.
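To illustrate the constraint (a hedged sketch, not the actual patch; the
function name vnet_rx_one is hypothetical):

```c
/* Hypothetical rx path running in process context (e.g. a workqueue),
 * rather than in softirq context. */
static void vnet_rx_one(struct sk_buff *skb)
{
	/*
	 * netif_receive_skb() delivers the skb inline and can end up in
	 * tcp_v4_rcv() holding sk_lock; on the way out, local_bh_enable()
	 * runs pending softirqs, and tcp_write_timer() then tries to take
	 * the same sk_lock -- the deadlock in the trace above.
	 *
	 * netif_rx_ni() instead queues the skb to the per-CPU backlog and
	 * kicks NET_RX softirq with preemption disabled, so delivery
	 * happens in proper softirq context.
	 */
	netif_rx_ni(skb);
}
```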
> > +
> > + struct mutex vnet_rx_mutex; /* serializes rx_workq */
> > + struct work_struct rx_work;
> > + struct workqueue_struct *rx_workq;
> > +
> > };
>
> Could you describe in the changelog why all this is needed ?
I gave a short summary in the cover letter; here are more details:
- processing packets in ldc_rx context risks live-lock.
- I experimented with a few alternatives, including NAPI, and a simple tasklet
  to handle the data packets. With both NAPI and the tasklet I'm able to use
  netif_receive_skb safely; however, mpstat shows that one CPU ends up doing
  all the processing, which inhibits scaling.
- further, with NAPI the budget gets in the way.
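The deferral pattern the patch uses can be sketched roughly as follows
(names like vnet_rx_work_fn and the vnet_port layout are illustrative
assumptions, not the actual patch code):

```c
/* Work function: drain the LDC descriptor ring in process context,
 * serialized by vnet_rx_mutex from the struct quoted below. */
static void vnet_rx_work_fn(struct work_struct *work)
{
	struct vnet *vp = container_of(work, struct vnet, rx_work);

	mutex_lock(&vp->vnet_rx_mutex);
	/* walk the rx ring, build skbs, hand each one up via netif_rx_ni() */
	mutex_unlock(&vp->vnet_rx_mutex);
}

/* In the ldc_rx interrupt path, defer instead of processing inline: */
queue_work(vp->rx_workq, &vp->rx_work);
```

Because a workqueue item can run on any CPU, rx processing for different
ports can spread across CPUs, unlike the single-CPU behavior seen with
NAPI or a tasklet.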
Regarding your other comment:
"You basically found a way to overcome NAPI standard limits (budget of 64)"
As I said in the cover letter, imposing a budget on sunvnet ends up actually
hurting perf significantly, as we end up sending additional STOP/START messages.
To honor that budget, we'd have to keep a lot more state in vnet to remember
our position in the stream but *not* send a STOP/START, and instead resume
from where we left off at the next napi_schedule.
Doing all this would just end up re-inventing much of the code in
process_backlog anyway.
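For contrast, the standard NAPI poll pattern being discussed looks roughly
like this (a sketch; ring_has_packets() and enable_rx_interrupts() are
hypothetical helpers standing in for driver-specific code):

```c
/* Standard NAPI poll: process at most `budget` packets per invocation,
 * and only re-enable rx interrupts once the ring is drained. */
static int xxx_poll(struct napi_struct *napi, int budget)
{
	int work_done = 0;

	while (work_done < budget && ring_has_packets()) {
		/* build skb from the next descriptor, then
		 * netif_receive_skb(skb); */
		work_done++;
	}

	if (work_done < budget) {
		/* ring drained: leave polled mode */
		napi_complete(napi);
		enable_rx_interrupts();
	}
	/* else: NAPI reschedules us; we must resume from a saved
	 * position in the stream without signalling the peer */
	return work_done;
}
```

For sunvnet, stopping at the budget boundary without sending a STOP/START to
the peer is exactly the extra state-keeping described above.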
--Sowmini