Message-ID: <D3F292ADF945FB49B35E96C94C2061B91257DC5F@nsmail.netscout.com>
Date:	Tue, 5 Jul 2011 17:10:41 -0400
From:	"Loke, Chetan" <Chetan.Loke@...scout.com>
To:	"Eric Dumazet" <eric.dumazet@...il.com>
Cc:	"Victor Julien" <victor@...iniac.net>,
	"David Miller" <davem@...emloft.net>, <netdev@...r.kernel.org>
Subject: RE: [PATCH 2/2] packet: Add fanout support.

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@...il.com]
> Sent: July 05, 2011 2:21 PM
> To: Loke, Chetan
> Cc: Victor Julien; David Miller; netdev@...r.kernel.org
> Subject: RE: [PATCH 2/2] packet: Add fanout support.
> 
> On Tuesday, 5 July 2011 at 13:35 -0400, Loke, Chetan wrote:
> 
> > Sure, a lookup is needed (to steer what I call hot/cold flows) and
> > was proposed by me on the oisf mailing list. Always use the ip_id
> > bit then? Another problem that needs to be solved is: what if some
> > decoders are overloaded, then what? How will this scheme work? How
> > will we utilize other CPUs? RPS is needed for sure.
> >
> > If we maintain a i) per port lookup-table ii) 2^20 flows/table and
> > iii) 16 bytes/flow(one can also squeeze it down to 8 bytes) then we
> > will need around 32MB worth memory/port. It's not a huge memory
> > pressure for folks who want to use linux for doing IPS/IDS sort of
> > stuff.
> >
> > User-space decoders end up copying the packet anyways. So fanout can
> > be implemented in user-space to achieve effective CPU utilization.
> > As long as we don't bounce on different CPU-socket we could be ok.
> 
> This is the problem we want to address.
> 
> Going into user-space to perform the fanout is what you already have
> today, with one socket, one thread doing the fanout to worker threads.
> 
> David's patch is non-adaptive: it's a hash over N queues, with a
> fixed hash function.
> 


> What you want is to add another 'control queue' where new flows are
> directed. Then user application is able to reinject into kernel flow
> director the "This flow should go to queue X" information.
> 

I like the term 'kernel-flow-director'. The problem with rebalancing from user-space is that we would need some 'idle' period before injecting a flow-redirection event into the kernel. For bursty workloads this may be problematic: we may have to rebalance often, and then we would have to export a 'rebalance-idle-interval' knob, which users may have to tune for their workloads, etc.

> Or, let the kernel do a mix of rxhash and loadbalance : Be able to
> select a queue for a new flow without user land control, using a Flow
> hash table.
> 

This is exactly what I had proposed. Hash {== lookup of hot/cold flows} + LB {== kernel-flow-director} is what we need.

So something like:

hot_fanout_id = is_flow_active(rx_hash, lookup_table);

if (hot_fanout_id) {
	/* This flow is hot */
	steer_to(hot_fanout_id);
} else {
	/* This flow is cold - get the next_rr fanout_id */
	cold_fanout_id = fanout_rr_next(...);
	...
	steer_to(cold_fanout_id);
}

And,

1) Hash on <src_ip_addr, dst_ip_addr, src_port, dst_port>.
2) Store the ip_id from the first fragment in the flow_hash_table for matching subsequent ip_fragments.
   One corner case: the first fragment arrives out-of-order (OOO).
   2.1) User has configured 'assemble-fragments': in this case it doesn't matter.
   2.2) User has configured 'fwd-as-is': redirect OOO fragments to the next_rr_fanout_id.
        2.2.1) We may have to set a bit in the hash_lookup_table for a flow indicating it arrived OOO.
               is_flow_active will then have to look up twice: first using ip_id as the 3rd var in jhash,
               and second using the ports as the 3rd var. Worst case there will be two lookups for OOO.
        OR
        2.2.2) Effectively treat this fragment as the 'assemble-fragments' case(?), as in 2.1).


So {hash+LB} together should take care of both fragmented and non-fragmented flows.

We will have to purge flow-entries at some point to avoid false routing. I don't know if the control-queue you mentioned above could be used for purging flows?
Or the control-queue itself could be mmap'd so that user-space can clear a flow-entry?


Chetan Loke
