[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <0hivdsb.f18682ec9367f08c76301d993553f1b8@obelix.schillstrom.com>
Date: Tue, 8 Nov 2011 00:29:53 +0100 (CET)
From: "Hans Schillstrom" <hans@...illstrom.com>
To: "Pablo Neira Ayuso" <pablo@...filter.org>
Cc: "Hans Schillstrom" <hans.schillstrom@...csson.com>,
kaber@...sh.net, jengelh@...ozas.de,
netfilter-devel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re[2]: [v2 PATCH 1/2] NETFILTER module xt_hmark new target
for HASH based fw
Hello Pablo
>
>Hi Hans,
>
>On Mon, Oct 03, 2011 at 07:46:42PM +0200, Hans Schillstrom wrote:
>> diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
>> new file mode 100644
>> index 0000000..6c1436a
>> --- /dev/null
>> +++ b/include/linux/netfilter/xt_hmark.h
[snip]
>>
>> +config NETFILTER_XT_TARGET_HMARK
>
>New config option has to go in alphabetic order (this one should go
>after NETFILTER_XT_TARGET_HL).
Oops, I'll fix that
[snip]
>> +/*
>> + * ICMP, get inner header so calc can be made on the source message
>> + * not the icmp header, i.e. same hash mark must be produced
>> + * on an icmp error message.
>> + */
>> +static int get_inner_hdr(struct sk_buff *skb, int iphsz, int nhoff)
>
>This looks very similar to icmp_error in nf_conntrack_proto_icmp.c.
>Yours lacks of checksumming validation btw.
Thanks I'll add that.
>
>I'm trying to find some place where we can put this function to make
>it available for both nf_conntrack_ipv4 and your module (to avoid code
>redundancy), but I didn't find any so far.
>
>It would be nice to find some way to avoid duplicating code with
>similar functionality.
>
I do agree, I've also been searching for some icmp "lib" to use...
[snip]
>
>extra space not required.
OK
>
>> + /* Error message? */
>> + if (icmph->type != ICMP_DEST_UNREACH &&
>> + icmph->type != ICMP_SOURCE_QUENCH &&
>> + icmph->type != ICMP_TIME_EXCEEDED &&
>> + icmph->type != ICMP_PARAMETERPROB &&
>> + icmph->type != ICMP_REDIRECT)
>> + goto out;
>> + /* Checkin full IP header plus 8 bytes of protocol to
>> + * avoid additional coding at protocol handlers.
>> + */
>> + if (!pskb_may_pull(skb, nhoff + iphsz + sizeof(_ih) + 8))
>> + goto out;
>
>We prefer skb_header_pointer instead. If conntrack is enabled, we can
>benefit from defragmention.
In our case conntrack will not be there
>Please, replace all pskb_may_pull by skb_header_pointer in this code.
>
>We can assume that the IP header is linear (not fragmented).
I ran in to this issue in IPv6 testing so I got a little bit "paranoid".
Are you sure that the embedded IP and L4 header in the ICMP msg also is unfragmented.
Is this true for both IPv6 & IPv4 ?
>From what I remember when I was testing IPv6 icmp and digged into the original header (on a 2.6.32 kernel)
pskb_may_pull was needed.
[snip]
>> +/*
>> + * Calc hash value, special casre is taken on icmp and fragmented messages
>> + * i.e. fragmented messages don't use ports.
>> + */
>> +static __u32 get_hash(struct sk_buff *skb, struct xt_hmark_info *info)
>
>This function seems to big to me, please, split it into smaller
>chunks, like get_hash_ipv4, get_hash_ipv6 and get_hash_ports.
>
Good catch I'll do that,
>> +{
>> + int nhoff, hash = 0, poff, proto, frag = 0;
>> + struct iphdr *ip;
>> + u8 ip_proto;
>> + u32 addr1, addr2, ihl;
>> + u16 snatport = 0, dnatport = 0;
>> + union {
>> + u32 v32;
>> + u16 v16[2];
>> + } ports;
>> +
>> + nhoff = skb_network_offset(skb);
>> + proto = skb->protocol;
>> +
>> + if (!proto && skb->sk) {
>> + if (skb->sk->sk_family == AF_INET)
>> + proto = __constant_htons(ETH_P_IP);
>> + else if (skb->sk->sk_family == AF_INET6)
>> + proto = __constant_htons(ETH_P_IPV6);
>
>You already have the layer3 protocol number in xt_action_param. No
>need to use the socket information then.
When splitting get_hash() above will be removed, xt_action_param ->family will be used for selection.
[snip]
>> +
>> + if (!ct || !nf_ct_is_confirmed(ct))
>
>You seem to (ab)use nf_ct_is_confirmed to make sure you're not in the
>original direction. Better use the direction that you get by means of
>nf_ct_get.
>
I'm not sure I follow you here ?
>> + break;
>> + otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
>> + /* On the "return flow", to get the original address
>> + * i,e, replace the source address.
>> + */
>> + if (ct->status & IPS_DST_NAT &&
>> + info->flags & XT_HMARK_USE_DNAT) {
>> + rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
>> + addr1 = (__force u32) otuple->dst.u3.in.s_addr;
>> + dnatport = otuple->dst.u.udp.port;
>> + }
>> + /* On the "return flow", to get the original address
>> + * i,e, replace the destination address.
>> + */
>> + if (ct->status & IPS_SRC_NAT &&
>> + info->flags & XT_HMARK_USE_SNAT) {
>> + rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
>> + addr2 = (__force u32) otuple->src.u3.in.s_addr;
>> + snatport = otuple->src.u.udp.port;
>> + }
>> + break;
>> + }
[snip]
>> + case NEXTHDR_NONE:
>> + nhoff += hdrlen;
>> + goto hdr_rdy;
>> + default:
>> + goto done;
>
>This goto doesn't make too much sense to me, better return 0.
hmmm
kind of left overs, Actually all "goto done" can be replaced by return 0
[snip]
>> +done:
>> + return 0;
>> +}
>
>I'll try to find more time to look into this. Specifically, I want to
>review the IPv6 bits more carefully.
The IPv6 header recursion is not obvious, and it's hard to test all cases :-)
I really appreciate you review
Thanks
Hans
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists