[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120305182248.GA29022@1984>
Date: Mon, 5 Mar 2012 19:22:48 +0100
From: Pablo Neira Ayuso <pablo@...filter.org>
To: Hans Schillstrom <hans.schillstrom@...csson.com>
Cc: "kaber@...sh.net" <kaber@...sh.net>,
"jengelh@...ozas.de" <jengelh@...ozas.de>,
"netfilter-devel@...r.kernel.org" <netfilter-devel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"hans@...illstrom.com" <hans@...illstrom.com>
Subject: Re: [v9 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH
based fwmark
Let me trim off parts that have been already discussed.
On Mon, Mar 05, 2012 at 11:09:46AM +0100, Hans Schillstrom wrote:
[...]
> ...
> > > +
> > > +/*
> > > + * ICMP, get header offset if icmp error
> > > + */
> > > +static int get_inner_hdr(struct sk_buff *skb, int iphsz, int nhoff)
> > > +{
> > > + const struct icmphdr *icmph;
> > > + struct icmphdr _ih;
> > > +
> > > + /* Not enough header? */
> > > + icmph = skb_header_pointer(skb, nhoff + iphsz, sizeof(_ih), &_ih);
> > > + if (icmph == NULL)
> > > + return nhoff;
> >
> > I think this has to return -1 in this case.
>
> No, it must return the unchanged value of nhoffs.
> Unless I change the usge of it as described later.
I see, you're defaulting on the original header if you cannot get the
ICMP header (eg. fragment case).
> > > +
> > > + if (icmph->type > NR_ICMP_TYPES)
> > > + return nhoff;
> >
> > Same thing.
>
> Same comment.
So if you get an unsupportted ICMP message, you rely on the original
IP header.
> > > +
> > > + /* Error message? */
> > > + if (icmph->type != ICMP_DEST_UNREACH &&
> > > + icmph->type != ICMP_SOURCE_QUENCH &&
> > > + icmph->type != ICMP_TIME_EXCEEDED &&
> > > + icmph->type != ICMP_PARAMETERPROB &&
> > > + icmph->type != ICMP_REDIRECT)
> > > + return nhoff;
> > > +
> > > + return nhoff + iphsz + sizeof(_ih);
> > > +}
> > > +
> > > +#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
> > > +/*
> > > + * Get ipv6 header offset if icmp error
> > > + */
> > > +static int get_inner6_hdr(struct sk_buff *skb, int *offset)
> >
> > I'd suggest to return the offset like in get_inner_hdr, not need for
> > one extra parameter then.
>
> get_inner6_hdr() must return true/false not an offset
>
> It's hard to make IPv4 & IPv6 the same due to the more complex header finding in IPv6,
> Maybe IPv4 can get more like IPv6,
>
> static int get_inner_hdr(struct sk_buff *skb, int *nhoff)
> {
> const struct icmphdr *icmph;
> struct icmphdr _ih;
> int iphz = ((struct iphdr *)skb_network_header(skb))->ihl * 4;
>
> /* Not enough header? */
> icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
> if (icmph == NULL || icmph->type > NR_ICMP_TYPES)
> return 0;
>
> /* Error message? */
> if (icmph->type != ICMP_DEST_UNREACH &&
> icmph->type != ICMP_SOURCE_QUENCH &&
> icmph->type != ICMP_TIME_EXCEEDED &&
> icmph->type != ICMP_PARAMETERPROB &&
> icmph->type != ICMP_REDIRECT)
> return 0
>
> *nhoff += iphsz + sizeof(_ih);
> return 1;
I only suggested to change the return value, no need to modify the
function, I liked how it was.
> }
>
>
> ...
>
> if (ip->protocol == IPPROTO_ICMP) {
> /* calc hash on inner header if an icmp error */
> if (get_inner_hdr(skb, &nhoff)) {
> ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
I prefer this chunk.
> if (!ip)
> return XT_CONTINUE;
> }
> }
>
>
> instead of :
>
> if (ip->protocol == IPPROTO_ICMP) {
> /* calc hash on inner header if an icmp error */
> nhoff = get_inner_hdr(skb, ip->ihl * 4, nhoff);
> ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
> if (!ip)
> return XT_CONTINUE;
> }
>
> I don't see why this should be done, maintenance ?
It seems more readable to me.
> >
> > > +{
> > > + struct icmp6hdr *icmp6h, _ih6;
> > > +
> > > + icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
> > > + if (icmp6h == NULL)
> > > + return 0;
> > > +
> > > + if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
> > > + *offset += sizeof(struct icmp6hdr);
> > ^^
> > extra space there.
> >
> > > + return 1;
> > > + }
> > > + return 0;
> > > +}
> > > +/*
> > > + * Calculate hash based fw-mark, on the five tuple if possible.
> > > + * special cases :
> > > + * - Fragments do not use ports not even on the first fragment,
> > > + * nf_defrag_ipv6.ko don't defrag for us like it do in ipv4.
> > > + * This might be changed in the future.
> > > + * - On ICMP errors the inner header will be used.
> > > + * - Tunnels no ports
> > > + * - ESP & AH uses SPI
> > > + * @returns XT_CONTINUE
> > > + */
> > > +static unsigned int
> > > +hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
> > > +{
> > > + struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
> > > + struct ipv6hdr *ip6, _ip6;
> > > + int poff, flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
> > > + u32 addr1, addr2, hash, nhoffs = 0;
> > > + u8 nexthdr;
> > > + union hmark_ports uports = { .v32 = 0 };
> >
> > I think that this can be: union hmark_ports uports = {};
>
> I think moving the init further down is better, since there is a couple of returns before.
I don't mind. It was just one suggestion.
>
> + uport.v32 = 0;
> + if ((info->flags & XT_F_HMARK_METHOD_L3) ||
> + (nexthdr == IPPROTO_ICMPV6))
> + goto no6ports;
>
> >
> > > + unsigned short fragoff = 0;
> > > +
> > > + ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
> > > +
> > > + nexthdr = ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
> > > + if (nexthdr < 0)
> > > + return XT_CONTINUE;
> > > + /* No need to check for icmp errors on fragments */
> > > + if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
> > > + goto noicmp;
> > > + /* if an icmp error, use the inner header */
> > > + if (get_inner6_hdr(skb, &nhoffs)) {
> >
> > I think you have to check that nexthdr == IPPROTO_ICMPV6 here,
> > otherwise non-icmp packets will be passed to get_inner6_hdr.
> >
>
> I do, (4 lines up)
> + if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
> + goto noicmp;
Right.
>
> > > + ip6 = skb_header_pointer(skb, nhoffs, sizeof(_ip6), &_ip6);
> > > + if (!ip6)
> > > + return XT_CONTINUE;
> > > + /* Treat AH as ESP */
> >
> > This comments means: try to find auth headers if possible, right?
> No, It might be a litle bit unclean but earlier Patric, you and I brought it up,
> - the descending into AH and port searching.
>
> I think the comment should be some thing like,
> /* Treat AH as ESP, use SPI nothing else. */
It's fine.
> >
> > > + flag = IP6T_FH_F_AUTH;
> > > + nexthdr = ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
> > > + if (nexthdr < 0)
> > > + return XT_CONTINUE;
> > > + }
> > > +noicmp:
> > > + addr1 = (__force u32)
> > > + (ip6->saddr.s6_addr32[0] & info->smask.in6.s6_addr32[0]) ^
> > > + (ip6->saddr.s6_addr32[1] & info->smask.in6.s6_addr32[1]) ^
> > > + (ip6->saddr.s6_addr32[2] & info->smask.in6.s6_addr32[2]) ^
> > > + (ip6->saddr.s6_addr32[3] & info->smask.in6.s6_addr32[3]);
> > > + addr2 = (__force u32)
> > > + (ip6->daddr.s6_addr32[0] & info->dmask.in6.s6_addr32[0]) ^
> > > + (ip6->daddr.s6_addr32[1] & info->dmask.in6.s6_addr32[1]) ^
> > > + (ip6->daddr.s6_addr32[2] & info->dmask.in6.s6_addr32[2]) ^
> > > + (ip6->daddr.s6_addr32[3] & info->dmask.in6.s6_addr32[3]);
> > > +
> > > + if ((info->flags & XT_F_HMARK_METHOD_L3) ||
> > > + (nexthdr == IPPROTO_ICMPV6))
> > > + goto no6ports;
> > > + /* Is next header valid for port or SPI calculation ? */
> > > + poff = proto_ports_offset(nexthdr);
> > > + if ((flag & IP6T_FH_F_FRAG) || poff < 0)
> > > + return XT_CONTINUE;
> > > +
> > > + nhoffs += poff;
> > > + /* Since uports is modified, skb_header_pointer() can't be used */
> >
> > Why not? You can pass &uports to skb_header_pointer, it takes a
> > generic pointer.
>
> uports is a local var that will written into later (swap and mask),
> The comment is there becuse of that.
>
> >
> > > + if (!pskb_may_pull(skb, nhoffs + 4))
> > > + return XT_CONTINUE;
> > > + uports.v32 = * (__force u32 *) (skb->data + nhoffs);
I think you can do like in other parts of netfilter:
union hmark_ports _uports = { ... };
union hmark_ports *uports;
uports = skb_header_pointer(skb, nhoffs + poff, sizeof(_uports), &_uports);
if (uports == NULL)
return XT_CONTINUE;
Then use uports->v32 in the rest of the code.
> > > +
> > > + if ((nexthdr == IPPROTO_ESP) || (nexthdr == IPPROTO_AH))
> > > + uports.v32 = (uports.v32 & info->spimask) | info->spiset;
> > > + else {
> > > + uports.v32 = (uports.v32 & info->pmask.v32) | info->pset.v32;
> > > + /* get a consistent hash (same value on both flow directions) */
> > > + if (uports.p16.dst < uports.p16.src)
> > > + swap(uports.p16.dst, uports.p16.src);
> > > + }
> > > +
> > > +no6ports:
> > > + nexthdr &= info->prmask;
> > > + /* get a consistent hash (same value on both flow directions) */
> > > + if (addr2 < addr1)
> > > + swap(addr1, addr2);
> > > +
> > > + hash = jhash_3words(addr1, addr2, uports.v32, info->hashrnd) ^ nexthdr;
> > > + skb->mark = (hash % info->hmod) + info->hoffs;
> > > + return XT_CONTINUE;
> > > +}
> > > +#endif
> > > +/*
> > > + * Calculate hash based fw-mark, on the five tuple if possible.
> > > + * special cases :
> > > + * - Fragments do not use ports not even on the first fragment,
> > > + * unless nf_defrag_xx.ko is used.
> > > + * - On ICMP errors the inner header will be used.
> > > + * - Tunnels no ports
> > > + * - ESP & AH uses SPI
> > > + * @returns XT_CONTINUE
> > > + */
> > > +static unsigned int
> > > +hmark_v4(struct sk_buff *skb, const struct xt_action_param *par)
> > > +{
> > > + struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
> > > + int nhoff, poff, frag = 0;
> > > + struct iphdr *ip, _ip;
> > > + u8 ip_proto;
> > > + u32 addr1, addr2, hash;
> > > + u16 snatport = 0, dnatport = 0;
> > > + union hmark_ports uports;
> > > + enum ip_conntrack_info ctinfo;
> > > + struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
> >
> > You spend cycles initializing this variable, but you may not use it.
>
> Yes, it can be improved ...
>
> > For the conntrack case, you can make in the very beginning:
> >
> > if (info->flags & XT_HMARK_CT) {
> > struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
> >
> > if (ct && !nf_ct_is_untracked(ct)) {
> > struct nf_conntrack_tuple *otuple =
> > &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> > struct nf_conntrack_tuple *rtuple =
> > &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> >
> > addr_src = (__force u32) otuple->src.u3.in.s_addr;
> > port_src = otuple->src.u.all;
> > addr_dst = (__force u32) rtuple->src.u3.in.s_addr;
> > port_dst = rtuple->src.u.all;
> > } else
> > return XT_CONTINUE;
> > }
> >
> > With SNAT/masquerade:
> > original: A -> B
> > reply: B -> FW
> >
> > With DNAT:
> > original: A -> FW
> > reply: B -> A
> >
> > So real addresses are always source for the original tuple and source
> > for the reply tuple.
> >
> > I think it's better just to tell HMARK to use CT, no need to specify
> > what part of it, it's simple and the user will not get confused
> > selecting wrong configurations.
> >
> We discussed that and you wrote:
>
> "My opinion is that the user must have total control on the target
> behaviour through the configuration options."
> ...
> "I'm fine if you allow to select what tuple you want to use to hash."
>
> Have you changed you opinion ?
> From my point of view it doesn't matter.
To add what I've already said, I think it's also good if we can avoid
that users make wrong decisions.
> > Note that if you didn't not select to use conntrack, we default on the
> > packet-based version for HMARK which makes all the fragmentation
> > handling and so on.
> >
> > Again, I'm proposing you to use addr_src and addr_dst instead of addr1
> > and addr2 for readability.
> >
> > And remove the extra white extra line here below.
> >
> > > +
> > > +
> > > + nhoff = skb_network_offset(skb);
> > > + uports.v32 = 0;
> >
> > Better initialize this in the variable definition.
>
> or beter, move it to just before:
>
> + uports.v32 = 0;
> + if ((info->flags & XT_F_HMARK_METHOD_L3) || (ip_proto == IPPROTO_ICMP))
> + goto noports;
OK.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists