Message-ID: <20190319194929.10798-1-ldir@darbyshire-bryant.me.uk>
Date:   Tue, 19 Mar 2019 19:49:55 +0000
From:   Kevin 'ldir' Darbyshire-Bryant <ldir@...byshire-bryant.me.uk>
To:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:     "jiri@...nulli.us" <jiri@...nulli.us>,
        "xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>,
        "jhs@...atatu.com" <jhs@...atatu.com>,
        Kevin 'ldir' Darbyshire-Bryant <ldir@...byshire-bryant.me.uk>
Subject: [RFC PATCH 0/1 net-next] net: sched: Introduce conndscp action

With nervousness and trepidation I'm submitting the attached RFC patch
for 'conndscp'.

Conndscp is a new tc filter action module.  It copies the DSCP of a
packet into the conntrack mark and, in the reverse direction, restores
a DSCP stored in the conntrack mark to the diffserv field of suitable
skbs.
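
For readers less familiar with the diffserv field, here is a minimal
sketch (Python for illustration only, not code from the patch) of how a
DSCP sits in the IPv4 TOS / IPv6 traffic class byte: the DSCP is the
upper six bits and the lower two bits carry ECN.  The function names are
hypothetical.

```python
DSCP_SHIFT = 2    # DSCP occupies the upper six bits of the DS field
ECN_MASK = 0x03   # the lower two bits carry ECN and must be preserved


def dscp_from_dsfield(dsfield: int) -> int:
    """Extract the 6-bit DSCP from an 8-bit DS field value."""
    return (dsfield & 0xFF) >> DSCP_SHIFT


def dsfield_with_dscp(dsfield: int, dscp: int) -> int:
    """Return the DS field with its DSCP replaced and its ECN bits kept."""
    return ((dscp & 0x3F) << DSCP_SHIFT) | (dsfield & ECN_MASK)
```

For example, a DS field of 0xb8 carries DSCP 46 (EF), and writing CS1
(DSCP 8) into a DS field of 0x01 gives 0x21, leaving the ECN bits alone.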

The feature is intended for, and has proved useful for, restoring
ingress classifications based on egress classifications across links
that bleach or otherwise change DSCP, typically home ISP Internet links.
Restoring DSCP on ingress on the WAN link allows qdiscs such as CAKE to
shape inbound packets according to policies that are easier to implement
on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filters/eBPF programs run before
the NAT lookup, hence are unable to see the internal IPv4 addresses used
on the typical home masquerading gateway.

conndscp understands the following parameters:

mask - a 32 bit mask of at least 6 contiguous bits where conndscp will
place the DSCP in conntrack mark.  The DSCP is left-shifted by the
number of unset lower bits of the mask before storing into the mark
field.
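
The shift described above can be sketched as follows.  This is an
illustrative Python model of the arithmetic only, not the kernel code;
the helper names are made up for the example.

```python
def mask_shift(mask: int) -> int:
    """Number of unset low-order bits below the lowest set bit of mask."""
    if mask == 0:
        raise ValueError("mask must have at least one bit set")
    # mask & -mask isolates the lowest set bit
    return (mask & -mask).bit_length() - 1


def store_dscp(mark: int, dscp: int, mask: int) -> int:
    """Place dscp into the masked area of a 32-bit mark, keeping other bits."""
    cleared = mark & ~mask & 0xFFFFFFFF
    return cleared | ((dscp << mask_shift(mask)) & mask)


def load_dscp(mark: int, mask: int) -> int:
    """Recover the DSCP previously stored in the masked area of mark."""
    return (mark & mask) >> mask_shift(mask)
```

With mask 0xfc000000 the shift is 26, so DSCP 46 (EF) is stored as
0xb8000000 and bits outside the mask are left untouched.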

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by mask.  This represents a conditional operation flag - get
will only store the DSCP if the flag is unset.  set will only restore
the DSCP if the flag is set.  This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A
statemask of zero disables the conditional behaviour.
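
The constraints on the two masks can be expressed as a small check.
This is a hypothetical Python sketch, not a description of what the
patch validates; it assumes "at least 6 contiguous bits" means a single
unbroken run of six or more set bits.

```python
def is_contiguous(mask: int) -> bool:
    """True if the set bits of mask form one unbroken run."""
    if mask == 0:
        return False
    low = mask & -mask                 # lowest set bit
    return ((mask + low) & mask) == 0  # adding it must carry out of the run


def check_masks(mask: int, statemask: int) -> list:
    """Return a list of problems with a mask/statemask pair (empty if none)."""
    problems = []
    if not (is_contiguous(mask) and bin(mask).count("1") >= 6):
        problems.append("mask must be a contiguous run of at least 6 bits")
    if statemask & mask:
        problems.append("statemask overlaps mask")
    return problems
```

A mask of 0xfc000000 with a statemask of 0x01000000 passes both checks,
while a statemask of 0x04000000 would collide with the DSCP area.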

mode - get/set/both - get stores the DSCP into the mark, set restores
the DSCP into the diffserv field from the mark, both performs a get and
then a set, in that order.
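
To make the interaction of mode and statemask concrete, here is a rough
Python model of the behaviour described above.  It is a sketch under
assumptions, not the kernel implementation: in particular it assumes
that a successful get also sets the statemask flag, as the comments in
the iptables example further down suggest.

```python
def conndscp(mode: str, dscp: int, mark: int, mask: int, statemask: int):
    """Model one packet passing the action; returns (dscp, mark)."""
    shift = (mask & -mask).bit_length() - 1  # unset low bits below the mask

    if mode in ("get", "both"):
        # get: store the DSCP only if the flag is unset (or no flag is used)
        if statemask == 0 or not (mark & statemask):
            mark = (mark & ~mask & 0xFFFFFFFF) | ((dscp << shift) & mask)
            mark |= statemask  # assumption: get also raises the flag

    if mode in ("set", "both"):
        # set: restore the DSCP only if the flag is set (or no flag is used)
        if statemask == 0 or (mark & statemask):
            dscp = (mark & mask) >> shift

    return dscp, mark
```

With mask 0xfc000000 and statemask 0x01000000, a first egress packet
carrying CS1 (DSCP 8) in mode both leaves the mark at 0x21000000; a later
packet arriving with DSCP 0 in mode set is then restored to DSCP 8.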

optional parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)


A typical example of using conndscp to restore DSCP values for use with
a qdisc (e.g. CAKE) is shown below, using top 6 bits to store the DSCP
and the bottom bit of top byte as the state flag.

# egress qdisc
tc qdisc add dev eth0 cake bandwidth 20000kbit
# put an action on the egress interface to get the DSCP into connmark->mark
# and to set the DSCP from the stored connmark.
# this seems counterintuitive, but it ensures that once the mark is set all
# subsequent egress packets have the same stored DSCP.  This avoids needing
# iptables rules to mark every packet: conndscp does it for us, and CAKE is
# then happy using the DSCP
tc filter add dev eth0 protocol all prio 10 u32 match u32 0 0 flowid 1:1 action \
	conndscp mask 0xfc000000 statemask 0x01000000 mode both


#ingress qdisc via an ifb

tc qdisc add dev eth0 handle ffff: ingress
tc qdisc add dev ifb4eth0 cake bandwidth 80000kbit
ip link set ifb4eth0 up
# redirect all packets arriving on eth0 to ifb4eth0 and restore the DSCP from connmark
tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
	match u32 0 0 flowid 1:1 action \
	conndscp mask 0xfc000000 statemask 0x01000000 mode set \
	mirred egress redirect dev ifb4eth0

#iptables rules using the statemask flag to only do it once

iptables -t mangle -N QOS_MARK_eth0

iptables -t mangle -A QOS_MARK_eth0 -m set --match-set Bulk4  dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
#add more rules similar to above as required


# send unmarked packets to the marking chain - conndscp will set the statemask bit
# if not already set.
iptables -t mangle -A POSTROUTING -o eth0 -m connmark --mark 0x00000000/0x01000000 -g QOS_MARK_eth0
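
As a reminder of how the connmark match in the rule above selects
packets: the connection mark is ANDed with the mask and the result is
compared with the value.  A minimal Python illustration (the function
name is made up for the example):

```python
def connmark_matches(ctmark: int, value: int, mask: int = 0xFFFFFFFF) -> bool:
    """Model of '-m connmark --mark value/mask'."""
    return (ctmark & mask) == value
```

So --mark 0x00000000/0x01000000 matches only connections whose state
flag is still clear: a mark of 0x00000000 goes to QOS_MARK_eth0, while a
mark of 0x21000000 (flag set, CS1 stored) skips the marking chain.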

conndscp (almost) shamelessly copies code from connmark and therefore
contains the same limitations.

I am not a full time programmer, conndscp represents something of the
order of a 2 week struggle, my C is awful, kernel & network knowledge
worse, though I like to think improving.  There are no doubt issues with
this patch/feature but I hope constructive feedback, quite possibly in
very short words for my simple brain, will knock it into shape.

Thanks for your time.

Kevin Darbyshire-Bryant (1):
  net: sched: Introduce conndscp action

 include/net/tc_act/tc_conndscp.h          |  19 ++
 include/uapi/linux/tc_act/tc_conndscp.h   |  33 +++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_conndscp.c                  | 333 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 6 files changed, 400 insertions(+)
 create mode 100644 include/net/tc_act/tc_conndscp.h
 create mode 100644 include/uapi/linux/tc_act/tc_conndscp.h
 create mode 100644 net/sched/act_conndscp.c

-- 
2.17.2 (Apple Git-113)
