[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YFpe4GGQmfav7qOR@p50.lan>
Date: Tue, 23 Mar 2021 18:34:24 -0300
From: Flavio Leitner <fbl@...close.org>
To: Aaron Conole <aconole@...hat.com>
Cc: netdev@...r.kernel.org, dev@...nvswitch.org,
Pravin B Shelar <pshelar@....org>,
Joe Stringer <joe@...ium.io>,
Eelco Chaudron <echaudro@...hat.com>,
Dan Williams <dcbw@...hat.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Numan Siddique <nusiddiq@...hat.com>
Subject: Re: [PATCH] openvswitch: perform refragmentation for packets which
pass through conntrack
Hi,
On Fri, Mar 19, 2021 at 04:43:07PM -0400, Aaron Conole wrote:
> When a user instructs a flow pipeline to perform connection tracking,
> there is an implicit L3 operation that occurs - namely the IP fragments
> are reassembled and then processed as a single unit. After this, new
> fragments are generated and then transmitted, with the hint that they
> should be fragmented along the max rx unit boundary. In general, this
> behavior works well to forward packets along when the MTUs are congruent
> across the datapath.
>
> However, if using a protocol such as UDP on a network with mismatching
> MTUs, it is possible that the refragmentation will still produce an
> invalid fragment, and that fragmented packet will not be delivered.
> Such a case shouldn't happen because the user explicitly requested a
> layer 3+4 function (conntrack), and that function generates new fragments,
> so we should perform the needed actions in that case (namely, refragment
> IPv4 along a correct boundary, or send a packet too big in the IPv6 case).
>
> Additionally, introduce a test suite for openvswitch with a test case
> that ensures this MTU behavior, with the expectation that new tests are
> added when needed.
>
> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
> Signed-off-by: Aaron Conole <aconole@...hat.com>
> ---
> NOTE: checkpatch reports a whitespace error with the openvswitch.sh
> script - this is due to using tab as the IFS value. This part
> of the script was copied from
> tools/testing/selftests/net/pmtu.sh so I think should be
> permissible.
>
> net/openvswitch/actions.c | 2 +-
> tools/testing/selftests/net/.gitignore | 1 +
> tools/testing/selftests/net/Makefile | 1 +
> tools/testing/selftests/net/openvswitch.sh | 394 +++++++++++++++++++++
> 4 files changed, 397 insertions(+), 1 deletion(-)
> create mode 100755 tools/testing/selftests/net/openvswitch.sh
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 92a0b67b2728..d858ea580e43 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -890,7 +890,7 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
> if (likely(!mru ||
> (skb->len <= mru + vport->dev->hard_header_len))) {
> ovs_vport_send(vport, skb, ovs_key_mac_proto(key));
> - } else if (mru <= vport->dev->mtu) {
> + } else if (mru) {
> struct net *net = read_pnet(&dp->net);
If the egress port has MTU less than MRU, then before the patch
the packet is dropped and after the patch ip_do_fragment() will
correctly take care of fragmenting according with egress MTU.
That seems a reasonable change to me.
This condition below makes little sense to me now that this patch
is changing the MTU assumption:
skb->len <= mru + vport->dev->hard_header_len
The MRU depends on the ingress port. Perhaps that should check if
the skb length fits into the egress port even with mru available:
if (!mru || (packet_length(skb, vport->dev) <= vport->dev->mtu)) {
ovs_vport_send(vport, skb, ovs_key_mac_proto(key));
else {
ovs_fragment(net, vport, skb, mru, key);
}
IIRC, a GSO packet will always fall into the first condition
otherwise we need an additional skb_is_gso(skb).
Also, that last 'else' branch is not needed anymore.
Then we have check_pkt_len which Aaron and I discussed a bit
offline. I think we should revert commit[1] at least the part
relying on mru because a packet with mru > mtu is not dropped
after the patch.
[1] 178436557 ("openvswitch: take into account de-fragmentation/gso_size
in execute_check_pkt_len")
Thanks,
fbl
>
> ovs_fragment(net, vport, skb, mru, key);
> diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
> index 61ae899cfc17..d4d7487833be 100644
> --- a/tools/testing/selftests/net/.gitignore
> +++ b/tools/testing/selftests/net/.gitignore
> @@ -30,3 +30,4 @@ hwtstamp_config
> rxtimestamp
> timestamping
> txtimestamp
> +test_mismatched_mtu_with_conntrack
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index 25f198bec0b2..dc9b556f86fd 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -23,6 +23,7 @@ TEST_PROGS += drop_monitor_tests.sh
> TEST_PROGS += vrf_route_leaking.sh
> TEST_PROGS += bareudp.sh
> TEST_PROGS += unicast_extensions.sh
> +TEST_PROGS += openvswitch.sh
> TEST_PROGS_EXTENDED := in_netns.sh
> TEST_GEN_FILES = socket nettest
> TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
> diff --git a/tools/testing/selftests/net/openvswitch.sh b/tools/testing/selftests/net/openvswitch.sh
> new file mode 100755
> index 000000000000..7b6341688cc3
> --- /dev/null
> +++ b/tools/testing/selftests/net/openvswitch.sh
> @@ -0,0 +1,394 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# OVS kernel module self tests
> +#
> +# Tests currently implemented:
> +#
> +# - mismatched_mtu_with_conntrack
> +# Set up two namespaces (client and server) which each have devices specifying
> +# incongruent MTUs (1450 vs 1500 in the test). Transmit a UDP packet of 1901 bytes
> +# from client to server, and back. Ensure that the ct() action
> +# uses the mru as a hint, but not a forced check.
> +
> +
> +# Kselftest framework requirement - SKIP code is 4.
> +ksft_skip=4
> +
> +PAUSE_ON_FAIL=no
> +VERBOSE=0
> +TRACING=0
> +
> +tests="
> + mismatched_mtu_with_conntrack ipv4: IP Fragmentation with conntrack"
> +
> +
> +usage() {
> + echo
> + echo "$0 [OPTIONS] [TEST]..."
> + echo "If no TEST argument is given, all tests will be run."
> + echo
> + echo "Options"
> + echo " -t: capture traffic via tcpdump"
> + echo " -v: verbose"
> + echo " -p: pause on failure"
> + echo
> + echo "Available tests${tests}"
> + exit 1
> +}
> +
> +on_exit() {
> + echo "$1" > ${ovs_dir}/cleanup.tmp
> + cat ${ovs_dir}/cleanup >> ${ovs_dir}/cleanup.tmp
> + mv ${ovs_dir}/cleanup.tmp ${ovs_dir}/cleanup
> +}
> +
> +ovs_setenv() {
> + sandbox=$1
> +
> + ovs_dir=$ovs_base${1:+/$1}; export ovs_dir
> +
> + test -e ${ovs_dir}/cleanup || : > ${ovs_dir}/cleanup
> +
> + OVS_RUNDIR=$ovs_dir; export OVS_RUNDIR
> + OVS_LOGDIR=$ovs_dir; export OVS_LOGDIR
> + OVS_DBDIR=$ovs_dir; export OVS_DBDIR
> + OVS_SYSCONFDIR=$ovs_dir; export OVS_SYSCONFDIR
> + OVS_PKGDATADIR=$ovs_dir; export OVS_PKGDATADIR
> +}
> +
> +ovs_exit_sig() {
> + . "$ovs_dir/cleanup"
> + ovs-dpctl del-dp ovs-system
> +}
> +
> +ovs_cleanup() {
> + ovs_exit_sig
> + [ $VERBOSE = 0 ] || echo "Error detected. See $ovs_dir for more details."
> +}
> +
> +ovs_normal_exit() {
> + ovs_exit_sig
> + rm -rf ${ovs_dir}
> +}
> +
> +info() {
> + [ $VERBOSE = 0 ] || echo $*
> +}
> +
> +kill_ovs_vswitchd () {
> + # Use provided PID or save the current PID if available.
> + TMPPID=$1
> + if test -z "$TMPPID"; then
> + TMPPID=$(cat $OVS_RUNDIR/ovs-vswitchd.pid 2>/dev/null)
> + fi
> +
> + # Tell the daemon to terminate gracefully
> + ovs-appctl -t ovs-vswitchd exit --cleanup 2>/dev/null
> +
> + # Nothing else to be done if there is no PID
> + test -z "$TMPPID" && return
> +
> + for i in 1 2 3 4 5 6 7 8 9; do
> + # Check if the daemon is alive.
> + kill -0 $TMPPID 2>/dev/null || return
> +
> + # Fallback to whole number since POSIX doesn't require
> + # fractional times to work.
> + sleep 0.1 || sleep 1
> + done
> +
> + # Make sure it is terminated.
> + kill $TMPPID
> +}
> +
> +start_daemon () {
> + info "exec: $@ -vconsole:off --detach --no-chdir --pidfile --log-file"
> + "$@" -vconsole:off --detach --no-chdir --pidfile --log-file >/dev/null
> + pidfile="$OVS_RUNDIR"/$1.pid
> +
> + info "setting kill for $@..."
> + on_exit "test -e \"$pidfile\" && kill \`cat \"$pidfile\"\`"
> +}
> +
> +if test "X$vswitchd_schema" = "X"; then
> + vswitchd_schema="/usr/share/openvswitch"
> +fi
> +
> +ovs_sbx() {
> + if test "X$2" != X; then
> + (ovs_setenv $1; shift; "$@" >> ${ovs_dir}/debug.log)
> + else
> + ovs_setenv $1
> + fi
> +}
> +
> +seq () {
> + if test $# = 1; then
> + set 1 $1
> + fi
> + while test $1 -le $2; do
> + echo $1
> + set `expr $1 + ${3-1}` $2 $3
> + done
> +}
> +
> +ovs_wait () {
> + info "$1: waiting $2..."
> +
> + # First try the condition without waiting.
> + if ovs_wait_cond; then info "$1: wait succeeded immediately"; return 0; fi
> +
> + # Try a quick sleep, so that the test completes very quickly
> + # in the normal case. POSIX doesn't require fractional times to
> + # work, so this might not work.
> + sleep 0.1
> + if ovs_wait_cond; then info "$1: wait succeeded quickly"; return 0; fi
> +
> + # Then wait up to OVS_CTL_TIMEOUT seconds.
> + local d
> + for d in `seq 1 "$OVS_CTL_TIMEOUT"`; do
> + sleep 1
> + if ovs_wait_cond; then info "$1: wait succeeded after $d seconds"; return 0; fi
> + done
> +
> + info "$1: wait failed after $d seconds"
> + ovs_wait_failed
> +}
> +
> +sbxs=
> +sbx_add () {
> + info "adding sandbox '$1'"
> +
> + sbxs="$sbxs $1"
> +
> + NO_BIN=0
> + which ovsdb-tool >/dev/null 2>&1 || NO_BIN=1
> + which ovsdb-server >/dev/null 2>&1 || NO_BIN=1
> + which ovs-vsctl >/dev/null 2>&1 || NO_BIN=1
> + which ovs-vswitchd >/dev/null 2>&1 || NO_BIN=1
> +
> + if [ $NO_BIN = 1 ]; then
> + info "Missing required binaries..."
> + return 4
> + fi
> + # Create sandbox.
> + local d="$ovs_base"/$1
> + if [ -e $d ]; then
> + info "removing $d"
> + rm -rf "$d"
> + fi
> + mkdir "$d" || return 1
> + ovs_setenv $1
> +
> + # Create database and start ovsdb-server.
> + info "$1: create db and start db-server"
> + : > "$d"/.conf.db.~lock~
> + ovs_sbx $1 ovsdb-tool create "$d"/conf.db "$vswitchd_schema"/vswitch.ovsschema || return 1
> + ovs_sbx $1 start_daemon ovsdb-server --detach --remote=punix:"$d"/db.sock || return 1
> +
> + # Initialize database.
> + ovs_sbx $1 ovs-vsctl --no-wait -- init || return 1
> +
> + # Start ovs-vswitchd
> + info "$1: start vswitchd"
> + ovs_sbx $1 start_daemon ovs-vswitchd -vvconn -vofproto_dpif -vunixctl
> +
> + ovs_wait_cond () {
> + if ip link show ovs-netdev >/dev/null 2>&1; then return 1; else return 0; fi
> + }
> + ovs_wait_failed () {
> + :
> + }
> +
> + ovs_wait "sandbox_add" "while ip link show ovs-netdev" || return 1
> +}
> +
> +ovs_base=`pwd`
> +
> +# mismatched_mtu_with_conntrack test
> +# - client has 1450 byte MTU
> +# - server has 1500 byte MTU
> +# - use UDP to send 1901 bytes each direction for mismatched
> +# fragmentation.
> +test_mismatched_mtu_with_conntrack() {
> +
> + sbx_add "test_mismatched_mtu_with_conntrack" || return $?
> +
> + info "create namespaces"
> + for ns in client server; do
> + ip netns add $ns || return 1
> + on_exit "ip netns del $ns"
> + done
> +
> + # setup the base bridge
> + ovs_sbx "test_mismatched_mtu_with_conntrack" ovs-vsctl add-br br0 || return 1
> +
> + # setup the client
> + info "setup client namespace"
> + ip link add c0 type veth peer name c1 || return 1
> + on_exit "ip link del c0 >/dev/null 2>&1"
> +
> + ip link set c0 mtu 1450
> + ip link set c0 up
> +
> + ip link set c1 netns client || return 1
> + ip netns exec client ip addr add 172.31.110.2/24 dev c1
> + ip netns exec client ip link set c1 mtu 1450
> + ip netns exec client ip link set c1 up
> + ovs_sbx "test_mismatched_mtu_with_conntrack" ovs-vsctl add-port br0 c0 || return 1
> +
> + # setup the server
> + info "setup server namespace"
> + ip link add s0 type veth peer name s1 || return 1
> + on_exit "ip link del s0 >/dev/null 2>&1; ip netns exec server ip link del s0 >/dev/null 2>&1"
> + ip link set s0 up
> +
> + ip link set s1 netns server || return 1
> + ip netns exec server ip addr add 172.31.110.1/24 dev s1 || return 1
> + ip netns exec server ip link set s1 up
> + ovs_sbx "test_mismatched_mtu_with_conntrack" ovs-vsctl add-port br0 s0 || return 1
> +
> + info "setup flows"
> + ovs_sbx "test_mismatched_mtu_with_conntrack" ovs-ofctl del-flows br0
> +
> + cat >${ovs_dir}/flows.txt <<EOF
> +table=0,priority=100,arp action=normal
> +table=0,priority=100,ip,udp action=ct(table=1)
> +table=0,priority=10 action=drop
> +
> +table=1,priority=100,ct_state=+new+trk,in_port=c0,ip action=ct(commit),s0
> +table=1,priority=100,ct_state=+est+trk,in_port=s0,ip action=c0
> +
> +EOF
> + ovs_sbx "test_mismatched_mtu_with_conntrack" ovs-ofctl --bundle add-flows br0 ${ovs_dir}/flows.txt || return 1
> + ovs_sbx "test_mismatched_mtu_with_conntrack" ovs-ofctl dump-flows br0
> +
> + ovs_sbx "test_mismatched_mtu_with_conntrack" ovs-vsctl show
> +
> + # setup echo
> + mknod -m 777 ${ovs_dir}/fifo p || return 1
> + # on_exit "rm ${ovs_dir}/fifo"
> +
> + # test a udp connection
> + info "send udp data"
> + ip netns exec server sh -c 'cat ${ovs_dir}/fifo | nc -l -vv -u 8888 >${ovs_dir}/fifo 2>${ovs_dir}/s1-nc.log & echo $! > ${ovs_dir}/server.pid'
> + on_exit "test -e \"${ovs_dir}/server.pid\" && kill \`cat \"${ovs_dir}/server.pid\"\`"
> + if [ $TRACING = 1 ]; then
> + ip netns exec server sh -c "tcpdump -i s1 -l -n -U -xx >> ${ovs_dir}/s1-pkts.cap &"
> + ip netns exec client sh -c "tcpdump -i c1 -l -n -U -xx >> ${ovs_dir}/c1-pkts.cap &"
> + fi
> +
> + ip netns exec client sh -c "python -c \"import time; print('a' * 1900); time.sleep(2)\" | nc -v -u 172.31.110.1 8888 2>${ovs_dir}/c1-nc.log" || return 1
> +
> + ovs_sbx "test_mismatched_mtu_with_conntrack" ovs-appctl dpctl/dump-flows
> + ovs_sbx "test_mismatched_mtu_with_conntrack" ovs-ofctl dump-flows br0
> +
> + info "check udp data was tx and rx"
> + grep "1901 bytes received" ${ovs_dir}/c1-nc.log || return 1
> + ovs_normal_exit
> +}
> +
> +run_test() {
> + (
> + tname="$1"
> + tdesc="$2"
> +
> + if ! lsmod | grep openvswitch >/dev/null 2>&1; then
> + printf "TEST: %-60s [SKIP]\n" "${tdesc}"
> + return $ksft_skip
> + fi
> +
> + unset IFS
> +
> + eval test_${tname}
> + ret=$?
> +
> + if [ $ret -eq 0 ]; then
> + printf "TEST: %-60s [ OK ]\n" "${tdesc}"
> + ovs_normal_exit
> + elif [ $ret -eq 1 ]; then
> + printf "TEST: %-60s [FAIL]\n" "${tdesc}"
> + if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
> + echo
> + echo "Pausing. Hit enter to continue"
> + read a
> + fi
> + ovs_exit_sig
> + exit 1
> + elif [ $ret -eq $ksft_skip ]; then
> + printf "TEST: %-60s [SKIP]\n" "${tdesc}"
> + elif [ $ret -eq 2 ]; then
> + rm -rf test_${tname}
> + run_test "$1" "$2"
> + fi
> +
> + return $ret
> + )
> + ret=$?
> + case $ret in
> + 0)
> + all_skipped=false
> + [ $exitcode=$ksft_skip ] && exitcode=0
> + ;;
> + $ksft_skip)
> + [ $all_skipped = true ] && exitcode=$ksft_skip
> + ;;
> + *)
> + all_skipped=false
> + exitcode=1
> + ;;
> + esac
> +
> + return $ret
> +}
> +
> +
> +exitcode=0
> +desc=0
> +all_skipped=true
> +
> +while getopts :pvt o
> +do
> + case $o in
> + p) PAUSE_ON_FAIL=yes;;
> + v) VERBOSE=1;;
> + t) if which tcpdump > /dev/null 2>&1; then
> + TRACING=1
> + else
> + echo "=== tcpdump not available, tracing disabled"
> + fi
> + ;;
> + *) usage;;
> + esac
> +done
> +shift $(($OPTIND-1))
> +
> +IFS="
> +"
> +
> +for arg do
> + # Check first that all requested tests are available before running any
> + command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not found"; usage; }
> +done
> +
> +name=""
> +desc=""
> +for t in ${tests}; do
> + [ "${name}" = "" ] && name="${t}" && continue
> + [ "${desc}" = "" ] && desc="${t}"
> +
> + run_this=1
> + for arg do
> + [ "${arg}" != "${arg#--*}" ] && continue
> + [ "${arg}" = "${name}" ] && run_this=1 && break
> + run_this=0
> + done
> + if [ $run_this -eq 1 ]; then
> + run_test "${name}" "${desc}"
> + fi
> + name=""
> + desc=""
> +done
> +
> +exit ${exitcode}
> --
> 2.25.4
>
--
fbl
Powered by blists - more mailing lists