netdev - Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()

Message-ID: <878qys9cqt.fsf@nvidia.com>
Date: Wed, 26 Jun 2024 12:18:58 +0200
From: Petr Machata <petrm@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: <davem@...emloft.net>, <netdev@...r.kernel.org>, <edumazet@...gle.com>,
	<pabeni@...hat.com>, <willemdebruijn.kernel@...il.com>,
	<przemyslaw.kitszel@...el.com>, <leitao@...ian.org>, <petrm@...dia.com>
Subject: Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule
 cleanup with defer()


Jakub Kicinski <kuba@...nel.org> writes:

> This implements what I was describing in [1]. When writing a test
> author can schedule cleanup / undo actions right after the creation
> completes, eg:
>
>   cmd("touch /tmp/file")
>   defer(cmd, "rm /tmp/file")
>
> defer() takes the function name as first argument, and the rest are
> arguments for that function. defer()red functions are called in
> inverse order after test exits. It's also possible to capture them
> and execute earlier (in which case they get automatically de-queued).
>
>   undo = defer(cmd, "rm /tmp/file")
>   # ... some unsafe code ...
>   undo.exec()
>
> As a nice safety all exceptions from defer()ed calls are captured,
> printed, and ignored (they do make the test fail, however).
> This addresses the common problem of exceptions in cleanup paths
> often being unhandled, leading to potential leaks.
>
> There is a global action queue, flushed by ksft_run(). We could support
> function level defers too, I guess, but there's no immediate need..
>
> Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/ # [1]
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
>  tools/testing/selftests/net/lib/py/ksft.py  | 49 +++++++++++++++------
>  tools/testing/selftests/net/lib/py/utils.py | 41 +++++++++++++++++
>  2 files changed, 76 insertions(+), 14 deletions(-)
>
> diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
> index 45ffe277d94a..4a72b9cbb27d 100644
> --- a/tools/testing/selftests/net/lib/py/ksft.py
> +++ b/tools/testing/selftests/net/lib/py/ksft.py
> @@ -6,6 +6,7 @@ import sys
>  import time
>  import traceback
>  from .consts import KSFT_MAIN_NAME
> +from .utils import global_defer_queue
>  
>  KSFT_RESULT = None
>  KSFT_RESULT_ALL = True
> @@ -108,6 +109,24 @@ KSFT_RESULT_ALL = True
>      print(res)
>  
>  
> +def ksft_flush_defer():
> +    global KSFT_RESULT
> +
> +    while global_defer_queue:
> +        entry = global_defer_queue[-1]
> +        try:
> +            entry.exec()

I wonder if you added _exec() so that it could be invoked here. If so,
you could just do entry = global_defer_queue.pop() followed by
entry._exec(), and the except branch would contain only the test-related
business, with no queue management.

> +        except Exception:

I think this should be either an unqualified except: or except
BaseException:.
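
The reason is that KeyboardInterrupt and SystemExit derive from
BaseException, not Exception, so with "except Exception:" a Ctrl-C during
one cleanup would skip all the remaining ones. A small standalone
illustration (flush(), interrupted() and remove_file() are made-up names
for this example, not anything from the patch):

```python
ran = []


def interrupted():
    ran.append("interrupted")
    raise KeyboardInterrupt  # e.g. the user hits Ctrl-C during cleanup


def remove_file():
    ran.append("remove_file")


def flush(cleanups, catch):
    """Run each cleanup in turn, swallowing only exceptions of type `catch`."""
    for fn in cleanups:
        try:
            fn()
        except catch:
            pass


# Catching BaseException lets every cleanup run.
flush([interrupted, remove_file], BaseException)
assert ran == ["interrupted", "remove_file"]

# Catching only Exception lets KeyboardInterrupt escape the loop,
# so remove_file() is never reached.
ran.clear()
try:
    flush([interrupted, remove_file], Exception)
except KeyboardInterrupt:
    pass
assert ran == ["interrupted"]
```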

> +            if global_defer_queue and global_defer_queue[-1] == entry:
> +                global_defer_queue.pop()
> +
> +            ksft_pr("Exception while handling defer / cleanup!")

Hmm, I was thinking about adding defer.__str__ and using it here to
give more of a clue as to where it went wrong, but the traceback is IMHO
plenty good enough.

> +            tb = traceback.format_exc()
> +            for line in tb.strip().split('\n'):
> +                ksft_pr("Defer Exception|", line)
> +            KSFT_RESULT = False
> +
> +
>  def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
>      cases = cases or []
>  
> @@ -130,29 +149,31 @@ KSFT_RESULT_ALL = True
>      for case in cases:
>          KSFT_RESULT = True
>          cnt += 1
> +        comment = ""
> +        cnt_key = ""
> +
>          try:
>              case(*args)
>          except KsftSkipEx as e:
> -            ktap_result(True, cnt, case, comment="SKIP " + str(e))
> -            totals['skip'] += 1
> -            continue
> +            comment = "SKIP " + str(e)
> +            cnt_key = 'skip'
>          except KsftXfailEx as e:
> -            ktap_result(True, cnt, case, comment="XFAIL " + str(e))
> -            totals['xfail'] += 1
> -            continue
> +            comment = "XFAIL " + str(e)
> +            cnt_key = 'xfail'
>          except Exception as e:
>              tb = traceback.format_exc()
>              for line in tb.strip().split('\n'):
>                  ksft_pr("Exception|", line)
> -            ktap_result(False, cnt, case)
> -            totals['fail'] += 1
> -            continue
> +            KSFT_RESULT = False
> +            cnt_key = 'fail'
>  
> -        ktap_result(KSFT_RESULT, cnt, case)
> -        if KSFT_RESULT:
> -            totals['pass'] += 1
> -        else:
> -            totals['fail'] += 1
> +        ksft_flush_defer()
> +
> +        if not cnt_key:
> +            cnt_key = 'pass' if KSFT_RESULT else 'fail'
> +
> +        ktap_result(KSFT_RESULT, cnt, case, comment=comment)
> +        totals[cnt_key] += 1
>  
>      print(
>          f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"

The majority of this hunk is just preparatory and should be in a patch
of its own. This patch should then only introduce the flush.

> diff --git a/tools/testing/selftests/net/lib/py/utils.py b/tools/testing/selftests/net/lib/py/utils.py
> index 405aa510aaf2..1ef6ebaa369e 100644
> --- a/tools/testing/selftests/net/lib/py/utils.py
> +++ b/tools/testing/selftests/net/lib/py/utils.py
> @@ -66,6 +66,47 @@ import time
>          return self.process(terminate=self.terminate, fail=self.check_fail)
>  
>  
> +global_defer_queue = []
> +
> +
> +class defer:
> +    def __init__(self, func, *args, **kwargs):
> +        global global_defer_queue
> +        if global_defer_queue is None:
> +            raise Exception("defer environment has not been set up")
> +
> +        if not callable(func):
> +            raise Exception("defer created with un-callable object, did you call the function instead of passing its name?")
> +
> +        self.func = func
> +        self.args = args
> +        self.kwargs = kwargs
> +
> +        self.queued = True
> +        self.executed = False
> +
> +        self._queue =  global_defer_queue
> +        self._queue.append(self)
> +
> +    def __enter__(self):
> +        return self
> +
> +    def __exit__(self, ex_type, ex_value, ex_tb):
> +        return self.exec()
> +
> +    def _exec(self):
> +        self.func(*self.args, **self.kwargs)
> +
> +    def cancel(self):

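This shouldn't dequeue if not self.queued. As written, a second cancel(),
or an exec() after a cancel(), makes list.remove() raise ValueError.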

> +        self._queue.remove(self)
> +        self.queued = False
> +
> +    def exec(self):

This shouldn't exec if self.executed.

But I actually wonder if we need two flags at all. Whether the defer
entry is resolved through exec(), cancel() or __exit__(), it's "done".
It could be left in the queue, in which case the "done" flag is going to
disable future exec requests. Or it can just be dropped from the queue
when done, in which case we don't even need the "done" flag as such.
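
A sketch of the second variant, where queue membership is the only state.
This is just an illustration of the idea, not a tested replacement for the
class in the patch, and having cancel() return a bool is my own addition:

```python
global_defer_queue = []


class defer:
    def __init__(self, func, *args, **kwargs):
        if not callable(func):
            raise Exception("defer created with un-callable object")

        self.func = func
        self.args = args
        self.kwargs = kwargs

        self._queue = global_defer_queue
        self._queue.append(self)

    def __enter__(self):
        return self

    def __exit__(self, ex_type, ex_value, ex_tb):
        self.exec()

    def _exec(self):
        self.func(*self.args, **self.kwargs)

    def cancel(self):
        # A resolved entry is simply no longer in the queue, so no flag
        # is needed. Returns True only if the entry was still pending.
        if self in self._queue:
            self._queue.remove(self)
            return True
        return False

    def exec(self):
        # Dequeue first; a repeated exec(), or an exec() after cancel(),
        # then becomes a no-op instead of running the cleanup twice.
        if self.cancel():
            self._exec()
```

With this, exec(), cancel() and __exit__() can be called in any order and
any number of times, and the function runs at most once.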

> +        self._exec()
> +        self.cancel()
> +        self.executed = True
> +
> +
>  def tool(name, args, json=None, ns=None, host=None):
>      cmd_str = name + ' '
>      if json: