lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <639d4bb71bada_b41e329452@dwillia2-xfh.jf.intel.com.notmuch>
Date:   Fri, 16 Dec 2022 20:55:19 -0800
From:   Dan Williams <dan.j.williams@...el.com>
To:     Luis Chamberlain <mcgrof@...nel.org>, <alison.schofield@...el.com>,
        <vishal.l.verma@...el.com>, <ira.weiny@...el.com>,
        <bwidawsk@...nel.org>, <dan.j.williams@...el.com>
CC:     <dave@...olabs.net>, <a.manzanares@...sung.com>,
        <mcgrof@...nel.org>, <linux-cxl@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
Subject: RE: [RFC] cxl_test: upgrade as a first class citizen selftests
 capable driver

Luis Chamberlain wrote:
> To unit test CXL today we make use of the ndctl meson test suite but
> this requires the cxl_test driver. There is however is no kconfig option
> for this driver as its required to be built as an external driver. To
> debug CXL with this framework is not inuitive as it departs itself from
> typical Linux kernel development debugging processes, requiring an
> external module build which also happens to *rebuild* all CXL related
> production drivers with some new magic incantations, and replacing
> your production CXL modules with the new ones.
> 
> This is quite complex, departs ourselves from the typical build/boot
> debugging process most folks are used to, and requires a manual error-prone
> process which in some kernels / configurations can leads up to a kernel
> crash [0].
> 
> We can replace this by having the requirements be defined through proper
> kconfig symbols and having the cxl_test driver and requirements also become
> part of the standard Linux kernel build process. This matches most other
> kernel kernel debugging frameworks for subsystems, which don't require any
> external modules.
> 
> Let's review the current strategy today, first, so nothing is lost:
> 
>   * one must manually *build*, and then as a second step install
>     the cxl_test driver as an external modules:
> 
>     make M=tools/testing/cxl/
>     sudo M=tools/testing/cxl/ modules_install
> 
>     Provided your depmod.d was configured correctly on your Linux
>     distribution you will end up with a complete set of CXL production
>     modules and cxl_test mock drivers to let you now use the ndctl
>     test suite. To be clear, you will not only end up with cxl_test
>     but also with a complete set of module replacements for your CXL
>     environment.
> 
>     This works by:
> 
>     a) allowing the external module to re-define the __mock macro
>        to __weak, used on to_cxl_host_bridge() and allows the mock driver
>        to provide a replacement for that single call.
> 
>     b) the external module build process *rebuilds* all production
>        modules *again* but uses the the binutils --wrap=symbol
>        feature [0] [1] to let the production CXL code use the mocked up
>        CXL features.
> 
> We can simplify all this considerably and do away with the external
> modules requirements. The __mock stuff is raplaced by addressing the
> to_cxl_host_bridge() mapping using a define based on your kernel
> configuration. If using the production code you use the produciton
> __to_cxl_host_bridge(), otherwise mock_to_cxl_host_bridge() will be
> used. This is the *only* eyesore in the CXL code to enable use of the
> mock driver.
> 
> The magic --wrap=symbol incantations are also just tucked in a new
> production drivers/cxl/Makefile.mock which is only read when the kernel
> has been configured for debugging using the CXl mock framework.
> 
> The last bit of work left is to move as built-in code shared code
> between a production environment (non-debugging) and between what is
> needed for the same code to run when doing mock debugging. Today the
> requirements are small:
> 
>   * The code to implement to_cxl_host_bridge()
>   * When mock debugging is enabled, just the code we need to
>     support mock_to_cxl_host_bridge()
> 
> For both cases this is needed you have CXL_ACPI enabled.
> 
> In the future if we wanted to then now use the kernel selftests,
> for example a tools/testing/sefltests/cxl/ directory, we can easily
> do so. This also enables us to separate out unit tests out from the
> ndctl tree and allow unit tests to also be developed and written
> upstream on the kernel.
> 
> Another benefit of this approach is that there is no bit rot,
> in the sense that now bots can go willy nilly test building this
> code, whereas before only those who knew the proper incantations
> actually were building this code and loading it properly.
> 
> [0] https://lkml.kernel.org/r/20221209062919.1096779-1-mcgrof@kernel.org
> [1] https://sourceware.org/binutils/docs-2.23.1/ld/Options.html#index-g_t_002d_002dwrap_003d_0040var_007bsymbol_007d-263
> [2] https://lwn.net/Articles/558106/
> Signed-off-by: Luis Chamberlain <mcgrof@...nel.org>
> ---
> 
> What do folks think?
> 
> The test results:
> 
> https://gist.github.com/mcgrof/2ab7f1601141faa5ac7b16240b4ea652
> 
> The summary of test results:
> 
> Summary of Failures:
> 
> 1/5 ndctl:cxl / cxl-topology.sh      FAIL             0.50s   exit status 1
> 2/5 ndctl:cxl / cxl-region-sysfs.sh  FAIL             0.44s   exit status 1
> 5/5 ndctl:cxl / cxl-xor-region.sh    FAIL             0.45s   exit status 1

At least for me the presented rationale and these results leaves me
cold. Yes, there are sharp edges but 0day groks this scheme when it runs
tools/testing/nvdimm/ (nfit_test) tests. The run_qemu script automates
cxl_test and nfit_test this as well. So the complexity has not proven
prohibitive. In the case of tools/testing/nvdimm/ it has not proven
prohibitive for years. In other words the suggestion that the current
organization ultimately leads to bit rot has not been substantiated in
practice.

The proposed direction to move tests out of the ndctl.git repo into the
kernel solves the wrong problem. It overlooks the fact that the tests
are more tightly coupled to libcxl changes than kernel changes. So in
terms of benefits of code being colocated, tests + libcxl + tools in the
same repo is more impactful than tests + kernel in the same repo.

I know Jonathan has some latent ideas about building up a CXL regression
suite on top of QEMU, but even there it's not clear that would benefit
from being developed in linux.git given the tight coupling to QEMU.git.

> 
> Ok:                 2   
> Expected Fail:      0   
> Fail:               3   
> Unexpected Pass:    0   
> Skipped:            0   
> Timeout:            0  
> 
> I don't quite get the failures yet, but hey it's a start.
> This commit depends on Dan's patch:
> 
> https://lkmll.kernel.org/r/6393a3a9d2882_579c1294b3@dwillia2-xfh.jf.intel.com.notmuch
> 
> But I can build another RFC if folks want without it.
> 
>  drivers/cxl/Kconfig                           | 23 +++++++
>  drivers/cxl/Makefile                          |  2 +
>  drivers/cxl/Makefile.mock                     | 15 +++++
>  drivers/cxl/acpi.c                            | 13 ----
>  drivers/cxl/core/Makefile                     |  4 ++
>  drivers/cxl/core/acpi.c                       | 30 +++++++++
>  drivers/cxl/cxl.h                             | 19 +++---
>  lib/Kconfig.debug                             |  8 +++
>  lib/Makefile                                  |  1 +
>  lib/test_cxl/Makefile                         | 13 ++++
>  lib/test_cxl/acpi.c                           | 28 +++++++++
>  lib/test_cxl/core.c                           | 37 +++++++++++
>  .../testing/cxl/test => lib/test_cxl}/cxl.c   |  7 ---
>  .../testing/cxl/test => lib/test_cxl}/mem.c   |  0
>  .../testing/cxl/test => lib/test_cxl}/mock.c  | 30 ---------
>  .../testing/cxl/test => lib/test_cxl}/mock.h  |  0
 
I do not think cxl_test is suitable to ship as an in-tree capability, it
is intentionally second class to keep test logic distinct from
production code.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ