Message-ID: <20240814143414.1877505-1-aahringo@redhat.com>
Date: Wed, 14 Aug 2024 10:34:02 -0400
From: Alexander Aring <aahringo@...hat.com>
To: teigland@...hat.com
Cc: gfs2@...ts.linux.dev,
song@...nel.org,
yukuai3@...wei.com,
agruenba@...hat.com,
mark@...heh.com,
jlbec@...lplan.org,
joseph.qi@...ux.alibaba.com,
gregkh@...uxfoundation.org,
rafael@...nel.org,
akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org,
linux-raid@...r.kernel.org,
ocfs2-devel@...ts.linux.dev,
lucien.xin@...il.com,
aahringo@...hat.com
Subject: [RFC dlm/next 00/12] dlm: net-namespace functionality
Hi,
this patch series is large, but it brings a lot of basic "fun"
net-namespace functionality to DLM. Currently you need several Linux
kernel instances, e.g. running in virtual machines. With this patch
series I want to break out of that virtual machine world, where you deal
with multiple kernels, need to boot them all individually, etc. Now you
can use DLM in a single Linux kernel instance, and each "node"
(previously represented by a virtual machine) is separated by a
net-namespace. Why net-namespaces? They fit the DLM design for now, and
you need them anyway because the internal DLM socket handling works on a
per-node basis. In addition, we separate the DLM lockspaces (the
lockspace that is being registered) by net-namespace, as each one
represents a "network entity" (node). There might be reasons to
introduce a completely new kind of namespace (locking namespace?), but I
don't want to take this step now and, as I said, net-namespaces are
required anyway for the DLM sockets.
You need some new user space tooling, as a new net-namespace aware
netlink UAPI is introduced (it can co-exist with configfs, which operates
on init_net only). See [0] for more steps. There is a copr repo for the
new tooling, which can be enabled by:
$ dnf copr enable aring/nldlm
$ dnf install nldlm
or compile it yourself.
Then there is currently a very simple script [1] that shows a 3-node
cluster using gfs2 on multiple loop block devices on a shared loop block
device image (sounds weird, but I do something like that). There are
currently some user space synchronization issues that I work around with
simple sleeps, but they are only user space problems.
To test it, I recommend using a virtual machine ("but only one") and
running the [1] script. Afterwards you have, in the net-namespace where
you executed it, the 3 mountpoints /cluster/node1, /cluster/node2 and
/cluster/node3. Any vfs operation on those mountpoints acts as a
per-node operation.
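As a rough illustration of the "one net-namespace per node" layout, the following sketch generates the iproute2 commands for three namespaces attached to one virtual bridge. It is not taken from the [1] script, which may set things up differently. The bridge name "dlmsw" is the one mentioned under Limitations in this letter; the namespace names, veth interface names and the 192.168.100.0/24 addresses are assumptions made up for this example. The sketch only prints the commands and does not execute them.

```python
import shlex


def netns_cluster_commands(nodes=3, bridge="dlmsw", subnet="192.168.100"):
    """Return `ip` command lines for one bridge plus one net-namespace per node.

    Namespace names, interface names and addresses are hypothetical;
    only the bridge name comes from the cover letter.
    """
    cmds = [
        ["ip", "link", "add", bridge, "type", "bridge"],
        ["ip", "link", "set", bridge, "up"],
    ]
    for i in range(1, nodes + 1):
        ns = f"node{i}"
        host_if = f"veth{i}"    # veth end that is plugged into the bridge
        peer_if = f"veth{i}p"   # veth end that is moved into the namespace
        cmds += [
            ["ip", "netns", "add", ns],
            ["ip", "link", "add", host_if, "type", "veth",
             "peer", "name", peer_if],
            ["ip", "link", "set", peer_if, "netns", ns],
            ["ip", "link", "set", host_if, "master", bridge],
            ["ip", "link", "set", host_if, "up"],
            ["ip", "-n", ns, "addr", "add", f"{subnet}.{i}/24",
             "dev", peer_if],
            ["ip", "-n", ns, "link", "set", peer_if, "up"],
            ["ip", "-n", ns, "link", "set", "lo", "up"],
        ]
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in netns_cluster_commands():
        print(shlex.join(cmd))
```

Applying the printed commands requires root. A process started with "ip netns exec node1 ..." then sees only the network devices and sockets of that namespace, which is the per-node separation described above.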
We can use it for testing, development and also scale testing with a
large number of nodes joining a lockspace (which seems to be a problem
right now). Instead of running 1000 VMs, we can run 1000 net-namespaces
in a more resource-limited environment. To me it seems gfs2 can handle
several mounts and still separate the resources according to its global
variables. Its data structures, e.g. the glock hash, seem to have a
separation for that in their key (fsid?). However, this is still an
experimental feature and we might run into issues that require more
separation related to net-namespaces. Basic testing seems to run just
fine, though.
Limitations
I disable any functionality of the DLM character device that allows
plock handling or DLM locking from user space. Just don't use any plock
locking in gfs2 for now. Basic vfs operations should work, though. You
can even sniff DLM traffic on the created "dlmsw" virtual bridge.
- Alex
[0] https://gitlab.com/netcoder/nldlm
[1] https://gitlab.com/netcoder/gfs2ns-examples/-/blob/main/three_nodes
Alexander Aring (12):
dlm: introduce dlm_find_lockspace_name()
dlm: disallow different configs nodeid storages
dlm: add struct net to dlm_new_lockspace()
dlm: handle port as __be16 network byte order
dlm: use dlm_config as only cluster configuration
dlm: dlm_config_info config fields to unsigned int
dlm: rename config to configfs
kobject: add kset_type_create_and_add() helper
kobject: export generic helper ops
dlm: separate dlm lockspaces per net-namespace
dlm: add nldlm net-namespace aware UAPI
gfs2: separate mount context by net-namespaces
drivers/md/md-cluster.c | 3 +-
fs/dlm/Makefile | 2 +
fs/dlm/config.c | 1291 +++++++++++++++----------------------
fs/dlm/config.h | 215 +++++--
fs/dlm/configfs.c | 882 ++++++++++++++++++++++++++
fs/dlm/configfs.h | 19 +
fs/dlm/debug_fs.c | 24 +-
fs/dlm/dir.c | 4 +-
fs/dlm/dlm_internal.h | 24 +-
fs/dlm/lock.c | 64 +-
fs/dlm/lock.h | 3 +-
fs/dlm/lockspace.c | 220 ++++---
fs/dlm/lockspace.h | 12 +-
fs/dlm/lowcomms.c | 525 ++++++++--------
fs/dlm/lowcomms.h | 29 +-
fs/dlm/main.c | 5 -
fs/dlm/member.c | 36 +-
fs/dlm/midcomms.c | 287 ++++-----
fs/dlm/midcomms.h | 31 +-
fs/dlm/nldlm.c | 1330 +++++++++++++++++++++++++++++++++++++++
fs/dlm/nldlm.h | 176 ++++++
fs/dlm/plock.c | 2 +-
fs/dlm/rcom.c | 16 +-
fs/dlm/rcom.h | 3 +-
fs/dlm/recover.c | 17 +-
fs/dlm/user.c | 63 +-
fs/dlm/user.h | 2 +-
fs/gfs2/glock.c | 8 +
fs/gfs2/incore.h | 2 +
fs/gfs2/lock_dlm.c | 6 +-
fs/gfs2/ops_fstype.c | 5 +
fs/gfs2/sys.c | 27 +-
fs/ocfs2/stack_user.c | 2 +-
include/linux/dlm.h | 9 +-
include/linux/kobject.h | 10 +-
lib/kobject.c | 55 +-
36 files changed, 3941 insertions(+), 1468 deletions(-)
create mode 100644 fs/dlm/configfs.c
create mode 100644 fs/dlm/configfs.h
create mode 100644 fs/dlm/nldlm.c
create mode 100644 fs/dlm/nldlm.h
--
2.43.0