Message-ID: <e5c4eb9033b93b06c1f7a17ecc79d8dd766bf86f.camel@ibm.com>
Date: Tue, 28 Oct 2025 17:07:30 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "linux-mm@...ck.org" <linux-mm@...ck.org>,
Alex Markuze
<amarkuze@...hat.com>,
"ceph-devel@...r.kernel.org"
<ceph-devel@...r.kernel.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
CC: "dietmar.eggemann@....com" <dietmar.eggemann@....com>,
"rppt@...nel.org"
<rppt@...nel.org>,
"lorenzo.stoakes@...cle.com" <lorenzo.stoakes@...cle.com>,
Xiubo Li <xiubli@...hat.com>,
"idryomov@...il.com" <idryomov@...il.com>,
"david@...hat.com" <david@...hat.com>,
"mgorman@...e.de" <mgorman@...e.de>, "vbabka@...e.cz" <vbabka@...e.cz>,
"vincent.guittot@...aro.org"
<vincent.guittot@...aro.org>,
"akpm@...ux-foundation.org"
<akpm@...ux-foundation.org>,
"Liam.Howlett@...cle.com"
<Liam.Howlett@...cle.com>,
Ingo Molnar <mingo@...hat.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"surenb@...gle.com"
<surenb@...gle.com>,
Valentin Schneider <vschneid@...hat.com>,
"kees@...nel.org" <kees@...nel.org>,
"peterz@...radead.org"
<peterz@...radead.org>,
"mhocko@...e.com" <mhocko@...e.com>,
"bsegall@...gle.com" <bsegall@...gle.com>,
"juri.lelli@...hat.com"
<juri.lelli@...hat.com>
Subject: Re: [RFC PATCH 0/5] BLOG: per-task logging contexts with Ceph
consumer
On Fri, 2025-10-24 at 08:42 +0000, Alex Markuze wrote:
It probably makes sense to consider this as a topic for the LSF/MM/BPF
conference, because it may not be easy to convince people.
From my point of view, the motivation doesn't explain the benefits in enough
detail, and it lacks benchmarking results and a comparison with already
existing infrastructure. A clear explanation of these points would be a good
step toward convincing people to try and adopt the new infrastructure.
> Motivation: improve observability in production by providing subsystemsawith
"subsystemsawith" -> subsystems with?
> a logger that keeps up with their verbouse unstructured logs and aggregating
> logs at the process context level, akin to userspace TLS.
>
> Binary LOGging (BLOG) introduces a task-local logging context: each context
> owns a single 512 KiB fragment that cycles through “ready → in use → queued for
Why exactly 512 KiB? Could it be increased or decreased? Does the infrastructure
provide any tuning parameters?
Could the infrastructure "eat" all of the memory if we have a lot of tasks/cores?
Is there any danger of system crashes caused by the BLOG subsystem's memory
requirements?
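To make the question concrete, a back-of-the-envelope worst case. This assumes
(my assumption, not stated in the cover letter) that every task that has logged
pins one 512 KiB fragment per registered module and that nothing is recycled;
`blog_worst_case_bytes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Fragment size stated in the cover letter. */
#define BLOG_FRAGMENT_BYTES (512ULL * 1024ULL)

/*
 * Worst-case bytes pinned, ASSUMING every logging task holds one
 * fragment per registered module and nothing is recycled or freed.
 */
static uint64_t blog_worst_case_bytes(uint64_t nr_tasks, uint64_t nr_modules)
{
	/* Reject inputs whose product would overflow uint64_t. */
	assert(nr_modules == 0 ||
	       nr_tasks <= UINT64_MAX / nr_modules / BLOG_FRAGMENT_BYTES);
	return nr_tasks * nr_modules * BLOG_FRAGMENT_BYTES;
}
```

For example, 10,000 such tasks with a single module would pin
10,000 × 512 KiB = 5000 MiB, roughly 4.9 GiB.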
I assume that BLOG's 512 KiB fragment works as a circular buffer. Am I right
here? If so, how long a history of operations can be recorded? Could new
records overwrite information that is needed for analyzing the issue?
> readers → reset → ready” without re-entering the allocator. Writers copy the
> raw parameters they already have; readers format them later when the log is
> inspected.
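If I read the lifecycle correctly, each fragment cycles through four states
without returning memory to the allocator. A minimal sketch of that cycle; the
enum, struct and function names are hypothetical and not taken from the patches.

```c
#include <assert.h>

/* Hypothetical states: "ready -> in use -> queued for readers -> reset -> ready". */
enum blog_frag_state {
	BLOG_FRAG_READY,	/* reset and available to a writer */
	BLOG_FRAG_IN_USE,	/* owned by a task, writers append records */
	BLOG_FRAG_QUEUED,	/* filled, waiting for readers to drain it */
	BLOG_FRAG_RESET,	/* drained, being cleared for reuse */
};

/* Each state has exactly one legal successor; the cycle never frees memory. */
static enum blog_frag_state blog_frag_next(enum blog_frag_state s)
{
	switch (s) {
	case BLOG_FRAG_READY:	return BLOG_FRAG_IN_USE;
	case BLOG_FRAG_IN_USE:	return BLOG_FRAG_QUEUED;
	case BLOG_FRAG_QUEUED:	return BLOG_FRAG_RESET;
	case BLOG_FRAG_RESET:	return BLOG_FRAG_READY;
	}
	assert(!"invalid fragment state");
	return BLOG_FRAG_READY;
}

struct blog_frag {
	enum blog_frag_state state;
	unsigned int used;	/* bytes written into the fragment */
};

/* Advance one step; the reset step clears the write offset in place. */
static void blog_frag_advance(struct blog_frag *f)
{
	f->state = blog_frag_next(f->state);
	if (f->state == BLOG_FRAG_RESET)
		f->used = 0;
}
```

In this reading, a fragment is not written again until readers have drained it
and it has been reset; whether the real code can overwrite unread records is
exactly what the circular-buffer question above is asking.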
>
> BLOG borrows ideas from ftrace (captureabinary data now, format later) but
"captureabinary" -> capture a binary?
> unlike ftrace there is no global ring. Each module registers its own logger,
> manages its own buffers, and keeps the state small enough for production use.
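For readers less familiar with the ftrace approach, a minimal user-space sketch
of "capture binary data now, format later": the writer stores only a
format-string pointer and raw 64-bit arguments, and the reader calls snprintf()
when the log is inspected. The record layout and the names `struct blog_rec`,
`blog_write()` and `blog_format()` are my assumptions, not the BLOG API from
the series.

```c
#include <assert.h>
#include <inttypes.h>	/* uint64_t and PRIu64 for format strings */
#include <stdio.h>
#include <string.h>

#define BLOG_MAX_ARGS 4

/* Hypothetical binary record: a format pointer plus raw argument values. */
struct blog_rec {
	const char *fmt;		/* must have static lifetime, e.g. a literal */
	uint64_t args[BLOG_MAX_ARGS];	/* raw values copied on the hot path */
	unsigned int nr_args;
};

/* Writer side: copy the raw values only; no formatting happens here. */
static void blog_write(struct blog_rec *r, const char *fmt,
		       unsigned int nr_args, const uint64_t *args)
{
	assert(nr_args <= BLOG_MAX_ARGS);
	r->fmt = fmt;
	r->nr_args = nr_args;
	memset(r->args, 0, sizeof(r->args));
	if (nr_args)
		memcpy(r->args, args, nr_args * sizeof(*args));
}

/*
 * Reader side: format only when the log is inspected. All slots are
 * passed; printf-style functions ignore arguments the format does not use.
 * The format must use PRIu64 conversions for the stored values.
 */
static int blog_format(const struct blog_rec *r, char *buf, size_t len)
{
	return snprintf(buf, len, r->fmt, r->args[0], r->args[1],
			r->args[2], r->args[3]);
}
```

The format string must outlive the record (in the kernel it would typically be
a string literal), since only its pointer is stored.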
>
> To host the per-module pointers we extend `struct task_struct` with one
> additional `void *`, in line with other task extensions already in the kernel.
> Each module keeps independent batches: `alloc_batch` for contexts with
> refcount 0 and `log_batch` for contexts that have been filled and are waiting
> for readers. The batching layer and buffer management were migrated from the
> existing Ceph SAN logging code, so the behaviour is battle-tested; we simply
I don't completely follow what you mean by the Ceph SAN logging code. Maybe
it makes sense to share a link to it?
> made the buffer inline so every composite stays within a single 512 KiB
> allocation.
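A sketch of how I understand the two batches: idle contexts with refcount 0
wait in `alloc_batch`, filled contexts wait in `log_batch` until a reader drains
them, and then they return to `alloc_batch`. The lists, helpers and refcount
handling here are hypothetical; the sketch ignores locking and any per-CPU
details the real code may have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context: only the fields needed to show batch movement. */
struct blog_ctx {
	struct blog_ctx *next;
	int refcount;
};

/* Hypothetical batch: a singly linked LIFO list with a count. */
struct blog_batch {
	struct blog_ctx *head;
	size_t count;
};

static void batch_push(struct blog_batch *b, struct blog_ctx *c)
{
	c->next = b->head;
	b->head = c;
	b->count++;
}

static struct blog_ctx *batch_pop(struct blog_batch *b)
{
	struct blog_ctx *c = b->head;

	if (c) {
		b->head = c->next;
		c->next = NULL;
		b->count--;
	}
	return c;
}

/* A task takes an idle context (refcount 0) from alloc_batch. */
static struct blog_ctx *ctx_get(struct blog_batch *alloc_batch)
{
	struct blog_ctx *c = batch_pop(alloc_batch);

	if (c) {
		assert(c->refcount == 0);
		c->refcount = 1;
	}
	return c;
}

/* A filled context is queued on log_batch for readers. */
static void ctx_submit(struct blog_batch *log_batch, struct blog_ctx *c)
{
	batch_push(log_batch, c);
}

/* After a reader drains it, the context goes back with refcount 0. */
static void ctx_recycle(struct blog_batch *alloc_batch, struct blog_ctx *c)
{
	c->refcount = 0;
	batch_push(alloc_batch, c);
}
```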
>
> The patch series lands the BLOG library first, then wires the task lifecycle,
> and finally switches Ceph’s `bout*` logging macros to BLOG so we exercise the
What do you mean by Ceph’s `bout*` logging macros? Do you mean 'dout*' here?
Thanks,
Slava.
> new path.
>
> Patch summary:
> 1. sched, fork: wire BLOG contexts into task lifecycle
> - Adds `struct blog_tls_ctx *blog_contexts[BLOG_MAX_MODULES]` to
> `struct task_struct`.
> - Fork/exit paths initialise and recycle contexts automatically.
>
> 2. lib: introduce BLOG (Binary LOGging) subsystem
> - Adds `lib/blog/` sources and headers under `include/linux/blog/`.
> - Each composite (`struct blog_tls_pagefrag`) consists of the TLS
> metadata, the pagefrag state, and an inline buffer sized at
> `BLOG_PAGEFRAG_SIZE - sizeof(struct blog_tls_pagefrag)`.
>
> 3. ceph: add BLOG scaffolding
> - Introduces `include/linux/ceph/ceph_blog.h` and `fs/ceph/blog_client.c`.
> - Ceph registers a logger and maintains a client-ID map for the reader
> callback.
>
> 4. ceph: add BLOG debugfs support
> - Adds `fs/ceph/blog_debugfs.c` so filled contexts can be drained.
>
> 5. ceph: activate BLOG logging
> - Switches `bout*` macros to BLOG, making Ceph the first consumer.
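A sketch of the single-allocation layout described in patch 2. Only
`BLOG_PAGEFRAG_SIZE` and `struct blog_tls_pagefrag` come from the cover letter;
the header fields, the flexible-array buffer and the helper names are
placeholders I chose, with calloc() standing in for the kernel allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* 512 KiB as in the cover letter; the real definition may differ. */
#define BLOG_PAGEFRAG_SIZE (512u * 1024u)

/* Placeholder composite: header (TLS metadata + pagefrag state), then buffer. */
struct blog_tls_pagefrag {
	uint64_t owner_id;	/* hypothetical TLS metadata */
	uint32_t head;		/* hypothetical write offset into buf */
	uint32_t state;		/* hypothetical lifecycle state */
	char buf[];		/* inline buffer, not counted by sizeof() */
};

/* Buffer size as described in patch 2: whatever the header leaves over. */
#define BLOG_INLINE_BUF_SIZE \
	(BLOG_PAGEFRAG_SIZE - sizeof(struct blog_tls_pagefrag))

static_assert(sizeof(struct blog_tls_pagefrag) < BLOG_PAGEFRAG_SIZE,
	      "header must leave room for the inline buffer");

/* Header and buffer come from one zeroed allocation. */
static struct blog_tls_pagefrag *blog_pagefrag_alloc(void)
{
	return calloc(1, BLOG_PAGEFRAG_SIZE);
}

/* Reserve len bytes in the inline buffer; fail instead of growing. */
static int blog_pagefrag_reserve(struct blog_tls_pagefrag *pf, uint32_t len,
				 char **out)
{
	if (len > BLOG_INLINE_BUF_SIZE - pf->head)
		return -1;
	*out = pf->buf + pf->head;
	pf->head += len;
	return 0;
}
```

Because the buffer size is derived from `sizeof()` of the header, adding a field
to the header shrinks the buffer instead of growing the allocation past 512 KiB.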
>
> With these patches, Ceph now writes its verbose logging to task-local buffers
> managed by BLOG, and the infrastructure is ready for other subsystems that need
> allocation-free, module-owned logging.
>
> Alex Markuze (5):
> sched, fork: Wire BLOG contexts into task lifecycle
> lib: Introduce BLOG (Binary LOGging) subsystem
> ceph: Add BLOG scaffolding
> ceph: Add BLOG debugfs support
> ceph: Activate BLOG logging
>
> fs/ceph/Makefile | 2 +
> fs/ceph/addr.c | 130 ++---
> fs/ceph/blog_client.c | 244 +++++++++
> fs/ceph/blog_debugfs.c | 361 +++++++++++++
> fs/ceph/caps.c | 242 ++++-----
> fs/ceph/crypto.c | 18 +-
> fs/ceph/debugfs.c | 33 +-
> fs/ceph/dir.c | 88 ++--
> fs/ceph/export.c | 20 +-
> fs/ceph/file.c | 130 ++---
> fs/ceph/inode.c | 182 +++----
> fs/ceph/ioctl.c | 6 +-
> fs/ceph/locks.c | 22 +-
> fs/ceph/mds_client.c | 278 +++++-----
> fs/ceph/mdsmap.c | 8 +-
> fs/ceph/quota.c | 2 +-
> fs/ceph/snap.c | 66 +--
> fs/ceph/super.c | 82 +--
> fs/ceph/xattr.c | 42 +-
> include/linux/blog/blog.h | 515 +++++++++++++++++++
> include/linux/blog/blog_batch.h | 54 ++
> include/linux/blog/blog_des.h | 46 ++
> include/linux/blog/blog_module.h | 329 ++++++++++++
> include/linux/blog/blog_pagefrag.h | 33 ++
> include/linux/blog/blog_ser.h | 275 ++++++++++
> include/linux/ceph/ceph_blog.h | 124 +++++
> include/linux/ceph/ceph_debug.h | 6 +-
> include/linux/sched.h | 7 +
> kernel/fork.c | 37 ++
> lib/Kconfig | 2 +
> lib/Makefile | 2 +
> lib/blog/Kconfig | 56 +++
> lib/blog/Makefile | 15 +
> lib/blog/blog_batch.c | 311 ++++++++++++
> lib/blog/blog_core.c | 772 ++++++++++++++++++++++++++++
> lib/blog/blog_des.c | 385 ++++++++++++++
> lib/blog/blog_module.c | 781 +++++++++++++++++++++++++++++
> lib/blog/blog_pagefrag.c | 124 +++++
> 38 files changed, 5163 insertions(+), 667 deletions(-)
> create mode 100644 fs/ceph/blog_client.c
> create mode 100644 fs/ceph/blog_debugfs.c
> create mode 100644 include/linux/blog/blog.h
> create mode 100644 include/linux/blog/blog_batch.h
> create mode 100644 include/linux/blog/blog_des.h
> create mode 100644 include/linux/blog/blog_module.h
> create mode 100644 include/linux/blog/blog_pagefrag.h
> create mode 100644 include/linux/blog/blog_ser.h
> create mode 100644 include/linux/ceph/ceph_blog.h
> create mode 100644 lib/blog/Kconfig
> create mode 100644 lib/blog/Makefile
> create mode 100644 lib/blog/blog_batch.c
> create mode 100644 lib/blog/blog_core.c
> create mode 100644 lib/blog/blog_des.c
> create mode 100644 lib/blog/blog_module.c
> create mode 100644 lib/blog/blog_pagefrag.c