Date:   Mon, 25 Jan 2021 16:50:50 -0500
From:   "Theodore Ts'o" <tytso@....edu>
To:     Chunguang Xu <brookxu.cn@...il.com>
Cc:     adilger.kernel@...ger.ca, jack@...e.com,
        harshadshirwadkar@...il.com, linux-ext4@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2 0/4] make jbd2 debug switch per device

On Sat, Jan 23, 2021 at 08:00:42PM +0800, Chunguang Xu wrote:
> On a multi-disk machine, because the jbd2 debugging switch is global,
> the logs of multiple disks are interleaved. It is not easy to
> distinguish the logs of each disk, and the volume of generated logs is
> very large. A separate debugging switch for each disk would be better,
> so that we can easily isolate the logs of a particular disk.
> 
> We can enable jbd2 debugging of a device in the following ways:
> echo X > /proc/fs/jbd2/sdX/jbd2_debug
> 
> But there is one small disadvantage: because the debugging switch is
> stored in the journal_t object, any messages logged before that object
> is initialized are lost. Usually this has little impact on debugging.

The jbd debugging infrastructure dates back to the very beginnings of
ext3, when Stephen Tweedie added it while he was first implementing
the jbd layer.  So it dates back to a time before we had other
schemes such as dynamic debug, tracepoints, or eBPF.

I wonder if, instead of trying to enhance our own bespoke debugging
system, we should set up something like tracepoints where they would
be useful.  I'm not proposing that we try to replace all jbd_debug()
statements with tracepoints, but I think it would be useful to look at
what sort of information would actually be *useful* on a production
server, and add those tracepoints to the jbd2 layer.  What I like
about tracepoints is that you can enable them in a much more
fine-grained fashion; information is sent to userspace far more
efficiently than via printk; you can filter tracepoint events in the
kernel, before sending them to userspace; and if you want more
sophisticated filtering or aggregation, you can use eBPF.
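For reference, the existing jbd2 tracepoints can already be enabled
per-event through tracefs and filtered in the kernel by device before
anything is sent to userspace.  A minimal sketch (assumes tracefs is
mounted at /sys/kernel/tracing and run as root; 0x800010 is a
placeholder for the MKDEV-encoded major:minor of the target device,
e.g. 8:16 for sdb):

```shell
TRACEFS=/sys/kernel/tracing

if [ -d "$TRACEFS/events/jbd2" ]; then
    # Enable a single jbd2 tracepoint instead of a global debug level
    echo 1 > "$TRACEFS/events/jbd2/jbd2_start_commit/enable"

    # In-kernel filter: only pass events for one device
    # (placeholder value; compute as (major << 20) | minor)
    echo 'dev == 0x800010' > "$TRACEFS/events/jbd2/jbd2_start_commit/filter"

    # Stream the filtered events to userspace
    head -n 5 "$TRACEFS/trace_pipe"
else
    echo "tracefs jbd2 events not available on this kernel"
fi
```

For aggregation or more elaborate filtering, the same tracepoints are
reachable from eBPF tooling, e.g. a bpftrace program attached to
tracepoint:jbd2:jbd2_start_commit.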

What was the original use case which inspired this?  Were you indeed
trying to debug some kind of problem on a production system?  (Why did
you have multiple disks active at the same time?)  Was there a
specific problem you were trying to debug?  What debug level were you
using?  Which jbd_debug statements were most useful to you?  Which
just got in the way (but which had to be enabled given the log level
you needed to get the debug messages that you needed)?

						- Ted
