Message-ID: <20231117142335.9674-A-hca@linux.ibm.com>
Date:   Fri, 17 Nov 2023 15:23:35 +0100
From:   Heiko Carstens <hca@...ux.ibm.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Linux Trace Kernel <linux-trace-kernel@...r.kernel.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ajay Kaher <akaher@...are.com>, chinglinyu@...gle.com,
        lkp@...el.com, namit@...are.com, oe-lkp@...ts.linux.dev,
        amakhalov@...are.com, er.ajay.kaher@...il.com,
        srivatsa@...il.mit.edu, tkundu@...are.com, vsirnapalli@...are.com,
        linux-s390@...r.kernel.org
Subject: Re: [PATCH v5] eventfs: Remove eventfs_file and just use
 eventfs_inode

Hi Steven,

On Wed, Oct 04, 2023 at 04:50:07PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@...dmis.org>
> 
> Instead of having a descriptor for every file represented in the eventfs
> directory, only have the directory itself represented. Change the API to
> send in a list of entries that represent all the files in the directory
> (but not other directories). The entry list contains a name and a callback
> function that will be used to create the files when they are accessed.
...
> Cc: Masami Hiramatsu <mhiramat@...nel.org>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Ajay Kaher <akaher@...are.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt@...dmis.org>
> ---
> Changes since v4: https://lore.kernel.org/linux-trace-kernel/20231003184059.4924468e@gandalf.local.home/
> 
>  - Get the ei->dentry within the eventfs_mutex to keep consistency during the lookup.
> 
>  fs/tracefs/event_inode.c     | 847 ++++++++++++++++++-----------------
>  fs/tracefs/inode.c           |   2 +-
>  fs/tracefs/internal.h        |  37 +-
>  include/linux/trace_events.h |   2 +-
>  include/linux/tracefs.h      |  29 +-
>  kernel/trace/trace.c         |   7 +-
>  kernel/trace/trace.h         |   4 +-
>  kernel/trace/trace_events.c  | 313 +++++++++----
>  8 files changed, 705 insertions(+), 536 deletions(-)

I think this patch occasionally causes crashes when running the ftrace
selftests. In particular, I suspect there is a bug in the error handling of
this function (see below for the call trace):

> +static struct dentry *
> +create_file_dentry(struct eventfs_inode *ei, struct dentry **e_dentry,
> +		   struct dentry *parent, const char *name, umode_t mode, void *data,
> +		   const struct file_operations *fops, bool lookup)
> +{
> +	struct dentry *dentry;
> +	bool invalidate = false;
> +
> +	mutex_lock(&eventfs_mutex);
> +	/* If the e_dentry already has a dentry, use it */
> +	if (*e_dentry) {
> +		/* lookup does not need to up the ref count */
> +		if (!lookup)
> +			dget(*e_dentry);
> +		mutex_unlock(&eventfs_mutex);
> +		return *e_dentry;
> +	}
> +	mutex_unlock(&eventfs_mutex);
> +
> +	/* The lookup already has the parent->d_inode locked */
> +	if (!lookup)
> +		inode_lock(parent->d_inode);
> +
> +	dentry = create_file(name, mode, parent, data, fops);
> +
> +	if (!lookup)
> +		inode_unlock(parent->d_inode);
> +
> +	mutex_lock(&eventfs_mutex);
> +
> +	if (IS_ERR_OR_NULL(dentry)) {
> +		/*
> +		 * When the mutex was released, something else could have
> +		 * created the dentry for this e_dentry. In which case
> +		 * use that one.
> +		 *
> +		 * Note, with the mutex held, the e_dentry cannot have content
> +		 * and the ei->is_freed be true at the same time.
> +		 */
> +		WARN_ON_ONCE(ei->is_freed);
> +		dentry = *e_dentry;
> +		/* The lookup does not need to up the dentry refcount */
> +		if (dentry && !lookup)
> +			dget(dentry);
> +		mutex_unlock(&eventfs_mutex);
> +		return dentry;
> +	}
> +
> +	if (!*e_dentry && !ei->is_freed) {
> +		*e_dentry = dentry;
> +		dentry->d_fsdata = ei;
> +	} else {
> +		/*
> +		 * Should never happen unless we get here due to being freed.
> +		 * Otherwise it means two dentries exist with the same name.
> +		 */
> +		WARN_ON_ONCE(!ei->is_freed);
> +		invalidate = true;
> +	}
> +	mutex_unlock(&eventfs_mutex);
> +
> +	if (invalidate)
> +		d_invalidate(dentry);
> +
> +	if (lookup || invalidate)
> +		dput(dentry);
> +
> +	return invalidate ? NULL : dentry;
> +}

We sometimes see crashes like this:

specification exception: 0006 ilc:2 [#1] SMP 
CPU: 6 PID: 38815 Comm: ls Not tainted 6.7.0-20231116.rc1.git1.a7e756a5bb26.300.vr.fc38.s390x #1
Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
Krnl PSW : 0704c00180000000 000001682304bb00 (d_invalidate+0x30/0x110)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 000000e200000000 0000000000000047 000000e200000007
           0000000000000000 ffffff7c197bf000 000000e2f13b0b20 000000e25bfae180
           000000e2f2536000 ffffffffffffffef 0000000000000000 ffffffffffffffef
           000003ff95cacf98 000000e2f29323f0 000000e827c1fa18 000000e827c1f9d0
Krnl Code: 000001682304baf4: a7180000            lhi     %r1,0
           000001682304baf8: 583003ac            l       %r3,940
          #000001682304bafc: ba13b058            cs      %r1,%r3,88(%r11)
          >000001682304bb00: ec16006b007e        cij     %r1,0,6,000001682304bbd6
           000001682304bb06: e310b0100002        ltg     %r1,16(%r11)
           000001682304bb0c: a784004e            brc     8,000001682304bba8
           000001682304bb10: b904002b            lgr     %r2,%r11
           000001682304bb14: c0e5ffffe67e        brasl   %r14,0000016823048810
Call Trace:
 [<000001682304bb00>] d_invalidate+0x30/0x110 
 [<000001682329147a>] create_dir_dentry+0xe2/0x200 
 [<000001682329190a>] dcache_dir_open_wrapper+0x102/0x3e8 
 [<000001682301fb8a>] do_dentry_open+0x24a/0x568 
 [<0000016823038836>] do_open+0x2de/0x448 
 [<000001682303cb58>] path_openat+0x110/0x2b0 
 [<000001682303d688>] do_filp_open+0x90/0x130 
 [<0000016823022960>] do_sys_openat2+0xa8/0xd8 
 [<0000016823022b50>] do_sys_open+0x58/0x90 
 [<00000168239c9edc>] __do_syscall+0x1d4/0x200 
 [<00000168239db1f8>] system_call+0x70/0x98 
Last Breaking-Event-Address:
 [<0000016823291474>] create_dir_dentry+0xdc/0x200
Kernel panic - not syncing: Fatal exception: panic_on_oops

Note that the compare and swap instruction within d_invalidate() generates
a specification exception because it operates on an invalid address based
on %r11 (0xffffffffffffffef), which happens to be -EEXIST. So my assumption
is that create_dir_dentry() has incorrect error handling and passes -EEXIST
instead of a valid dentry pointer to d_invalidate().
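
To illustrate why 0xffffffffffffffef reads as -EEXIST, here is a minimal
userspace sketch of the error-pointer encoding. The helpers err_ptr(),
is_err(), is_err_or_null(), ptr_err() and show_encoded_error() are
simplified stand-ins written for this example; they only mimic the kernel's
ERR_PTR()/IS_ERR()/IS_ERR_OR_NULL()/PTR_ERR() and are not the kernel
implementation, nor the actual fix for create_dir_dentry().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: same upper bound on errno values as the kernel uses. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, similar to ERR_PTR(). */
static inline void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True if ptr lies in the top MAX_ERRNO bytes of the address space. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True if ptr is NULL or an encoded error; neither may be dereferenced. */
static inline bool is_err_or_null(const void *ptr)
{
	return ptr == NULL || is_err(ptr);
}

/* Decode an error pointer back into its negative errno value. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical helper: print how an encoded error looks as an address. */
static inline void show_encoded_error(long error)
{
	void *p = err_ptr(error);

	printf("%ld -> %p (is_err=%d)\n", error, p, is_err(p));
}
```

On a 64-bit Linux system, where EEXIST is 17, err_ptr(-EEXIST) has the value
0xffffffffffffffef, which matches the register contents in the dump above.
A caller that checks is_err_or_null() before handing the pointer on would
not pass such a value to a function that dereferences it.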

But I leave it up to you to figure this out :)
