lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250620183503.6c84eb22cca206cd10418c04@kernel.org>
Date: Fri, 20 Jun 2025 18:35:03 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Gabriele Paoloni <gpaoloni@...hat.com>
Cc: rostedt@...dmis.org, mathieu.desnoyers@...icios.com,
 linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
 acarmina@...hat.com, chuck.wolber@...ing.com
Subject: Re: [RFC PATCH 1/2] tracing: fixes of ftrace_enable_fops

On Thu, 19 Jun 2025 19:07:33 +0200
Gabriele Paoloni <gpaoloni@...hat.com> wrote:

> Hi Masami
> 
> On Fri, Jun 13, 2025 at 4:45 AM Masami Hiramatsu <mhiramat@...nel.org> wrote:
> >
> > On Thu, 12 Jun 2025 12:43:48 +0200
> > Gabriele Paoloni <gpaoloni@...hat.com> wrote:
> >
> > > Currently there are different issues associated with ftrace_enable_fops
> > > - event_enable_write: *ppos is increased while not used at all in the
> > >   write operation itself (following a write, this could lead a read to
> > >   fail or report a corrupted event status);
> >
> > Here, we expected the "enable" file is a pseudo text file. So if
> > there is a write, the ppos should be incremented.
> >
> > > - event_enable_read: cnt < strlen(buf) is allowed and this can lead to
> > >   reading an incomplete event status (i.e. not all status characters
> > >   are retrieved) and/or reading the status in a non-atomic way (i.e.
> > >   the status could change between two consecutive reads);
> >
> > As I said, the "enable" file is a kind of text file. So reader must read
> > it until EOF. If you need to get the consistent result, user should
> > use the enough size of buffer.
> 
> What I am concerned about are scenarios like the one below:
> ---
> # strace -Tfe trace=openat,open,read,write scat 1
> /sys/kernel/tracing/events/kprobes/ev1/enable
> open("/sys/kernel/tracing/events/kprobes/ev1/enable",
> O_RDONLY|O_LARGEFILE) = 3 <0.000237>
> Open /sys/kernel/tracing/events/kprobes/ev1/enable ->fd=3
> read fd=3, 1
> read(3, "0", 1)                         = 1 <0.000099>
> 1 bytes Read
> 30,
> read(3, "\n", 1)                        = 1 <0.000095>
> 1 bytes Read
> 0a,
> read(3, "", 1)                          = 0 <0.000102>
> close fd=3
> +++ exited with 0 +++
> ---
> So in this case there are 2 consecutive reads byte by byte that
> could lead to inconsistent results if in the meantime the event
> status has changed.

Unless you take a lock explicitly, ftrace (and other pseudo
files) does not guarantee the consistency between 2 read()
syscalls, because it is something like a file which is shared
with kernel side.

Please imagine that this is something like a file shared
between two processes, one updating it and one reading it.
The kernel guarantees the one read() will consistent, but
two read()s may not be consistent because it can be updated
by another.

> With the proposed patchset the same test would result in a failure
> as per log below:
> ---
> # strace -Tfe trace=openat,open,read,write scat 1
> /sys/kernel/tracing/events/kprobes/ev1/enable
> open("/sys/kernel/tracing/events/kprobes/ev1/enable",
> O_RDONLY|O_LARGEFILE) = 3 <0.000227>
> Open /sys/kernel/tracing/events/kprobes/ev1/enable ->fd=3
> read fd=3, 1
> read(3, 0x7ffd960234e0, 1)              = -1 EINVAL (Invalid argument)
> <0.000228>
> close fd=3
> +++ exited with 0 +++
> ---
> On the other side the proposed patchset would be still compatible with
> “cat” or “scat 2” commands that continue to work as they do today.
> 
> >
> > > - .llseek is set to default_llseek: this is wrong since for this
> > >   type of files it does not make sense to reposition the ppos offset.
> > >   Hence this should be set instead to noop_llseek.
> >
> > As I said, it is a kind of text file, default_llseek is better.
> >
> > But, if we change (re-design) what is this "enable" file is,
> > we can accept these changes. So this is not a "Fix" but re-design
> > of the "enable" file as an interface (as a char device), not a text
> > file (or a block device).
> >
> > I want to keep this as is, same as other tracefs files.
> 
> IMO it is a redesign that is enforcing the user to avoid erroneous
> usages of enable files (because the reads are either reporting the
> whole and correct status of the event or failing to read; also the user
> cannot llseek into a position that would lead to not reading or reading
> a corrupted status).

Can you make it for files which can be bigger than PAGE_SIZE?

For example, "hist" file also can be inconsistent if user reads
it several times. Can you also update it to return -EINVAL
if read buffer size is smaller?

> 
> On the other hand the proposed re-design is fully compatible with
> the current user space commands reading and writing to the enable
> files.
> 
> If the concern is having inconsistent implementations between tracefs
> files, I am happy to extend this patchset to all traces files, however,
> before doing so, I would like to have your approval.


Hmm, I'm still not convinced. If you redesign it, that should also
be applied to other pseudo files. "why tracefs can not read partially,
but procfs can?" I guess that can cause more confusion and will
lead unneeded debug.

> Otherwise I will just document the current functions and associated
> assumptions of use that the user must comply with in order to avoid
> the erroneous behaviour.

Yeah, I like to update the document so that user must read with enough
size of buffer. And TIPs how to read consistent data from each file.

Thank you,

> 
> Thanks a lot for your inputs and clarifications.
> Gab
> >
> > Thank you,
> >
> > >
> > > This patch fixes all the issues listed above.
> > >
> > > Signed-off-by: Gabriele Paoloni <gpaoloni@...hat.com>
> > > Tested-by: Alessandro Carminati <acarmina@...hat.com>
> > > ---
> > >  kernel/trace/trace_events.c | 11 ++++++++---
> > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > > index 120531268abf..5e84ef01d0c8 100644
> > > --- a/kernel/trace/trace_events.c
> > > +++ b/kernel/trace/trace_events.c
> > > @@ -1798,6 +1798,13 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
> > >
> > >       strcat(buf, "\n");
> > >
> > > +     /*
> > > +      * A requested cnt less than strlen(buf) could lead to a wrong
> > > +      * event status being reported.
> > > +      */
> > > +     if (cnt < strlen(buf))
> > > +             return -EINVAL;
> > > +
> > >       return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf));
> > >  }
> > >
> > > @@ -1833,8 +1840,6 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
> > >               return -EINVAL;
> > >       }
> > >
> > > -     *ppos += cnt;
> > > -
> > >       return cnt;
> > >  }
> > >
> > > @@ -2557,7 +2562,7 @@ static const struct file_operations ftrace_enable_fops = {
> > >       .read = event_enable_read,
> > >       .write = event_enable_write,
> > >       .release = tracing_release_file_tr,
> > > -     .llseek = default_llseek,
> > > +     .llseek = noop_llseek,
> > >  };
> > >
> > >  static const struct file_operations ftrace_event_format_fops = {
> > > --
> > > 2.48.1
> > >
> >
> >
> > --
> > Masami Hiramatsu (Google) <mhiramat@...nel.org>
> >
> 


-- 
Masami Hiramatsu (Google) <mhiramat@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ