[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CA+wEVJaQcHdpVc3Za8qy0+Z-CGNeaDTrXtjJg2j7J6qsW4uAkQ@mail.gmail.com>
Date: Fri, 20 Jun 2025 15:26:27 +0200
From: Gabriele Paoloni <gpaoloni@...hat.com>
To: Masami Hiramatsu <mhiramat@...nel.org>
Cc: rostedt@...dmis.org, mathieu.desnoyers@...icios.com,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
acarmina@...hat.com, chuck.wolber@...ing.com
Subject: Re: [RFC PATCH 1/2] tracing: fixes of ftrace_enable_fops
On Fri, Jun 20, 2025 at 11:35 AM Masami Hiramatsu <mhiramat@...nel.org> wrote:
>
> On Thu, 19 Jun 2025 19:07:33 +0200
> Gabriele Paoloni <gpaoloni@...hat.com> wrote:
>
> > Hi Masami
> >
> > On Fri, Jun 13, 2025 at 4:45 AM Masami Hiramatsu <mhiramat@...nel.org> wrote:
> > >
> > > On Thu, 12 Jun 2025 12:43:48 +0200
> > > Gabriele Paoloni <gpaoloni@...hat.com> wrote:
> > >
> > > > Currently there are different issues associated with ftrace_enable_fops
> > > > - event_enable_write: *ppos is increased while not used at all in the
> > > > write operation itself (following a write, this could lead a read to
> > > > fail or report a corrupted event status);
> > >
> > > Here, we expected the "enable" file is a pseudo text file. So if
> > > there is a write, the ppos should be incremented.
> > >
> > > > - event_enable_read: cnt < strlen(buf) is allowed and this can lead to
> > > > reading an incomplete event status (i.e. not all status characters
> > > > are retrieved) and/or reading the status in a non-atomic way (i.e.
> > > > the status could change between two consecutive reads);
> > >
> > > As I said, the "enable" file is a kind of text file. So reader must read
> > > it until EOF. If you need to get the consistent result, user should
> > > use the enough size of buffer.
> >
> > What I am concerned about are scenarios like the one below:
> > ---
> > # strace -Tfe trace=openat,open,read,write scat 1
> > /sys/kernel/tracing/events/kprobes/ev1/enable
> > open("/sys/kernel/tracing/events/kprobes/ev1/enable",
> > O_RDONLY|O_LARGEFILE) = 3 <0.000237>
> > Open /sys/kernel/tracing/events/kprobes/ev1/enable ->fd=3
> > read fd=3, 1
> > read(3, "0", 1) = 1 <0.000099>
> > 1 bytes Read
> > 30,
> > read(3, "\n", 1) = 1 <0.000095>
> > 1 bytes Read
> > 0a,
> > read(3, "", 1) = 0 <0.000102>
> > close fd=3
> > +++ exited with 0 +++
> > ---
> > So in this case there are 2 consecutive reads byte by byte that
> > could lead to inconsistent results if in the meantime the event
> > status has changed.
>
> Unless you take a lock explicitly, ftrace (and other pseudo
> files) does not guarantee the consistency between 2 read()
> syscalls, because it is something like a file which is shared
> with kernel side.
>
> Please imagine that this is something like a file shared
> between two processes, one updating it and one reading it.
> The kernel guarantees the one read() will consistent, but
> two read()s may not be consistent because it can be updated
> by another.
Yes, this is the reason behind the proposal I made here.
In this case the Kernel rejects a read that is requesting
a number of bytes cnt that is smaller than strlen(buf).
>
> > With the proposed patchset the same test would result in a failure
> > as per log below:
> > ---
> > # strace -Tfe trace=openat,open,read,write scat 1
> > /sys/kernel/tracing/events/kprobes/ev1/enable
> > open("/sys/kernel/tracing/events/kprobes/ev1/enable",
> > O_RDONLY|O_LARGEFILE) = 3 <0.000227>
> > Open /sys/kernel/tracing/events/kprobes/ev1/enable ->fd=3
> > read fd=3, 1
> > read(3, 0x7ffd960234e0, 1) = -1 EINVAL (Invalid argument)
> > <0.000228>
> > close fd=3
> > +++ exited with 0 +++
> > ---
> > On the other side the proposed patchset would be still compatible with
> > “cat” or “scat 2” commands that continue to work as they do today.
> >
> > >
> > > > - .llseek is set to default_llseek: this is wrong since for this
> > > > type of files it does not make sense to reposition the ppos offset.
> > > > Hence this should be set instead to noop_llseek.
> > >
> > > As I said, it is a kind of text file, default_llseek is better.
> > >
> > > But, if we change (re-design) what is this "enable" file is,
> > > we can accept these changes. So this is not a "Fix" but re-design
> > > of the "enable" file as an interface (as a char device), not a text
> > > file (or a block device).
> > >
> > > I want to keep this as is, same as other tracefs files.
> >
> > IMO it is a redesign that is enforcing the user to avoid erroneous
> > usages of enable files (because the reads are either reporting the
> > whole and correct status of the event or failing to read; also the user
> > cannot llseek into a position that would lead to not reading or reading
> > a corrupted status).
>
> Can you make it for files which can be bigger than PAGE_SIZE?
>
> For example, "hist" file also can be inconsistent if user reads
> it several times. Can you also update it to return -EINVAL
> if read buffer size is smaller?
>From a very quick look it seems that the read callback of event_hist_fops
is set to the standard seq_read, whereas the read callback in
ftrace_enable_fops defines its own method.
So maybe redesigning the read callback of event_hist_fops could
achieve this goal (in order to be sure I need to look deeper into it,
this is just a guess).
>
> >
> > On the other hand the proposed re-design is fully compatible with
> > the current user space commands reading and writing to the enable
> > files.
> >
> > If the concern is having inconsistent implementations between tracefs
> > files, I am happy to extend this patchset to all traces files, however,
> > before doing so, I would like to have your approval.
>
>
> Hmm, I'm still not convinced. If you redesign it, that should also
> be applied to other pseudo files. "why tracefs can not read partially,
> but procfs can?" I guess that can cause more confusion and will
> lead unneeded debug.
>
> > Otherwise I will just document the current functions and associated
> > assumptions of use that the user must comply with in order to avoid
> > the erroneous behaviour.
>
> Yeah, I like to update the document so that user must read with enough
> size of buffer. And TIPs how to read consistent data from each file.
I think that from my side I do not have comprehensive answers to all your
questions (I need to read the code more in depth).
So to be pragmatic I can split this effort into two parts (documentation and
redesign); I will propose documentation first with the TIPs that you mentioned
above and later, if we find a better re-design solution, we can also amend
the documentation as needed.
Many thanks for your advice and directions
Gab
>
> Thank you,
>
> >
> > Thanks a lot for your inputs and clarifications.
> > Gab
> > >
> > > Thank you,
> > >
> > > >
> > > > This patch fixes all the issues listed above.
> > > >
> > > > Signed-off-by: Gabriele Paoloni <gpaoloni@...hat.com>
> > > > Tested-by: Alessandro Carminati <acarmina@...hat.com>
> > > > ---
> > > > kernel/trace/trace_events.c | 11 ++++++++---
> > > > 1 file changed, 8 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > > > index 120531268abf..5e84ef01d0c8 100644
> > > > --- a/kernel/trace/trace_events.c
> > > > +++ b/kernel/trace/trace_events.c
> > > > @@ -1798,6 +1798,13 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
> > > >
> > > > strcat(buf, "\n");
> > > >
> > > > + /*
> > > > + * A requested cnt less than strlen(buf) could lead to a wrong
> > > > + * event status being reported.
> > > > + */
> > > > + if (cnt < strlen(buf))
> > > > + return -EINVAL;
> > > > +
> > > > return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf));
> > > > }
> > > >
> > > > @@ -1833,8 +1840,6 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
> > > > return -EINVAL;
> > > > }
> > > >
> > > > - *ppos += cnt;
> > > > -
> > > > return cnt;
> > > > }
> > > >
> > > > @@ -2557,7 +2562,7 @@ static const struct file_operations ftrace_enable_fops = {
> > > > .read = event_enable_read,
> > > > .write = event_enable_write,
> > > > .release = tracing_release_file_tr,
> > > > - .llseek = default_llseek,
> > > > + .llseek = noop_llseek,
> > > > };
> > > >
> > > > static const struct file_operations ftrace_event_format_fops = {
> > > > --
> > > > 2.48.1
> > > >
> > >
> > >
> > > --
> > > Masami Hiramatsu (Google) <mhiramat@...nel.org>
> > >
> >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@...nel.org>
>
Powered by blists - more mailing lists