[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+aqT=e30Rvhz233NQBqXioNE11kPWbfcdnYQPk8QMkRBQ@mail.gmail.com>
Date:   Wed, 11 Jan 2023 16:49:47 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     tglx@...utronix.de
Cc:     linux-kernel@...r.kernel.org, Marco Elver <elver@...gle.com>
Subject: Re: [RFC PATCH] posix-timers: Support delivery of signals to the
 current thread
On Fri, 16 Dec 2022 at 18:18, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>
> Support CLOCK_PROCESS_CPUTIME_ID timers with SIGEV_SIGNAL | SIGEV_THREAD_ID
> and sigev_notify_thread_id == 0, which sends the signal to the current
> thread that fires the timer. This is useful for relatively uniform sampling
> of process execution across all threads with signals.
>
> Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Marco Elver <elver@...gle.com>
>
> ---
>
> We are trying to implement sampling program analysis based on this.
> We don't need 100% uniform sampling as a CPU profiler would need,
> but we still need the signals to be reasonably distributed across
> the process threads.
>
> Thomas, does the idea look sane to you overall?
> Are there any existing alternatives to this?
Hi Thomas,
I guess this was lost in your inbox during the holidays period.
Please take a look. Should I mail this as a non-RFC patch?
Thanks
> All alternatives we found are complex and/or have high overhead.
> E.g. we considered using CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
> plus inherited for all threads perf_event(PERF_COUNT_SW_TASK_CLOCK).
> When the timer fires we enable the perf event, and then use the signals
> from the perf event, and then disable the perf event.
> But this has considerable memory overhead (perf event per thread)
> and issues IPIs to all CPUs for perf event enable/disable.
>
> We also considered using CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
> and then manually scan /proc/self/task/ and select some task at random.
> But this is also complex and makes it hard to do reasonable sampling
> proportional to activity of threads.
>
> All alternatives are based on CLOCK_PROCESS_CPUTIME_ID in some way,
> and it seems that just a single CLOCK_PROCESS_CPUTIME_ID timer is enough
> if it could deliver signals to active threads (what this patch is doing).
> The analysis we are trying to do is intended for production systems
> so we are aiming at as low overhead as possible.
>
> If this idea looks sane to you in general, I will add tests and I am open
> to suggestions on the user API (should it be a new SIGEV_CURRENT_THREAD?)
> and on how to represent this in the struct k_itimer. E.g. we could keep
> it_pid=current but add a special flag that says to send the signal to
> the current task rather than it_pid. This has an advantage that we can
> add the following check to posix_timer_event()
> (which would be pretty bad to violate):
> WARN_ON(!same_thread_group(pid_task(timr->it_pid, PIDTYPE_PID), current));
> ---
>  kernel/time/posix-timers.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 5dead89308b74..411ba087e0699 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
>  int posix_timer_event(struct k_itimer *timr, int si_private)
>  {
>         enum pid_type type;
> +       struct pid *pid;
>         int ret;
>         /*
>          * FIXME: if ->sigq is queued we can race with
> @@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
>          */
>         timr->sigq->info.si_sys_private = si_private;
>
> +       pid = timr->it_pid ?: task_pid(current);
>         type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
> -       ret = send_sigqueue(timr->sigq, timr->it_pid, type);
> +       ret = send_sigqueue(timr->sigq, pid, type);
>         /* If we failed to send the signal the timer stops. */
>         return ret > 0;
>  }
> @@ -428,27 +430,31 @@ static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
>         return ret;
>  }
>
> -static struct pid *good_sigevent(sigevent_t * event)
> +static struct pid *good_sigevent(sigevent_t *event, clockid_t which_clock)
>  {
>         struct pid *pid = task_tgid(current);
>         struct task_struct *rtn;
>
>         switch (event->sigev_notify) {
>         case SIGEV_SIGNAL | SIGEV_THREAD_ID:
> +               /* This will use the current task for signals. */
> +               if (which_clock == CLOCK_PROCESS_CPUTIME_ID &&
> +                   !event->sigev_notify_thread_id)
> +                       return NULL;
>                 pid = find_vpid(event->sigev_notify_thread_id);
>                 rtn = pid_task(pid, PIDTYPE_PID);
>                 if (!rtn || !same_thread_group(rtn, current))
> -                       return NULL;
> +                       return ERR_PTR(-ENOENT);
>                 fallthrough;
>         case SIGEV_SIGNAL:
>         case SIGEV_THREAD:
>                 if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
> -                       return NULL;
> +                       return ERR_PTR(-EINVAL);
>                 fallthrough;
>         case SIGEV_NONE:
>                 return pid;
>         default:
> -               return NULL;
> +               return ERR_PTR(-EINVAL);
>         }
>  }
>
> @@ -502,6 +508,7 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
>         struct k_itimer *new_timer;
>         int error, new_timer_id;
>         int it_id_set = IT_ID_NOT_SET;
> +       struct pid *pid;
>
>         if (!kc)
>                 return -EINVAL;
> @@ -527,9 +534,11 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
>
>         if (event) {
>                 rcu_read_lock();
> -               new_timer->it_pid = get_pid(good_sigevent(event));
> +               pid = good_sigevent(event, which_clock);
> +               if (!IS_ERR(pid))
> +                       new_timer->it_pid = get_pid(pid);
>                 rcu_read_unlock();
> -               if (!new_timer->it_pid) {
> +               if (IS_ERR(pid)) {
>                         error = -EINVAL;
>                         goto out;
>                 }
>
> base-commit: 041fae9c105ae342a4245cf1e0dc56a23fbb9d3c
> --
> 2.39.0.314.g84b9a713c41-goog
>
Powered by blists - more mailing lists
 
