Message-ID: <e6a9f8fc-e816-2b23-a4e5-74d5e5b86e6f@igalia.com>
Date: Wed, 9 Apr 2025 16:43:35 +0530
From: Bhupesh Sharma <bhsharma@...lia.com>
To: Yafang Shao <laoar.shao@...il.com>
Cc: Bhupesh <bhupesh@...lia.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, akpm@...ux-foundation.org,
kernel-dev@...lia.com, linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
linux-perf-users@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, oliver.sang@...el.com, lkp@...el.com, pmladek@...e.com,
rostedt@...dmis.org, mathieu.desnoyers@...icios.com, arnaldo.melo@...il.com,
alexei.starovoitov@...il.com, andrii.nakryiko@...il.com,
mirq-linux@...e.qmqm.pl, peterz@...radead.org, willy@...radead.org,
david@...hat.com, viro@...iv.linux.org.uk, keescook@...omium.org,
ebiederm@...ssion.com, brauner@...nel.org, jack@...e.cz, mingo@...hat.com,
juri.lelli@...hat.com, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com
Subject: Re: [PATCH v2 1/3] exec: Dynamically allocate memory to store task's
full name
Sorry for the delayed reply; I was out for a couple of days.
On 4/6/25 7:58 AM, Yafang Shao wrote:
> On Fri, Apr 4, 2025 at 2:35 PM Bhupesh Sharma <bhsharma@...lia.com> wrote:
>>
>> On 4/1/25 7:37 AM, Yafang Shao wrote:
>>> On Mon, Mar 31, 2025 at 8:18 PM Bhupesh <bhupesh@...lia.com> wrote:
>>>> Provide a parallel implementation for get_task_comm() called
>>>> get_task_full_name() which allows the dynamically allocated
>>>> and filled-in task's full name to be passed to interested
>>>> users such as 'gdb'.
>>>>
>>>> Currently while running 'gdb', the 'task->comm' value of a long
>>>> task name is truncated due to the limitation of TASK_COMM_LEN.
>>>>
>>>> For example, using gdb to debug a simple app which generates
>>>> threads with long task names:
>>>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>>>> # cat log
>>>>
>>>> NameThatIsTooLo
>>>>
>>>> This patch does not touch 'TASK_COMM_LEN' at all, i.e.
>>>> 'TASK_COMM_LEN' and the 16-byte design remain untouched, which means
>>>> that all the legacy / existing ABI continues to work as before using
>>>> '/proc/$pid/task/$tid/comm'.
>>>>
>>>> This patch only adds a parallel, dynamically-allocated
>>>> 'task->full_name' which can be used by interested users
>>>> via '/proc/$pid/task/$tid/full_name'.
>>>>
>>>> After this change, gdb is able to show full name of the task:
>>>> # gdb ./threadnames -ex "run info thread" -ex "detach" -ex "quit" > log
>>>> # cat log
>>>>
>>>> NameThatIsTooLongForComm[4662]
>>>>
>>>> Signed-off-by: Bhupesh <bhupesh@...lia.com>
>>>> ---
>>>> fs/exec.c | 21 ++++++++++++++++++---
>>>> include/linux/sched.h | 9 +++++++++
>>>> 2 files changed, 27 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/exec.c b/fs/exec.c
>>>> index f45859ad13ac..4219d77a519c 100644
>>>> --- a/fs/exec.c
>>>> +++ b/fs/exec.c
>>>> @@ -1208,6 +1208,9 @@ int begin_new_exec(struct linux_binprm * bprm)
>>>> {
>>>> struct task_struct *me = current;
>>>> int retval;
>>>> + va_list args;
>>>> + char *name;
>>>> + const char *fmt;
>>>>
>>>> /* Once we are committed compute the creds */
>>>> retval = bprm_creds_from_file(bprm);
>>>> @@ -1348,11 +1351,22 @@ int begin_new_exec(struct linux_binprm * bprm)
>>>> * detecting a concurrent rename and just want a terminated name.
>>>> */
>>>> rcu_read_lock();
>>>> - __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
>>>> - true);
>>>> + fmt = smp_load_acquire(&bprm->file->f_path.dentry->d_name.name);
>>>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>>>> + if (!name)
>>>> + return -ENOMEM;
>>>> +
>>>> + me->full_name = name;
>>>> + __set_task_comm(me, fmt, true);
>>>> rcu_read_unlock();
>>>> } else {
>>>> - __set_task_comm(me, kbasename(bprm->filename), true);
>>>> + fmt = kbasename(bprm->filename);
>>>> + name = kvasprintf(GFP_KERNEL, fmt, args);
>>>> + if (!name)
>>>> + return -ENOMEM;
>>>> +
>>>> + me->full_name = name;
>>>> + __set_task_comm(me, fmt, true);
>>>> }
>>>>
>>>> /* An exec changes our domain. We are no longer part of the thread
>>>> @@ -1399,6 +1413,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>>>> return 0;
>>>>
>>>> out_unlock:
>>>> + kfree(me->full_name);
>>>> up_write(&me->signal->exec_update_lock);
>>>> if (!bprm->cred)
>>>> mutex_unlock(&me->signal->cred_guard_mutex);
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index 56ddeb37b5cd..053b52606652 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -1166,6 +1166,9 @@ struct task_struct {
>>>> */
>>>> char comm[TASK_COMM_LEN];
>>>>
>>>> + /* To store the full name if task comm is truncated. */
>>>> + char *full_name;
>>>> +
>>> Adding another field to store the task name isn’t ideal. What about
>>> combining them into a single field, as Linus suggested [0]?
>>>
>>> [0]. https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/
>>>
>> Thanks for sharing Linus's suggestion. I went through the suggested
>> changes in the related threads and came up with the following set of points:
>>
>> 1. struct task_struct would contain both 'comm' and 'full_name',
> Correct.
>
>> 2. Remove the task_lock() inside __get_task_comm(),
> This has been implemented in the patch series titled "Improve the copy
> of task comm". For details, please refer to:
> https://lore.kernel.org/linux-mm/20240828030321.20688-1-laoar.shao@gmail.com/.
>
>> 3. Users of task->comm will be affected in the following ways:
> Correct.
>
>> (a). Printing with '%s' and tsk->comm would just continue to
>> work, but will get a longer max string.
>> (b). For users of 'memcpy.*->comm\>', we should change 'memcpy()' to
>> 'copy_comm()' which would look like:
>>
>> memcpy(dst, src, TASK_COMM_LEN);
>> dst[TASK_COMM_LEN-1] = 0;
>>
>> (c). Users which use "sizeof(->comm)" will continue to get the old value because of the hacky union.
> Using a separate pointer rather than a union could simplify the
> implementation. I’m open to introducing a new pointer if you believe
> it’s the better approach.
Right, that's what I was thinking of earlier as well, i.e. having a new
pointer like tsk->full_name. However, allocating it outside the exec()
hot path may be tricky.
Let me try that though and come up with a v3 that addresses (a) and (b) as
mentioned above, and (c) with a pointer instead of a union.
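
For illustration only, here is a minimal userspace sketch of how (b) and (c)
could look with a separate pointer. It is not the v3 patch: 'struct
task_sketch' and 'set_names()' are hypothetical names invented for this
example, and the allocation and locking questions around exec() are left out
entirely. Only 'TASK_COMM_LEN', 'comm', 'full_name' and the body of
'copy_comm()' come from the discussion above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Hypothetical stand-in for task_struct, reduced to the two name fields. */
struct task_sketch {
	char comm[TASK_COMM_LEN];	/* legacy 16-byte name, layout unchanged */
	const char *full_name;		/* separate pointer instead of a union */
};

/*
 * (b): bounded copy of a comm buffer that always NUL-terminates.
 * 'src' must point to at least TASK_COMM_LEN readable bytes, e.g. ->comm.
 */
void copy_comm(char *dst, const char *src)
{
	memcpy(dst, src, TASK_COMM_LEN);
	dst[TASK_COMM_LEN - 1] = 0;
}

/*
 * Hypothetical helper: remember the full name and fill the truncated
 * comm from it. snprintf() keeps at most TASK_COMM_LEN - 1 characters
 * and always writes the terminating NUL.
 */
void set_names(struct task_sketch *t, const char *name)
{
	t->full_name = name;
	snprintf(t->comm, sizeof(t->comm), "%s", name);
}
```

With this layout, calling set_names() with "NameThatIsTooLongForComm" and then
copy_comm() on the result yields the truncated "NameThatIsTooLo", while
full_name still points at the complete string. Because 'comm' stays a plain
array, sizeof(->comm) is still TASK_COMM_LEN, which is the property (c)
relies on.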
Thanks,
Bhupesh