[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201211163017.3fjxnuickl2m523m@bsd-mbp.dhcp.thefacebook.com>
Date: Fri, 11 Dec 2020 08:30:17 -0800
From: Jonathan Lemon <jonathan.lemon@...il.com>
To: Yonghong Song <yhs@...com>
Cc: netdev@...r.kernel.org, ast@...nel.org, daniel@...earbox.net,
kernel-team@...com
Subject: Re: [PATCH v2 bpf-next] bpf: increment and use correct thread
iterator
On Wed, Dec 09, 2020 at 11:02:54AM -0800, Yonghong Song wrote:
>
>
> Maybe you can post v3 of the patch with the above information in the
> commit description so people can better understand what the problem
> you are trying to solve here?
>
> Also, could you also send to bpf@...r.kernel.org?
Sure, I can do that.
> > > > If unable to obtain the file structure for the current task,
> > > > proceed to the next task number after the one returned from
> > > > task_seq_get_next(), instead of the next task number from the
> > > > original iterator.
> > > This seems a correct change. The current code should still work
> > > but it may do some redundant/unnecessary work in kernel.
> > > This only happens when a task does not have any file,
> > > no sure whether this is the culprit for the problem this
> > > patch tries to address.
> > >
> > > >
> > > > Use thread_group_leader() instead of comparing tgid vs pid, which
> > > > might may be racy.
> > >
> > > I see
> > >
> > > static inline bool thread_group_leader(struct task_struct *p)
> > > {
> > > return p->exit_signal >= 0;
> > > }
> > >
> > > I am not sure whether thread_group_leader(task) is equivalent
> > > to task->tgid == task->pid or not. Any documentation or explanation?
> > >
> > > Could you explain why task->tgid != task->pid in the original
> > > code could be racy?
> >
> > My understanding is that anything which uses pid_t for comparision
> > in the kernel is incorrect. Looking at de_thread(), there is a
> > section which swaps the pid and tids around, but doesn't seem to
> > change tgid directly.
> >
> > There's also this comment in linux/pid.h:
> > /*
> > * Both old and new leaders may be attached to
> > * the same pid in the middle of de_thread().
> > */
> >
> > So the safest thing to do is use the explicit thread_group_leader()
> > macro rather than trying to open code things.
>
> I did some limited experiments and did not trigger a case where
> task->tgid != task->pid not agreeing with !thread_group_leader().
> Will need more tests in the environment to reproduce the warning
> to confirm whether this is the culprit or not.
Perhaps, but on the other hand, the splats disappear with this
patch, so it's doing something right. If your debug code hasn't
detected any cases where thread_group_leader() isn't making a
difference, then there shouldn't be any objections in making the
replacement, right? It does make the code easier to understand
and matches the rest of the kernel.
--
Jonathan
Powered by blists - more mailing lists