[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <878sihgfzh.fsf@x220.int.ebiederm.org>
Date: Mon, 27 Apr 2020 09:28:34 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Linux FS Devel <linux-fsdevel@...r.kernel.org>,
Alexey Dobriyan <adobriyan@...il.com>,
Alexey Gladkov <legion@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Alexey Gladkov <gladkov.alexey@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH v3 3/6] rculist: Add hlist_swap_before_rcu
Linus Torvalds <torvalds@...ux-foundation.org> writes:
> On Sun, Apr 26, 2020 at 7:14 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> To support this add hlist_swap_before_rcu. An hlist primitive that
>> will allow swaping the leading sections of two tasks. For exchanging
>> the task pids it will just be swapping the hlist_heads of two single
>> entry lists. But the functionality is more general.
>
> So I have no problems with the series now - the code is much more
> understandable. Partly because of the split-up, partly because of the
> comments, and partly because you explained the special case and why it
> was a valid thing to do...
>
> However, I did start thinking about this case again.
>
> I still don't think the "swap entry" macro is necessarily useful in
> _general_ - any time it's an actual individual entry, that swap macro
> doesn't really work.
But it isn't a "swap entry" macro/function. I did not even attempt
to make it a "swap entry" function.
I made a chop two lists into two and swap the pieces function.
> So the only reason it works here is because you're actually swapping
> the whole list.
>
> But that, in turn, shouldn't be using that "first node" model at all,
> it should use the hlist_head. That would have made it a lot more
> obvious what is actually going on to me.
>
> Now, the comment very much talks about the head case, but the code
> still looks like it's swapping a non-head thing.
>
> I guess the code technically _works_ with "swap two list ends", but
> does that actually really make sense as an operation?
As an operation yes. Will anyone else want that operation I don't know.
> So I no longer hate how this patch looks, but I wonder if we should
> just make the whole "this node is the *first* node" a bit more
> explicit in both the caller and in the swapping code.
>
> It could be as simple as replacing just the conceptual types and
> names, so instead of some "pnode1" double-indirect node pointer, we'd
> have
>
> struct hlist_head *left_head = container_of(left->pprev,
> struct hlist_head, first);
> struct hlist_head *right_head = container_of(right->pprev,
> struct hlist_head, first);
>
> and then the code would do
>
> rcu_assign_pointer(right_head->first, left);
> rcu_assign_pointer(left_head->first, right);
> WRITE_ONCE(left->pprev, &right_head->first);
> WRITE_ONCE(right->pprev, &left_head->first);
>
> which should generate the exact same code, but makes it clear that
> what we're doing is switching the whole hlist when given the first
> entries.
>
> Doesn't that make what it actually does a lot more understandable?
Understandable is a bit subjective. I think having a well defined hlist
operation I can call makes things more understandable.
I think the getting the list head as:
"head = &task->thread_pid->tasks[PIDTYPE_PID];" is more understandable
and less risky than container_of.
My concern and probably unreasonbable as this is a slow path
with getting the list heads after looking up the pid is that it seems
to add a wait for an additional cache line to load before anything can
happen.
The only way I really know to make this code much more understandable is
to remove the lists entirely for this case. But that is a much larger
change and it is not clear that it makes the kernel code overall better.
I stared at that for a while and it is an interesting follow on but not
something I want or we even can do before exchange_tids is in place.
> The
> *pnode1/pnode2 games are somewhat opaque, but with that type and name
> change and using "container_of()", the code now fairly naturally reads
> as "oh, we're changing the first pointers in the list heads, and
> making the nodes point back to them" .
>
> Again - the current function _works_ with swapping two hlists in the
> middle (not two entries - it swaps the whole list starting at that
> entry!), so your current patch is in some ways "more generic". I'm
> just suggesting that the generic case doesn't make much sense, and
> that the "we know the first entries, swap the lists" actually is what
> the real use is, and writing it as such makes the code easier to
> understand.
Yep. That is waht I designed it to do. I sort of went the other
direction when writing this. I could start with the list heads and swap
the rest of the lists and get the same code. But it looked like it
would be a little slower to find the hlist_heads, and I couldn't think
of a good name for the function. So I figured if I was writing a
fucntion for this case I would write one that was convinient.
For understandability that is my real challenge what is a good name
that people can read and understand what is happening for this swapping
function.
> But I'm not going to insist on this, so this is more an RFC. Maybe
> people disagree, and/or have an actual use case for that "break two
> hlists in the middle, swap the ends" that I find unlikely...
>
> (NOTE: My "convert to hlist_head" code _works_ for that case too
> because the code generation is the same! But it would be really really
> confusing for that code to be used for anything but the first entry).
Yes.
I am open to improvements. Especially in the naming.
Would hlists_swap_heads_rcu be noticably better?
Eric
Powered by blists - more mailing lists