[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241031114210.GA593548@coredump.intra.peff.net>
Date: Thu, 31 Oct 2024 07:42:10 -0400
From: Jeff King <peff@...f.net>
To: Rasmus Villemoes <ravi@...vas.dk>
Cc: Josh Poimboeuf <jpoimboe@...nel.org>,
Masahiro Yamada <masahiroy@...nel.org>,
linux-kernel@...r.kernel.org, linux-kbuild@...r.kernel.org,
git@...r.kernel.org
Subject: Re: [PATCH] setlocalversion: Add workaround for "git describe"
performance issue
On Thu, Oct 31, 2024 at 11:37:27AM +0100, Rasmus Villemoes wrote:
> and that "gave up" commit is v4.18-rc4, the eleventh commit
> encountered. That also explains why you have to add a "dummy" second
> --match to make --candidates=1 have the expected behaviour.
>
> Perhaps the logic should instead be that as soon as match_cnt hits
> max_candidates (i.e. all the tags we're going to consider have actually
> been visited), we break out. That is, the last "else" above should
> instead be replaced by
>
> if (match_cnt == max_candidates) {
> ... /* ? , gave_up_on is now a misnomer */
> break;
> }
Yes, I agree that is the right direction. Replacing the "else" entirely
feels a little weird, because it is part of the:
if (!tags && !all && n->prio < 2)
...
else if (match_cnt < max_candidates)
...
else
...
So we'd now run that check even if we triggered the first block. But I
don't think it should matter in practice. We only increment match_cnt in
the else-if here. So the "else" block could go away, and the check for
giving up could go inside the else-if.
It does seem like gave_up_on is now pointless, but I'm not sure I
understand all of the code here. I assumed that it was only used to
report "this is where we gave up", and to give you the extra bit of
information that there _were_ other candidates that we omitted (and not
just exactly max_candidates). Of course we don't show that without
--debug. So it seems silly to spend a bunch of extra CPU for that.
But the plot thickens.
What I was going to suggest is that if we wanted to retain that one bit
of information, what we could do instead is: independent of
max_candidates, see if we've found all of the possible names we expanded
from --match. Then max_candidates would work as it does now, but we'd
avoid fruitlessly searching when there are no more names to find.
Counting the number of expanded names is a little weird. We use them to
annotate the commits, but of course multiple names can point to a single
commit, and there's a priority override system. I think the final number
we can find is the number of entries in the "names" hash.
So I expected this to work:
diff --git a/builtin/describe.c b/builtin/describe.c
index 7330a77b38..70a11072de 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -380,6 +380,9 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
c->object.flags |= t->flag_within;
if (n->prio == 2)
annotated_cnt++;
+
+ if (match_cnt == hashmap_get_size(&names))
+ break;
}
else {
gave_up_on = c;
but it's still slow! If we set "gave_up_on = c", then it gets fast. I'm
not sure why that is. Later we do:
if (gave_up_on) {
commit_list_insert_by_date(gave_up_on, &list);
seen_commits--;
}
seen_commits += finish_depth_computation(&list, &all_matches[0]);
but I don't at all understand why adding gave_up_on lets that finish
sooner. So I'm worried we're missing something about how it is used.
One hack is to just, like the max_candidates case, let us look at one
_more_ commit before bailing. Like this:
diff --git a/builtin/describe.c b/builtin/describe.c
index 7330a77b38..177c8232f6 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -365,6 +365,11 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
struct commit_list *parents = c->parents;
struct commit_name **slot;
+ if (match_cnt == hashmap_get_size(&names)) {
+ gave_up_on = c;
+ break;
+ }
+
seen_commits++;
slot = commit_names_peek(&commit_names, c);
n = slot ? *slot : NULL;
That works, but I have a feeling that figured out what the heck is going
on with gave_up_on might produce a more elegant solution.
> Then as a further DWIM aid, wherever the initialization logic is could
> be updated so that, after expanding all the --match= wildcards, if the
> number of tags is less than max_candidates, automatically lower
> max_candidates to that number (which in the setlocalversion case will
> always be 1 because we're not actually passing a wildcard).
Yeah, I had the same thought (though if we do a separate hashmap check
as above, it wouldn't be needed).
-Peff
Powered by blists - more mailing lists