Message-ID: <92afdf33e22e8a63f6baaaba94c004cf2ec5a7d7.camel@perches.com>
Date: Sun, 18 Sep 2022 10:03:17 -0700
From: Joe Perches <joe@...ches.com>
To: Janne Grunau <j@...nau.net>
Cc: linux-kernel@...r.kernel.org,
Martin Povišer <povik+lin@...ebit.org>
Subject: Re: [PATCH] get_maintainer: Extend matched name characters in
maintainers_in_file()
On Sat, 2022-09-17 at 07:11 -0700, Joe Perches wrote:
> On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> > Extend the regexp matching name characters to cover Unicode blocks Latin
> > Extended-A and Extended-B.
> > Fixes 'scripts/get_maintainer.pl -f' for
> > 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
> >
> > Signed-off-by: Janne Grunau <j@...nau.net>
> >
> > ---
> > This still excludes Greek and Cyrillic characters, which should be
> > expected in names as well. I tried to use '\p{L}' to match all Unicode
> > letters but couldn't get it to work. Feel free to treat this as a bug
> > report with an incomplete fix.
>
> Maybe use \p{XPosixAlpha} ?
>
> but I don't know what version of perl introduced this.
>
> > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> []
> > @@ -442,7 +442,7 @@ sub maintainers_in_file {
> > my $text = do { local($/) ; <$f> };
> > close($f);
> >
> > - my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > + my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
>
> my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
Variations of \p{posix} don't seem to work, at least with perl 5.34.
\p{print} seems to work for Documentation/devicetree/bindings/clock/apple,nco.yaml,
but I don't know how fragile it is.
\p{print} might be too greedy, since it also matches digits and punctuation...
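
One possible explanation for why \p{L} and the posix classes fail while
\p{print} appears to work, offered as an assumption rather than something
verified against the script: if $text holds the file's raw UTF-8 bytes (no
encoding layer on the file handle), each byte is treated as a separate
code point, so the two bytes of "š" are seen as "Å" (a letter) followed by
"¡" (not a letter, but printable). A minimal sketch of that effect, using a
made-up address and a simplified name regex rather than the one in
get_maintainer.pl:

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode(STDOUT, ':encoding(UTF-8)');

# Raw UTF-8 bytes, as they would arrive from a file handle opened without
# an encoding layer. "\xC5\xA1" is the UTF-8 encoding of U+0161 (š).
# The address is a hypothetical placeholder.
my $bytes = "Martin Povi\xC5\xA1er <povik\@example.org>";

# The same text decoded into Perl characters.
my $chars = decode('UTF-8', $bytes);

# Simplified stand-in for the name part of the pattern: a run of letters
# and spaces that starts and ends with a letter.
my $name_re = qr/([\p{L}][\p{L} ]*\p{L})/;

my ($name_bytes) = $bytes =~ $name_re;
my ($name_chars) = $chars =~ $name_re;

# On the byte string the match stops in the middle of the UTF-8 sequence,
# because byte 0xA1 is not a letter when read as a code point.
print "bytes: $name_bytes\n";

# On the decoded string \p{L} covers the whole name.
print "chars: $name_chars\n";
```

If that assumption holds, decoding the text before matching (or reading the
file through an ':encoding(UTF-8)' layer) would let \p{L} be used in place of
\p{print}, at the cost of having to encode the names again on output.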
---
scripts/get_maintainer.pl | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index ab123b498fd9..790112c3e1d7 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -442,7 +442,7 @@ sub maintainers_in_file {
my $text = do { local($/) ; <$f> };
close($f);
- my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+ my @poss_addr = $text =~ m$[\p{print}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
push(@file_emails, clean_file_emails(@poss_addr));
}
}
@@ -2456,11 +2456,12 @@ sub clean_file_emails {
foreach my $email (@file_emails) {
$email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
my ($name, $address) = parse_email($email);
+ $name =~ s/^\p{space}*\p{punct}*\p{space}*//;
if ($name eq '"[,\.]"') {
$name = "";
}
- my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
+ my @nw = split(/[^\p{print}\'\,\.\+-]/, $name);
if (@nw > 2) {
my $first = $nw[@nw - 3];
my $middle = $nw[@nw - 2];