lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45x65lwhzefxfe7muha6myfqb53ooxvhjpgeqadeiikl5nriws@ekwlxybd6ybp>
Date: Fri, 15 Dec 2023 10:30:30 +0000
From: Alvin Šipraga <ALSI@...g-olufsen.dk>
To: Joe Perches <joe@...ches.com>
CC: Alvin Šipraga <alvin@...s.dk>, Linus Torvalds
	<torvalds@...ux-foundation.org>, Duje Mihanović
	<duje.mihanovic@...le.hr>, Konstantin Ryabitsev
	<konstantin@...uxfoundation.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Shawn Guo <shawnguo@...nel.org>, Andrew
 Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2] get_maintainer: correctly parse UTF-8 encoded names in
 files

On Thu, Dec 14, 2023 at 07:57:54AM -0800, Joe Perches wrote:
> On Thu, 2023-12-14 at 16:06 +0100, Alvin Šipraga wrote:
> > @@ -442,7 +443,7 @@ sub maintainers_in_file {
> >  	my $text = do { local($/) ; <$f> };
> >  	close($f);
> >  
> > -	my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > +	my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> >  	push(@file_emails, clean_file_emails(@poss_addr));
> >      }
> >  }
> 
> Rather than open _all_ files in utf-8, perhaps the block
> that opens a specific file to find maintainers
> 
> sub maintainers_in_file {
>     my ($file) = @_;
> 
>     return if ($file =~ m@\bMAINTAINERS$@);
> 
>     if (-f $file && ($email_file_emails || $file =~ /\.yaml$/)) {
> 	open(my $f, '<', $file)
> 	    or die "$P: Can't open $file: $!\n";
> 	my $text = do { local($/) ; <$f> };
> 	close($f);
> 	...
> 
> should change the
> 
> 	open(my $f...
> to
> 	use open qw(:std :encoding(UTF-8));
> 	open(my $f...

Yes, this also works for parsing the name in an arbitrary file. But with the
change you suggest above, the script then corrupts my name when it is lifted
from MAINTAINERS (!?):

$ ./scripts/get_maintainer.pl -f drivers/net/dsa/realtek/ | grep alsi
"Alvin Å ipraga" <alsi@...g-olufsen.dk> (maintainer:REALTEK RTL83xx SMI DSA ROUTER CHIPS)

I'm not entirely sure why that happens, since my name doesn't get corrupted when
coming from MAINTAINERS with the upstream state of the script. But anyway, with
your suggestion I would then also have to add it here:

@@ -347,6 +346,7 @@ my @mfiles = ();
 my @self_test_info = ();
 
 sub read_maintainer_file {
+    use open qw(:std :encoding(UTF-8));
     my ($file) = @_;
 
     open (my $maint, '<', "$file")

... and I guess there might be other cases too.

Rather than scattering it all over, don't you think it's more robust to open all
files in UTF-8? I tried to show in one of my replies to v1 [1] that this should
be compatible with basically all of the source tree.

[1] https://lore.kernel.org/all/dzn6uco4c45oaa3ia4u37uo5mlt33obecv7gghj2l756fr4hdh@mt3cprft3tmq/

If you are still unconvinced then I will gladly send a v3 patching the two cases
we have discussed (read_maintainer_file() and maintainers_in_file()).

> 
> 
> And unrelated and secondarily, perhaps the
> 	$file =~ /\.yaml$/
> test should be
> 	$file =~ /\.(?:yaml|dtsi?)$/
> 
> to also find any maintainer address in the dts* files
> 
> https://lore.kernel.org/lkml/20231028174656.GA3310672@bill-the-cat/T/

Is this supposed to parse the "Copyright (c) 20xx John Doe <foo@....toto>" in
the .dts* files? But sure, I can do a resend of Shawn's original patch
separately if you like.

Kind regards,
Alvin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ