lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <663wjwkshj4fnjiqfvxkbuebhge27esjldm6loulogqls2pghu@szeax5kk4d74>
Date: Tue, 19 Dec 2023 01:25:42 +0000
From: Alvin Šipraga <ALSI@...g-olufsen.dk>
To: Joe Perches <joe@...ches.com>, Shawn Guo <shawnguo@...nel.org>
CC: Alvin Šipraga <alvin@...s.dk>, Linus Torvalds
	<torvalds@...ux-foundation.org>, Duje Mihanović
	<duje.mihanovic@...le.hr>, Konstantin Ryabitsev
	<konstantin@...uxfoundation.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2] get_maintainer: correctly parse UTF-8 encoded names in
 files

+Shawn, please have a look at the bottom of this mail about your patch
we would like to resend.

On Fri, Dec 15, 2023 at 10:30:52AM -0800, Joe Perches wrote:
> On Fri, 2023-12-15 at 10:30 +0000, Alvin Šipraga wrote:
> > On Thu, Dec 14, 2023 at 07:57:54AM -0800, Joe Perches wrote:
> > > On Thu, 2023-12-14 at 16:06 +0100, Alvin Šipraga wrote:
> > > > @@ -442,7 +443,7 @@ sub maintainers_in_file {
> > > >  	my $text = do { local($/) ; <$f> };
> > > >  	close($f);
> > > >  
> > > > -	my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > > +	my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > >  	push(@file_emails, clean_file_emails(@poss_addr));
> 
> Hi again Alvin.
> 
> Separate issue, but on the one .yaml file I tried:
> 
> $ ./scripts/get_maintainer.pl Documentation/devicetree/bindings/serial/8250.yaml
> Greg Kroah-Hartman <gregkh@...uxfoundation.org> (supporter:TTY LAYER AND SERIAL DRIVERS)
> Jiri Slaby <jirislaby@...nel.org> (supporter:TTY LAYER AND SERIAL DRIVERS)
> Rob Herring <robh+dt@...nel.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
> Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
> Conor Dooley <conor+dt@...nel.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
> Lubomir Rintel <lkundrak@...sk> (in file)
> - <devicetree@...r.kernel.org> (in file)
> linux-kernel@...r.kernel.org (open list:TTY LAYER AND SERIAL DRIVERS)
> linux-serial@...r.kernel.org (open list:TTY LAYER AND SERIAL DRIVERS)
> devicetree@...r.kernel.org (open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
> 
> Note the single '-' in the "name" portion of devicetree@...r.kernel.org
> 
> Maybe clean_file_emails needs some better name cleansing code.

OK, I had a look and made a patch to fix this as well. Please see v3
which is on its way to your inbox.

Regarding the .dts* patch, I need some feedback below before sending
anything, so I did not include it in the series.

> 
> > > Rather than open _all_ files in utf-8, perhaps the block
> > > that opens a specific file to find maintainers
> > > 
> > > sub maintainers_in_file {
> > >     my ($file) = @_;
> > > 
> > >     return if ($file =~ m@\bMAINTAINERS$@);
> > > 
> > >     if (-f $file && ($email_file_emails || $file =~ /\.yaml$/)) {
> > > 	open(my $f, '<', $file)
> > > 	    or die "$P: Can't open $file: $!\n";
> > > 	my $text = do { local($/) ; <$f> };
> > > 	close($f);
> > > 	...
> > > 
> > > should change the
> > > 
> > > 	open(my $f...
> > > to
> > > 	use open qw(:std :encoding(UTF-8));
> > > 	open(my $f...
> > 
> > Yes, this also works for parsing the name in an arbitrary file. But with the
> > change you suggest above, the script then corrupts my name when it is lifted
> > from MAINTAINERS (!?):
> > 
> > $ ./scripts/get_maintainer.pl -f drivers/net/dsa/realtek/ | grep alsi
> > "Alvin Å ipraga" <alsi@...g-olufsen.dk> (maintainer:REALTEK RTL83xx SMI DSA ROUTER CHIPS)
> 
> Curious.  Let me see if I can figure out why that happens.
> 
> 
> > If you are still unconvinced then I will gladly send a v3 patching the two cases
> > we have discussed (read_maintainer_file() and maintainers_in_file()).
> 
> No rush.
> 
> > > And unrelated and secondarily, perhaps the
> > > 	$file =~ /\.yaml$/
> > > test should be
> > > 	$file =~ /\.(?:yaml|dtsi?)$/
> > > 
> > > to also find any maintainer address in the dts* files
> > > 
> > > https://lore.kernel.org/lkml/20231028174656.GA3310672@bill-the-cat/T/
> > 
> > Is this supposed to parse the "Copyright (c) 20xx John Doe <foo@....toto>" in
> > the .dts* files?
> 
> Yes, just as it would and does for .yaml files.
> 
> $ git grep -P -i 'copy.*\<\w+\@\w+\.\w+\>' -- '*.yaml'
> Documentation/devicetree/bindings/display/bridge/chrontel,ch7033.yaml:# Copyright (C) 2019,2020 Lubomir Rintel <lkundrak@...sk>
> Documentation/devicetree/bindings/media/marvell,mmp2-ccic.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@...sk>
> Documentation/devicetree/bindings/misc/olpc,xo1.75-ec.yaml:# Copyright (C) 2019,2020 Lubomir Rintel <lkundrak@...sk>
> Documentation/devicetree/bindings/phy/allwinner,sun50i-h6-usb3-phy.yaml:# Copyright 2019 Ondrej Jirman <megous@...ous.com>
> Documentation/devicetree/bindings/phy/marvell,mmp3-hsic-phy.yaml:# Copyright 2019 Lubomir Rintel <lkundrak@...sk>
> Documentation/devicetree/bindings/phy/marvell,mmp3-usb-phy.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@...sk>
> Documentation/devicetree/bindings/reset/bitmain,bm1880-reset.yaml:# Copyright 2019 Manivannan Sadhasivam <mani@...nel.org>
> Documentation/devicetree/bindings/reset/marvell,berlin2-reset.yaml:# Copyright 2015 Antoine Tenart <atenart@...nel.org>
> Documentation/devicetree/bindings/reset/qca,ar7100-reset.yaml:# Copyright 2015 Alban Bedel <albeu@...e.fr>
> Documentation/devicetree/bindings/serial/8250.yaml:# Copyright 2020 Lubomir Rintel <lkundrak@...sk>
> Documentation/devicetree/bindings/spi/marvell,mmp2-ssp.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@...sk>
> Documentation/devicetree/bindings/usb/marvell,pxau2o-ehci.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@...sk>

Hmm, I ran this over the arch/ directory on all .dts* files and got some
false positives:

  1355276466-18295-1-git-send-email-arve@...roid.com

This one comes from a lore link with a Message-ID. So deleting URLs
before parsing should fix it.

  aaci_bitclk@...288M
  led@...0
  led@...1
  led@...2
  led@...3
  led@...4
  led@...5
  led@...6
  led@...7
  led@...0
  led@...1
  led@...2
  led@...3
  led@...4
  led@...5
  led@...6
  led@...7
  mxtal@...2M
  timclk@...M
  uartclk@...74M
  xtal24.576@...576M

These can also be avoided by strengthening the email parsing regex
to more strictly validate the TLD, which may not begin with a digit.

So something like this:

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index faa96801897a..e253053da967 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -446,7 +446,10 @@ sub maintainers_in_file {
        my $text = do { local($/) ; <$f> };
        close($f);
 
-       my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+       # Avoid mistaking URLs with Message-IDs as emails
+       $text =~ s/https?:[^\s]+//g;
+
+       my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z][A-Za-z0-9]+[\)\>\}]{0,1}$g;
        push(@file_emails, clean_file_emails(@poss_addr));
     }
 }


> 
> > But sure, I can do a resend of Shawn's original patch
> > separately if you like.
> 
> Yes please. Make sure to cc Andrew Morton.

Given that some changes are needed, I want your input on how to send:

1. Should I send the above diff as a patch preceding Shawn's patch?

2. Should I send the above diff as a patch following Shawn's patch? Here
   the justification is arguably more clear in the history.

3. Or should I roll it into Shawn's patch? If so, Shawn, may I have your
   Signed-off-by?

Kind regards,
Alvin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ