Thursday, January 3, 2019

What is a non-capturing group? What does (?:) do?

Consider the following text:
http://stackoverflow.com/
https://stackoverflow.com/questions/tagged/regex
Now, if I apply the regex below to it...
(https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?
... I would get the following result:
Match "http://stackoverflow.com/"
     Group 1: "http"
     Group 2: "stackoverflow.com"
     Group 3: "/"

Match "https://stackoverflow.com/questions/tagged/regex"
     Group 1: "https"
     Group 2: "stackoverflow.com"
     Group 3: "/questions/tagged/regex"
But I don't care about the protocol -- I just want the host and path of the URL. So, I change the regex to include the non-capturing group (?:).
(?:https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?
Now, my result looks like this:
Match "http://stackoverflow.com/"
     Group 1: "stackoverflow.com"
     Group 2: "/"

Match "https://stackoverflow.com/questions/tagged/regex"
     Group 1: "stackoverflow.com"
     Group 2: "/questions/tagged/regex"
See? The first group has not been captured. The regex engine still uses it to match the text, but it does not store what it matched as a numbered group, so the host becomes Group 1 and the path becomes Group 2.
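
If you want to try this yourself, here is a minimal sketch in Python, assuming the standard `re` module (the original example does not name a regex flavor, so the language is my choice). The variable names are only for illustration. It runs both patterns over the same two URLs and collects the numbered groups of each match.

```python
import re

# The two sample URLs from above, one per line.
text = (
    "http://stackoverflow.com/\n"
    "https://stackoverflow.com/questions/tagged/regex"
)

# Same pattern twice: the first wraps the protocol in a capturing group,
# the second wraps it in a non-capturing group (?:...).
capturing = re.compile(r"(https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?")
non_capturing = re.compile(r"(?:https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?")

# groups() returns only the numbered (captured) groups of a match.
capturing_groups = [m.groups() for m in capturing.finditer(text)]
non_capturing_groups = [m.groups() for m in non_capturing.finditer(text)]

# group(0) is the whole match; it still contains the protocol either way.
full_matches = [m.group(0) for m in non_capturing.finditer(text)]

for groups in capturing_groups:
    print("capturing:    ", groups)
for groups in non_capturing_groups:
    print("non-capturing:", groups)
```

Note that the overall match is the same with either pattern. Only the numbered groups change: the protocol is still required for a match, it just no longer takes up a group number.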
