Consider the following text:
http://stackoverflow.com/
https://stackoverflow.com/questions/tagged/regex
Now, if I apply the regex below over it...
(https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?
... I would get the following result:
Match "http://stackoverflow.com/"
Group 1: "http"
Group 2: "stackoverflow.com"
Group 3: "/"
Match "https://stackoverflow.com/questions/tagged/regex"
Group 1: "https"
Group 2: "stackoverflow.com"
Group 3: "/questions/tagged/regex"
But I don't care about the protocol -- I just want the host and path of the URL. So, I change the regex to include the non-capturing group
(?:).(?:https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?
Now, my result looks like this:
Match "http://stackoverflow.com/"
Group 1: "stackoverflow.com"
Group 2: "/"
Match "https://stackoverflow.com/questions/tagged/regex"
Group 1: "stackoverflow.com"
Group 2: "/questions/tagged/regex"
See? The first group has not been captured. The parser uses it to match the text, but ignores it later, in the final result.
No comments:
Post a Comment