@vsoch Once again, here is another attempt at improving our long and forgiving regex.
A little background: the current regex is something I found online, and after testing it along with other links I deemed it good enough. However, I was never comfortable with how long it was.
Complexity, simplicity and regex visualizations
Here is a simplified graph of what we have at the moment (domain extensions are replaced with ... except .com and .org):
So after hacking and tweaking for a couple of days, I think I came up with an improved regex that is shorter and simpler, and in my testing faster as well. Here is how it looks:
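For a concrete (though hypothetical) illustration of what "shorter and scheme-anchored" means here, a pattern in this spirit could look like the sketch below; this is not the exact regex from my branch, just the general shape:

```python
import re

# Hypothetical sketch of a shorter URL regex: anchor on an explicit
# http:// or https:// scheme, then match a host, an optional port,
# and an optional path/query made of common URL characters.
URL_REGEX = re.compile(
    r"https?://"                # require the scheme up front
    r"[\w\-.]+"                 # host (letters, digits, dots, hyphens)
    r"(?::\d+)?"                # optional port
    r"(?:/[\w\-./?%&=+~#@]*)?"  # optional path and query string
)

text = "Links: https://www.example.com/docs?q=1 http://example.org/page"
print(URL_REGEX.findall(text))
# ['https://www.example.com/docs?q=1', 'http://example.org/page']
```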
Comparing efficiency and speed
Here is a quick demonstration of how it performs: https://regex101.com/r/zvnFp6/1
Unfortunately I couldn't run the same thing for our current regex because it is too long for regex101. However, I did run the following comparison locally:
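For reference, here is a minimal sketch of how such a local comparison can be run, assuming Python's built-in re and time modules; the patterns below are placeholders, not the actual regexes being compared:

```python
import re
import time

# Placeholder patterns: substitute the real old and new regexes here.
PATTERNS = {
    "current_regex": re.compile(r"https?://\S+"),  # stand-in for the long regex
    "new_regex": re.compile(r"https?://\S+"),      # stand-in for the short regex
}

with open("links.txt") as f:
    lines = f.read().splitlines()

for name, pattern in PATTERNS.items():
    start = time.perf_counter()
    matches = [url for line in lines for url in pattern.findall(line)]
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(matches)} matches in {elapsed:.4f} s")
```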
links.txt is a file with 755 URLs, each on a separate line, collected from the logs of buildtest and us-rse. The results of that comparison show that the long, beautifully formatted regex takes the most time and performs worse than the others, while the newest regex is the fastest and only returns URLs that are guaranteed to contain http or https.
So what's next?
I suggest you take a look at all this, and maybe test the regex with different URLs and different ideas to check its robustness. If your results are positive too, I can submit a PR. The blog post In search of the perfect URL validation regex is a good inspiration; I think we rank roughly third according to their test.
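As a starting point for that kind of robustness testing, here is a minimal sketch; the pattern is a scheme-anchored stand-in (similar in spirit to the simpler entries from that post), and the sample URLs are illustrative rather than the post's full test suite:

```python
import re

# Stand-in for the proposed regex (hypothetical; swap in the real one).
URL_REGEX = re.compile(r"^https?://[^\s/$.?#].[^\s]*$")

should_match = [
    "http://foo.com/blah_blah",
    "https://www.example.com/wpstyle/?p=364",
    "http://142.42.1.1:8080/",
]
should_not_match = [
    "http://",
    "ftp://foo.bar/baz",
    "foo.com",
]

for url in should_match:
    assert URL_REGEX.match(url), f"expected match: {url}"
for url in should_not_match:
    assert not URL_REGEX.match(url), f"expected no match: {url}"
print("all checks passed")
```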