> I want to count the number of such encoded url,and
> classify it to malicious and normal url by its
> content. Then i'd see if url encoding especially
> host encoding could be used for url classification.
RFC2616:
Characters other than those in the "reserved" and "unsafe" sets (see
RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
For example, the following three URIs are equivalent:
<http://abc.com:80/~smith/home.html>
<http://ABC.com/%7Esmith/home.html>
<http://ABC.com:/%7esmith/home.html>
By definition, then, such encoded URLs are equivalent to the plain ones: not malicious, and normal.
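
To make the equivalence concrete, here is a minimal sketch that reduces an http URL to a comparable form before any counting or classification. The function names and the exact normalization steps are my own assumptions, not something RFC 2616 prescribes as code: lowercase the scheme and host, treat an empty or missing port as 80, treat an empty path as "/", and decode only those %XX escapes that stand for unreserved characters (the unreserved set used here is the one from RFC 3986).

```python
import re
from urllib.parse import urlsplit

# Assumption: the "unreserved" set as listed in RFC 3986, section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def decode_unreserved(text):
    """Decode %XX escapes of unreserved characters; uppercase the rest."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        # Reserved or unsafe characters stay encoded, with uppercase hex.
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)

def normalize_http_url(url):
    """Hypothetical helper: return a tuple that is equal for equivalent URLs."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()   # host names are case-insensitive
    port = parts.port or 80                 # empty or missing port means 80
    path = decode_unreserved(parts.path) or "/"
    return (scheme, host, port, path, parts.query)

if __name__ == "__main__":
    urls = [
        "http://abc.com:80/~smith/home.html",
        "http://ABC.com/%7Esmith/home.html",
        "http://ABC.com:/%7esmith/home.html",
    ]
    for u in urls:
        print(normalize_http_url(u))
```

All three URIs from the RFC example come out as the same tuple, so whatever remains encoded after this step (reserved or unsafe characters) is the only part that could carry any signal for classification.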