Class UriCodec


  • public abstract class UriCodec
    extends java.lang.Object
    Helper class to encode and decode (pieces of) URI strings.

    Static methods are provided to encode and decode parts of a URI. They work only with the UTF-8 and US-ASCII character encodings; the results are undefined for all other encodings.

    Encoding a string means percent-encoding each character that is not allowed in that part of the URI string. Each such character is converted to a sequence of ‘percent-encoded’ bytes (octets). A percent-encoded octet has the form ‘%xx’, where xx is a pair of hexadecimal digits. Which octets are produced for a character depends on the character encoding: in UTF-8 a character takes one to four octets, whereas in US-ASCII it takes exactly one. Decoding reconstructs the characters from the percent-encoded octets.
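    A minimal sketch of how a single character becomes percent-encoded octets, using only the JDK. It is not UriCodec's own code; the class PercentOctets and its method encodeCodePoint are hypothetical names for illustration. It shows that a space is one octet in US-ASCII, while U+00E9, U+20AC and U+1F600 take two, three and four octets in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of UriCodec.
public class PercentOctets {
    // Percent-encode every octet that represents one code point in the given charset.
    static String encodeCodePoint(int codePoint, Charset charset) {
        byte[] octets = new String(Character.toChars(codePoint)).getBytes(charset);
        StringBuilder sb = new StringBuilder();
        for (byte b : octets) {
            // Mask to 0..255 so that negative Java bytes print as two hex digits.
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeCodePoint(' ', StandardCharsets.US_ASCII));  // %20 (one octet)
        System.out.println(encodeCodePoint(0x00E9, StandardCharsets.UTF_8));  // %C3%A9 (two octets)
        System.out.println(encodeCodePoint(0x20AC, StandardCharsets.UTF_8));  // %E2%82%AC (three octets)
        System.out.println(encodeCodePoint(0x1F600, StandardCharsets.UTF_8)); // %F0%9F%98%80 (four octets)
    }
}
```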

    The character encoding used to decode a string must be the one that was used to encode it; otherwise the results are undefined.
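    To see why the encodings must match, here is a small decoding sketch, again not UriCodec's implementation (PercentDecode, decode and hexValue are hypothetical names). It turns each %XX into one octet and then decodes the octet sequence with the given charset. The UTF-8 octets %C3%A9 decode to é under UTF-8, but under US-ASCII the JDK decoder substitutes U+FFFD for each octet it cannot map.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of UriCodec.
public class PercentDecode {
    // Value of one HEXDIG (case-insensitive), or -1 if c is not a hex digit.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    // Turn each %XX into one octet, copy other (ASCII) characters as single octets,
    // then decode the whole octet sequence with the given charset.
    static String decode(String raw, Charset charset) {
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '%') {
                if (i + 2 >= raw.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = hexValue(raw.charAt(i + 1));
                int lo = hexValue(raw.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape at index " + i);
                }
                octets.write((hi << 4) | lo);
                i += 3;
            } else if (c <= 0x7F) {
                octets.write(c);
                i++;
            } else {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
        }
        return new String(octets.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String raw = "caf%C3%A9";
        System.out.println(decode(raw, StandardCharsets.UTF_8));    // café
        System.out.println(decode(raw, StandardCharsets.US_ASCII)); // "caf" plus two U+FFFD characters
    }
}
```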

    How a string is encoded and decoded depends on which part of the URI it belongs to. For example, colons are allowed as-is in the userinfo part, but not in the first segment of a relative path.
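    The colon example can be made concrete with the allowed-character sets from RFC 3986: userinfo = *( unreserved / pct-encoded / sub-delims / ":" ), whereas the first segment of a relative path is segment-nz-nc, which permits "@" but not ":". The sketch below is a generic encoder driven by such a set; the class PartEncoder and the constants USERINFO and SEGMENT_NZ_NC are hypothetical, and it always uses UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.IntPredicate;

// Hypothetical illustration class, not part of UriCodec.
public class PartEncoder {
    // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    static boolean isSubDelim(int c) {
        return "!$&'()*+,;=".indexOf(c) >= 0;
    }

    // userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
    static final IntPredicate USERINFO = c -> isUnreserved(c) || isSubDelim(c) || c == ':';

    // segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" ), i.e. no ":"
    static final IntPredicate SEGMENT_NZ_NC = c -> isUnreserved(c) || isSubDelim(c) || c == '@';

    // Copy allowed characters unchanged; percent-encode the UTF-8 octets of all others
    // (including '%' itself, since the input is treated as unencoded).
    static String encode(String s, IntPredicate allowed) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (allowed.test(cp)) {
                sb.appendCodePoint(cp);
            } else {
                for (byte b : new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8)) {
                    sb.append(String.format("%%%02X", b & 0xFF));
                }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("alice:secret", USERINFO));      // alice:secret
        System.out.println(encode("alice:secret", SEGMENT_NZ_NC)); // alice%3Asecret
    }
}
```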

    This class only provides string-to-string encoding and decoding methods for (some) parts of a URI string. It does not syntactically validate the encoded or decoded strings for each part. So, for example, it will encode a scheme part which contains colons (replacing them with percent-encoded octets), even though percent-encoded octets are not allowed in schemes.
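    The lack of validation can be sketched as follows. RFC 3986 defines scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), so an encoder that percent-encodes everything else turns "my:scheme" into "my%3Ascheme", which is still not a valid scheme. This is a hypothetical illustration (SchemeCheck, encodeScheme and the SCHEME pattern are not UriCodec names); it works byte by byte, relying on the same assumption as the note below that allowed characters are single ASCII bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of UriCodec.
public class SchemeCheck {
    // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
    static final Pattern SCHEME = Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*");

    static boolean isSchemeChar(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '+' || c == '-' || c == '.';
    }

    // Encode byte by byte: allowed ASCII bytes pass through, every other octet becomes %XX.
    static String encodeScheme(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (isSchemeChar(c)) {
                sb.append((char) c);
            } else {
                sb.append(String.format("%%%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String encoded = encodeScheme("my:scheme");
        System.out.println(encoded);                           // my%3Ascheme
        System.out.println(SCHEME.matcher(encoded).matches()); // false: encoding does not validate
    }
}
```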

    [NB: The algorithms assume throughout that every character that is valid in a URI is represented by a single byte in the US-ASCII code page, and that this byte coincides with its representation in UTF-8.]

    URI Specification

    RFC 2396 (Uniform Resource Identifiers (URI): Generic Syntax) is obsoleted by RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax), which describes the valid form of an encoded URI in ABNF.

    Here is the appendix from RFC 3986 that gives the ABNF. ABNF itself is defined in RFC 2234 (Augmented BNF for Syntax Specifications: ABNF), which also defines the non-terminals ALPHA (letters), DIGIT (decimal digits), and HEXDIG (hexadecimal digits) used below.

    [NB: ‘;’ indicates a line comment, as in the original, but I have replaced the solidus ‘/’ (alternative) operator by the graphemically more pleasing vertical bar ‘|’.]

    • Method Summary

      Modifier and Type Method Description
      static java.lang.String decHost(java.lang.String rawui, java.lang.String encodingScheme)
      Decode a string from the host part of a URI string.
      static java.lang.String decPath(java.lang.String rawui, java.lang.String encodingScheme)
      Decode a string from the path part of a URI string.
      static java.lang.String decScheme(java.lang.String rawui, java.lang.String encodingScheme)
      Decode a string from the scheme part of a URI string.
      static java.lang.String decSegment(java.lang.String rawui, java.lang.String encodingScheme)
      Decode a string from a segment part of a URI string.
      static java.lang.String decUserinfo(java.lang.String rawui, java.lang.String encodingScheme)
      Decode a string from the userinfo part of a URI string.
      static java.lang.String encHost(java.lang.String ui, java.lang.String encodingScheme)
      Encode a string for the host part of a URI string.
      static java.lang.String encPath(java.lang.String ui, java.lang.String encodingScheme)
      Encode a string for the path part of a URI string.
      static java.lang.String encScheme(java.lang.String ui, java.lang.String encodingScheme)
      Encode a string for the scheme part of a URI string.
      static java.lang.String encSegment(java.lang.String ui, java.lang.String encodingScheme)
      Encode a string for a segment part of a URI string.
      static java.lang.String encUserinfo(java.lang.String ui, java.lang.String encodingScheme)
      Encode a string for the userinfo part of a URI string.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • encUserinfo

        public static java.lang.String encUserinfo(java.lang.String ui,
                                                   java.lang.String encodingScheme)
        Encode a string for the userinfo part of a URI string.
        Parameters:
        ui - the unencoded input
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        ui, encoded (with percent-encoded octets where necessary)
      • decUserinfo

        public static java.lang.String decUserinfo(java.lang.String rawui,
                                                   java.lang.String encodingScheme)
        Decode a string from the userinfo part of a URI string.
        Parameters:
        rawui - the encoded userinfo part
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        rawui, decoded (with percent-encoded octets replaced by the characters they represent)
      • encScheme

        public static java.lang.String encScheme(java.lang.String ui,
                                                 java.lang.String encodingScheme)
        Encode a string for the scheme part of a URI string.
        Parameters:
        ui - the unencoded input
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        ui, encoded (with percent-encoded octets where necessary)
      • decScheme

        public static java.lang.String decScheme(java.lang.String rawui,
                                                 java.lang.String encodingScheme)
        Decode a string from the scheme part of a URI string.
        Parameters:
        rawui - the encoded scheme part
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        rawui, decoded (with percent-encoded octets replaced by the characters they represent)
      • encSegment

        public static java.lang.String encSegment(java.lang.String ui,
                                                  java.lang.String encodingScheme)
        Encode a string for a segment part of a URI string.
        Parameters:
        ui - the unencoded input
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        ui, encoded (with percent-encoded octets where necessary)
      • decSegment

        public static java.lang.String decSegment(java.lang.String rawui,
                                                  java.lang.String encodingScheme)
        Decode a string from a segment part of a URI string.
        Parameters:
        rawui - the encoded segment part
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        rawui, decoded (with percent-encoded octets replaced by the characters they represent)
      • encHost

        public static java.lang.String encHost(java.lang.String ui,
                                               java.lang.String encodingScheme)
        Encode a string for the host part of a URI string.
        Parameters:
        ui - the unencoded input
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        ui, encoded (with percent-encoded octets where necessary)
      • decHost

        public static java.lang.String decHost(java.lang.String rawui,
                                               java.lang.String encodingScheme)
        Decode a string from the host part of a URI string.
        Parameters:
        rawui - the encoded host part
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        rawui, decoded (with percent-encoded octets replaced by the characters they represent)
      • encPath

        public static java.lang.String encPath(java.lang.String ui,
                                               java.lang.String encodingScheme)
        Encode a string for the path part of a URI string.
        Parameters:
        ui - the unencoded input
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        ui, encoded (with percent-encoded octets where necessary)
      • decPath

        public static java.lang.String decPath(java.lang.String rawui,
                                               java.lang.String encodingScheme)
        Decode a string from the path part of a URI string.
        Parameters:
        rawui - the encoded path part
        encodingScheme - must be "UTF-8" or "US-ASCII"
        Returns:
        rawui, decoded (with percent-encoded octets replaced by the characters they represent)
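    One plausible difference between encPath and encSegment, assuming they follow RFC 3986 (the source does not spell this out), is the treatment of "/": a path is a sequence of segments separated by "/", so a path encoder can leave "/" alone, while a segment encoder must percent-encode it. The sketch below shows that difference; PathVsSegment and its methods are hypothetical, it uses UTF-8 only, and it ignores the extra rule that the first segment of a relative path may not contain a colon.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of UriCodec.
public class PathVsSegment {
    // pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
    static boolean isPchar(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || "-._~".indexOf(c) >= 0          // rest of unreserved
                || "!$&'()*+,;=".indexOf(c) >= 0   // sub-delims
                || c == ':' || c == '@';
    }

    // Byte-wise encoder; '/' is kept only when encoding a whole path.
    static String encode(String s, boolean keepSlash) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (isPchar(c) || (keepSlash && c == '/')) {
                sb.append((char) c);
            } else {
                sb.append(String.format("%%%02X", c));
            }
        }
        return sb.toString();
    }

    static String encodeSegment(String s) { return encode(s, false); }

    static String encodePath(String s) { return encode(s, true); }

    public static void main(String[] args) {
        System.out.println(encodePath("docs/a b.txt"));    // docs/a%20b.txt
        System.out.println(encodeSegment("docs/a b.txt")); // docs%2Fa%20b.txt
    }
}
```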