com.devexperts.qd.kit
Class PentaCodec

java.lang.Object
  extended by com.devexperts.qd.kit.PentaCodec
All Implemented Interfaces:
SymbolCodec

public class PentaCodec
extends java.lang.Object
implements SymbolCodec

The PentaCodec performs symbol coding and serialization using extensible 5-bit encoding. The eligible characters are assigned penta codes (either single 5-bit or double 10-bit) according to the following table:

 'A' to 'Z'                 - 5-bit pentas from 1 to 26
 '.'                        - 5-bit penta 27
 '/'                        - 5-bit penta 28
 '$'                        - 5-bit penta 29
 ''' and '`'                - none (ineligible characters)
 ' ' to '~' except above    - 10-bit pentas from 960 to 1023
 all other                  - none (ineligible characters)
 
The 5-bit penta 0 represents empty space and is eligible only at the start. The 5-bit pentas 30 and 31 are used as transition marks to switch to 10-bit pentas: the high 5 bits of every 10-bit penta from 960 to 1023 are either 30 or 31. The 10-bit pentas from 0 to 959 do not exist because they would collide with 5-bit pentas.
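
The table above can be sketched as a lookup array. This is an illustration only, not the library's code: the class name PentaTableSketch and its methods are hypothetical, and the order in which the 64 remaining characters receive the codes 960 to 1023 is an assumption (ascending ASCII order), since the table does not state it.

```java
import java.util.Arrays;

/** Illustrative sketch of the character-to-penta table; not the library's code. */
final class PentaTableSketch {
    /** Penta code per ASCII character; -1 marks an ineligible character. */
    private static final int[] PENTA = new int[128];

    static {
        Arrays.fill(PENTA, -1);
        for (char c = 'A'; c <= 'Z'; c++)
            PENTA[c] = c - 'A' + 1;      // 5-bit pentas 1..26
        PENTA['.'] = 27;
        PENTA['/'] = 28;
        PENTA['$'] = 29;
        // ASSUMPTION: the remaining eligible characters take 10-bit pentas
        // in ascending ASCII order, starting at 960.
        int next = 960;
        for (char c = ' '; c <= '~'; c++) {
            if (PENTA[c] == -1 && c != '\'' && c != '`')
                PENTA[c] = next++;
        }
        // ' ' to '~' is 95 characters; 31 are handled above, leaving exactly 64.
        if (next != 1024)
            throw new AssertionError("expected exactly 64 ten-bit pentas");
    }

    /** Returns the penta code of c, or -1 if c is ineligible. */
    static int pentaOf(char c) {
        return c < 128 ? PENTA[c] : -1;
    }

    /** Returns the width in bits (5 or 10) of a valid penta code. */
    static int bitsOf(int penta) {
        return penta < 32 ? 5 : 10;
    }
}
```

The count check in the static initializer confirms that the eligible characters fill the range 960 to 1023 exactly, with none left over.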

The individual penta codes for a character sequence are packed into a 64-bit value from high bits to low bits, aligned to the low bits. This allows representation of penta-coded character sequences of up to 35 bits. If a symbol contains one or more ineligible characters or does not fit into a 35-bit penta, then it is not subject to penta-coding and is left as a string. The resulting penta-coded value can be serialized as defined below or encoded into the 32-bit cipher if possible. Please note that penta code 0 is a valid code as it represents an empty character sequence; do not confuse it with cipher value 0, which means 'void' or 'null'.
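
A minimal sketch of the packing rule described above: each character's penta is shifted in from the low end, and the symbol is rejected once it needs more than 35 bits or contains an ineligible character. The class PentaPackSketch and its pack method are hypothetical names, and the ascending ASCII assignment of 10-bit codes is an assumption not stated in this document.

```java
import java.util.OptionalLong;

/** Illustrative sketch of packing pentas into a long; not the library's code. */
final class PentaPackSketch {
    private static final int[] PENTA = new int[128];

    static {
        int next = 960; // ASSUMPTION: 10-bit codes assigned in ascending ASCII order
        for (int c = 0; c < 128; c++) {
            if (c >= 'A' && c <= 'Z') PENTA[c] = c - 'A' + 1;
            else if (c == '.') PENTA[c] = 27;
            else if (c == '/') PENTA[c] = 28;
            else if (c == '$') PENTA[c] = 29;
            else if (c >= ' ' && c <= '~' && c != '\'' && c != '`') PENTA[c] = next++;
            else PENTA[c] = -1; // ineligible character
        }
    }

    /**
     * Packs the symbol into a penta-coded value, or returns empty if the
     * symbol must be left as a string.
     */
    static OptionalLong pack(String symbol) {
        long penta = 0;
        int bits = 0;
        for (int i = 0; i < symbol.length(); i++) {
            char c = symbol.charAt(i);
            int code = c < 128 ? PENTA[c] : -1;
            if (code < 0)
                return OptionalLong.empty();   // ineligible character
            int width = code < 32 ? 5 : 10;
            bits += width;
            if (bits > 35)
                return OptionalLong.empty();   // does not fit into 35 bits
            penta = (penta << width) | code;   // earlier characters end up in higher bits
        }
        return OptionalLong.of(penta);
    }

    public static void main(String[] args) {
        System.out.println(pack("IBM"));       // prints OptionalLong[9293]
        System.out.println(pack("ABCDEFGH"));  // prints OptionalLong.empty
    }
}
```

For "IBM" the pentas are 9, 2 and 13, so the packed value is (9 << 10) | (2 << 5) | 13 = 9293. The empty string packs to 0, which is the valid penta for an empty symbol and not a void cipher.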

The following table defines the serial format used (the first byte is given in bits, with 'x' representing a payload bit; the remaining bytes are given as a bit count):

 0xxxxxxx  8x - for 15-bit pentas
 10xxxxxx 24x - for 30-bit pentas
 110xxxxx ??? - reserved (payload TBD)
 1110xxxx 16x - for 20-bit pentas
 11110xxx 32x - for 35-bit pentas
 111110xx ??? - reserved (payload TBD)
 11111100 zzz - for UTF-8 string with length in bytes
 11111101 zzz - for CESU-8 string with length in characters
 11111110     - for 0-bit penta (empty symbol)
 11111111     - for void (null)
 
See CESU-8 for format basics and IOUtil.writeUTFString(java.io.DataOutput, java.lang.String) and IOUtil.writeCharArray(java.io.DataOutput, char[]) for details of string encoding.
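
The penta rows of the serial format table can be sketched as follows. This is a sketch under stated assumptions, not the library's writeSymbol: the class PentaSerialSketch is hypothetical, the shortest fitting format is chosen by the magnitude of the value, and the remaining bytes are assumed to follow in big-endian order, as java.io.DataOutput would write them. The string formats (11111100 and 11111101) and void (11111111) are left out.

```java
/** Illustrative sketch of the penta serial formats; not the library's code. */
final class PentaSerialSketch {
    /** Serializes a valid penta (at most 35 bits) using the shortest fitting format. */
    static byte[] serialize(long penta) {
        if (penta < 0 || penta >= (1L << 35))
            throw new IllegalArgumentException("penta exceeds 35 bits");
        if (penta == 0)                      // 11111110: empty symbol
            return new byte[] {(byte) 0xFE};
        if (penta < (1L << 15))              // 0xxxxxxx + 8 bits
            return new byte[] {(byte) (penta >>> 8), (byte) penta};
        if (penta < (1L << 20))              // 1110xxxx + 16 bits
            return new byte[] {
                (byte) (0xE0 | (penta >>> 16)), (byte) (penta >>> 8), (byte) penta};
        if (penta < (1L << 30))              // 10xxxxxx + 24 bits
            return new byte[] {
                (byte) (0x80 | (penta >>> 24)), (byte) (penta >>> 16),
                (byte) (penta >>> 8), (byte) penta};
        return new byte[] {                  // 11110xxx + 32 bits
            (byte) (0xF0 | (penta >>> 32)), (byte) (penta >>> 24),
            (byte) (penta >>> 16), (byte) (penta >>> 8), (byte) penta};
    }
}
```

Under these assumptions the penta 9293 (0x244D) takes the 15-bit form and serializes to the two bytes 0x24 0x4D.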


Field Summary
 
Fields inherited from interface com.devexperts.qd.SymbolCodec
VALID_CIPHER
 
Constructor Summary
PentaCodec()
           
 
Method Summary
 java.lang.String decode(int cipher)
          Returns decoded symbol for specified cipher.
 java.lang.String decode(int cipher, java.lang.String symbol)
          Returns decoded symbol for specified cipher-symbol pair.
 int decodeCharAt(int cipher, int i)
          Decodes one character from the given cipher at the given position.
protected  long decodeCipher(int cipher)
          Decodes cipher into penta code.
 int encode(char[] chars, int offset, int length)
          Returns encoded cipher for specified symbol represented in a character array.
 int encode(java.lang.String symbol)
          Returns encoded cipher for specified symbol.
protected  int encodePenta(long penta, int plen)
          Encodes penta into cipher.
protected  int getChartAt(long penta, int i)
           
 int getWildcardCipher()
          Returns the cipher used by the "wildcard" symbol; this implementation returns a value equal to encode("*").
 int readSymbol(com.devexperts.io.BufferedInput in, char[] buffer, java.lang.String[] result)
          Reads symbol from specified input stream and returns it in several ways depending on its encodability and length.
 void readSymbol(java.io.DataInput in, SymbolReceiver receiver)
          Reads symbol from specified data input and passes it into specified receiver.
protected  java.lang.String toString(long penta)
          Converts penta into string.
 void writeSymbol(java.io.DataOutput out, int cipher, java.lang.String symbol)
          Writes specified symbol into specified data output.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PentaCodec

public PentaCodec()
Method Detail

encodePenta

protected int encodePenta(long penta,
                          int plen)
Encodes penta into cipher. Shall return 0 if encoding is impossible. The specified penta must be valid (no more than 35 bits).


decodeCipher

protected long decodeCipher(int cipher)
Decodes cipher into penta code. The specified cipher must not be 0. The returned penta code must be valid (no more than 35 bits).

Throws:
java.lang.IllegalArgumentException - if specified cipher could not be decoded.

toString

protected java.lang.String toString(long penta)
Converts penta into string. The specified penta must be valid (no more than 35 bits).
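
One way such a conversion could work, sketched under assumptions: the value is scanned in 5-bit steps from its highest non-zero group, and a group of 30 or 31 is read together with the next group as one 10-bit penta. The class PentaDecodeSketch and the method pentaToString are hypothetical, the 10-bit code assignment is assumed to follow ascending ASCII order, and malformed values (for example a trailing transition mark) are not validated.

```java
/** Illustrative sketch of converting a penta back to a string; not the library's code. */
final class PentaDecodeSketch {
    /** Character per penta code; unused slots stay '\0'. */
    private static final char[] CHAR = new char[1024];

    static {
        int next = 960; // ASSUMPTION: 10-bit codes assigned in ascending ASCII order
        for (char c = ' '; c <= '~'; c++) {
            if (c >= 'A' && c <= 'Z') CHAR[c - 'A' + 1] = c;
            else if (c == '.') CHAR[27] = c;
            else if (c == '/') CHAR[28] = c;
            else if (c == '$') CHAR[29] = c;
            else if (c != '\'' && c != '`') CHAR[next++] = c;
        }
    }

    /** Converts a valid penta (at most 35 bits) into its character sequence. */
    static String pentaToString(long penta) {
        if (penta < 0 || penta >= (1L << 35))
            throw new IllegalArgumentException("penta exceeds 35 bits");
        int plen = 0;
        while ((penta >>> plen) != 0)
            plen += 5;                          // round the bit length up to a multiple of 5
        StringBuilder sb = new StringBuilder();
        while (plen > 0) {
            plen -= 5;
            int code = (int) (penta >>> plen) & 0x1F;
            if (code >= 30 && plen > 0) {       // transition mark: read a 10-bit penta
                plen -= 5;
                code = (int) (penta >>> plen) & 0x3FF;
            }
            sb.append(CHAR[code]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(pentaToString(9293L)); // prints IBM
    }
}
```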


getChartAt

protected int getChartAt(long penta,
                         int i)

encode

public int encode(java.lang.String symbol)
Description copied from interface: SymbolCodec
Returns encoded cipher for specified symbol. Returns 0 if specified symbol is null or is unencodable.

Specified by:
encode in interface SymbolCodec

encode

public int encode(char[] chars,
                  int offset,
                  int length)
Description copied from interface: SymbolCodec
Returns encoded cipher for specified symbol represented in a character array. Returns 0 if specified symbol is unencodable. This method must produce the same result as the following code:
encode(new String(chars, offset, length));.

Specified by:
encode in interface SymbolCodec

decode

public java.lang.String decode(int cipher)
Description copied from interface: SymbolCodec
Returns decoded symbol for specified cipher. Returns null if specified cipher is 0.

Specified by:
decode in interface SymbolCodec

decode

public java.lang.String decode(int cipher,
                               java.lang.String symbol)
Description copied from interface: SymbolCodec
Returns decoded symbol for specified cipher-symbol pair. This is a shortcut with the following implementation:
return symbol != null ? symbol : decode(cipher);.

Specified by:
decode in interface SymbolCodec

decodeCharAt

public int decodeCharAt(int cipher,
                        int i)
Description copied from interface: SymbolCodec
Decodes one character from the given cipher at the given position. This method should be used only when a few (one or two) characters are needed.

Specified by:
decodeCharAt in interface SymbolCodec
Returns:
Decoded character or -1 if i >= decode(cipher).length().

getWildcardCipher

public int getWildcardCipher()
Returns the cipher used by the "wildcard" symbol; this implementation returns a value equal to encode("*").

Specified by:
getWildcardCipher in interface SymbolCodec

writeSymbol

public void writeSymbol(java.io.DataOutput out,
                        int cipher,
                        java.lang.String symbol)
                 throws java.io.IOException
Description copied from interface: SymbolCodec
Writes specified symbol into specified data output.

Specified by:
writeSymbol in interface SymbolCodec
Throws:
java.io.IOException - as specified data output does.

readSymbol

public void readSymbol(java.io.DataInput in,
                       SymbolReceiver receiver)
                throws java.io.IOException
Description copied from interface: SymbolCodec
Reads symbol from specified data input and passes it into specified receiver.

Specified by:
readSymbol in interface SymbolCodec
Throws:
java.io.IOException - as specified data input does.

readSymbol

public int readSymbol(com.devexperts.io.BufferedInput in,
                      char[] buffer,
                      java.lang.String[] result)
               throws java.io.IOException
Description copied from interface: SymbolCodec
Reads symbol from specified input stream and returns it in several ways depending on its encodability and length. If this method returns:

Both buffer and result parameters shall be local to the ongoing operation; they are used for local storage and to return result to the caller; they are not used by the codec after that. The buffer shall be large enough to accommodate common short symbols, for example 64 characters. The result shall have length 1 because only the first element is ever used.

Specified by:
readSymbol in interface SymbolCodec
Parameters:
in - the source to read from
buffer - char buffer to use for reading operation and to return short non-encodable symbols
result - array used to return long non-encodable symbols
Returns:
result type code as described in the method above
Throws:
java.io.IOException - if an I/O error occurs