1. Introduction
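This article walks through Hessian's serialization format. Knowing how the wire protocol works makes its strengths and weaknesses clear, which helps in using it well. The format is specified in the official document Hessian 2.0 Serialization Protocol.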
2. Primitive type serialization
We start with the simplest case: how an int is serialized. First, the definition from the specification:
# 32-bit signed integer
int ::= 'I' b3 b2 b1 b0
    ::= [x80-xbf]             # -x10 to x3f (-16 to 63)
    ::= [xc0-xcf] b0          # -x800 to x7ff (-2048 to 2047)
    ::= [xd0-xd7] b1 b0       # -x40000 to x3ffff (-262144 to 262143)
With the definition in hand, the implementation is straightforward:
class Hessian2Output
public void writeInt(int value)
  throws IOException
{
  int offset = _offset;
  byte []buffer = _buffer;
  // flush first if fewer than 16 bytes of buffer space remain
  if (SIZE <= offset + 16) {
    flushBuffer();
    offset = _offset;
  }
  if (INT_DIRECT_MIN <= value && value <= INT_DIRECT_MAX) {
    // -x10..x3f: adding BC_INT_ZERO (0x90) lands in [x80-xbf], one byte in total
    buffer[offset++] = (byte) (value + BC_INT_ZERO);
  }
  else if (INT_BYTE_MIN <= value && value <= INT_BYTE_MAX) {
    // -x800..x7ff: shift right 8 bits and add BC_INT_BYTE_ZERO (0xc8),
    // so the first byte lands in [xc0-xcf]; the low byte follows
    buffer[offset++] = (byte) (BC_INT_BYTE_ZERO + (value >> 8));
    buffer[offset++] = (byte) (value);
  }
  else if (INT_SHORT_MIN <= value && value <= INT_SHORT_MAX) {
    // -x40000..x3ffff: shift right 16 bits and add BC_INT_SHORT_ZERO (0xd4),
    // so the first byte lands in [xd0-xd7]; the two low bytes follow
    buffer[offset++] = (byte) (BC_INT_SHORT_ZERO + (value >> 16));
    buffer[offset++] = (byte) (value >> 8);
    buffer[offset++] = (byte) (value);
  }
  else {
    // everything else: 'I' followed by 4 bytes, big-endian
    buffer[offset++] = (byte) ('I');
    buffer[offset++] = (byte) (value >> 24);
    buffer[offset++] = (byte) (value >> 16);
    buffer[offset++] = (byte) (value >> 8);
    buffer[offset++] = (byte) (value);
  }
  _offset = offset;
}
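Reading the bytes back reverses the same arithmetic: subtract the zero point of the tag's range, then reattach the trailing bytes. For example, 300 is outside the one-byte range. So writeInt emits 0xc8 + (300 >> 8) = 0xc9, followed by 300 & 0xff = 0x2c.

The following is a minimal decoder sketch written only from the grammar above. The names HessianIntDecoder and readInt are hypothetical and are not the library's Hessian2Input API.

```java
// Hypothetical decoder for the four Hessian 2.0 int forms (inverse of writeInt).
final class HessianIntDecoder {
    private HessianIntDecoder() {}

    /** Decodes the int whose tag byte is at buf[pos]. */
    static int readInt(byte[] buf, int pos) {
        int tag = buf[pos] & 0xff;
        if (tag >= 0x80 && tag <= 0xbf) {
            // one octet: the value is the tag minus the zero point 0x90
            return tag - 0x90;
        } else if (tag >= 0xc0 && tag <= 0xcf) {
            // two octets: signed high bits from (tag - 0xc8), then the low byte
            return ((tag - 0xc8) << 8) | (buf[pos + 1] & 0xff);
        } else if (tag >= 0xd0 && tag <= 0xd7) {
            // three octets: signed high bits from (tag - 0xd4), then two bytes
            return ((tag - 0xd4) << 16)
                    | ((buf[pos + 1] & 0xff) << 8)
                    | (buf[pos + 2] & 0xff);
        } else if (tag == 'I') {
            // 'I' followed by a 32-bit big-endian value
            return ((buf[pos + 1] & 0xff) << 24)
                    | ((buf[pos + 2] & 0xff) << 16)
                    | ((buf[pos + 3] & 0xff) << 8)
                    | (buf[pos + 4] & 0xff);
        }
        throw new IllegalArgumentException("not an int tag: 0x" + Integer.toHexString(tag));
    }
}
```

Shifting the signed difference (tag minus zero point) left is what restores negative values. For example, 0xc7 0xef decodes to ((-1) << 8) | 0xef = -17.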
Long and Double follow the same pattern, so we won't go through them one by one. A Boolean is a single byte: 'T' for true and 'F' for false.
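Here is a sketch of that "same pattern" for Long. The encoder picks among the long forms listed in the tag table at the end of this article: xd8-xef, xf0-xff, x38-x3f, x59 and 'L'. Each compact form has its own zero point (xe0, xf8, x3c).

HessianLongSketch is a hypothetical class. Double is left out, and the thresholds come from the table rather than from the library's writeLong source.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of the compact long encodings plus the one-byte booleans.
final class HessianLongSketch {
    private HessianLongSketch() {}

    static byte[] encodeLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (-0x08 <= value && value <= 0x0f) {
            // one octet, xd8-xef, xe0 is 0
            out.write((int) (0xe0 + value));
        } else if (-0x800 <= value && value <= 0x7ff) {
            // two octets, xf0-xff with zero point xf8, then the low byte
            out.write((int) (0xf8 + (value >> 8)));
            out.write((int) value);
        } else if (-0x40000 <= value && value <= 0x3ffff) {
            // three octets, x38-x3f with zero point x3c, then two bytes
            out.write((int) (0x3c + (value >> 16)));
            out.write((int) (value >> 8));
            out.write((int) value);
        } else if (Integer.MIN_VALUE <= value && value <= Integer.MAX_VALUE) {
            // 'Y' (x59): a long that fits in 32 bits
            out.write(0x59);
            for (int shift = 24; shift >= 0; shift -= 8) {
                out.write((int) (value >> shift));
            }
        } else {
            // 'L' followed by 64 bits, big-endian
            out.write('L');
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) (value >> shift));
            }
        }
        return out.toByteArray();
    }

    static byte[] encodeBoolean(boolean value) {
        return new byte[] {(byte) (value ? 'T' : 'F')};
    }
}
```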
3. Object serialization
3.1 String serialization
String is the object we use most often. Here is its grammar:
# UTF-8 encoded character string split into 64k chunks
string ::= x52 b1 b0 <utf8-data> string   # non-final chunk
       ::= 'S' b1 b0 <utf8-data>          # string of length 0-65535
       ::= [x00-x1f] <utf8-data>          # string of length 0-31
       ::= [x30-x33] <utf8-data>          # string of length 0-1023
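So a final chunk can hold up to 65535 characters. A longer string is split into non-final chunks (tag x52, 'R'), each with its own 16-bit length, followed by one final chunk. The lengths count UTF-16 characters, not UTF-8 bytes. The Java implementation below emits 'R' chunks of 32768 (0x8000) characters while more than 32768 characters remain.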
Here is the source:
class Hessian2Output
public void writeString(String value)
  throws IOException
{
  .... // buffer flush check omitted
  if (value == null) {
    buffer[offset++] = (byte) 'N';
    _offset = offset;
  } else {
    int length = value.length();
    int strOffset = 0;
    while (length > 0x8000) {
      // non-final chunk of 0x8000 (32768) chars, below the 16-bit length limit
      int sublen = 0x8000;
      ... // buffer flush check omitted
      // chunk can't end in high surrogate
      char tail = value.charAt(strOffset + sublen - 1);
      if (0xd800 <= tail && tail <= 0xdbff)
        sublen--;
      // 3-byte chunk header: 'R' (BC_STRING_CHUNK) + 16-bit chunk length
      buffer[offset + 0] = (byte) BC_STRING_CHUNK;
      buffer[offset + 1] = (byte) (sublen >> 8);
      buffer[offset + 2] = (byte) (sublen);
      _offset = offset + 3;
      printString(value, strOffset, sublen);
      length -= sublen;
      strOffset += sublen;
    }
    ... // buffer flush check omitted (the original repeats this check many times)
    if (length <= STRING_DIRECT_MAX) {
      // 0..31 chars: the length is the tag byte itself (BC_STRING_DIRECT = x00)
      buffer[offset++] = (byte) (BC_STRING_DIRECT + length);
    }
    else if (length <= STRING_SHORT_MAX) {
      // 0..1023 chars: high bits of the length added to BC_STRING_SHORT (x30),
      // then the low byte of the length
      buffer[offset++] = (byte) (BC_STRING_SHORT + (length >> 8));
      buffer[offset++] = (byte) (length);
    }
    else {
      // final chunk: 'S' + 16-bit length
      buffer[offset++] = (byte) ('S');
      buffer[offset++] = (byte) (length >> 8);
      buffer[offset++] = (byte) (length);
    }
    _offset = offset;
    printString(value, strOffset, length);
  }
}
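The framing logic can be reproduced outside the library. This sketch returns the encoded bytes for a string. Two parts are assumptions:

- chunkSize is a hypothetical parameter; the code above hard-codes 0x8000. It lets small chunks be tested.
- The per-character loop reflects my assumption about what printString does. For BMP characters it matches standard UTF-8, while surrogate pairs come out as two 3-byte sequences.

The header counts chars, so "中文" gets the header 0x02 but carries 6 bytes of data.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of writeString's framing: 'R' chunks, then one final chunk.
final class HessianStringSketch {
    private HessianStringSketch() {}

    static byte[] encodeString(String value, int chunkSize) {
        if (chunkSize < 2 || chunkSize > 0xffff) {
            throw new IllegalArgumentException("chunkSize must be in 2..65535");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0;
        int length = value.length();
        while (length > chunkSize) {
            int sublen = chunkSize;
            // a chunk must not end in a high surrogate
            if (Character.isHighSurrogate(value.charAt(offset + sublen - 1))) {
                sublen--;
            }
            out.write(0x52);                  // 'R': non-final chunk
            out.write(sublen >> 8);           // 16-bit length, counted in chars
            out.write(sublen);
            writeChars(out, value, offset, offset + sublen);
            offset += sublen;
            length -= sublen;
        }
        if (length <= 0x1f) {
            out.write(length);                // x00-x1f: length is the tag
        } else if (length <= 0x3ff) {
            out.write(0x30 + (length >> 8));  // x30-x33, then the low byte
            out.write(length);
        } else {
            out.write('S');                   // 'S' + 16-bit length
            out.write(length >> 8);
            out.write(length);
        }
        writeChars(out, value, offset, value.length());
        return out.toByteArray();
    }

    // Encodes each UTF-16 char independently as 1-3 bytes (assumed printString behavior).
    private static void writeChars(ByteArrayOutputStream out, String s, int from, int to) {
        for (int i = from; i < to; i++) {
            char ch = s.charAt(i);
            if (ch < 0x80) {
                out.write(ch);
            } else if (ch < 0x800) {
                out.write(0xc0 | (ch >> 6));
                out.write(0x80 | (ch & 0x3f));
            } else {
                out.write(0xe0 | (ch >> 12));
                out.write(0x80 | ((ch >> 6) & 0x3f));
                out.write(0x80 | (ch & 0x3f));
            }
        }
    }
}
```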
3.2 Date serialization
A date is just an offset from the Unix epoch (1970-01-01 00:00:00 UTC), so it is serialized as an integer:
# time in UTC encoded as 64-bit long milliseconds since epoch
date ::= x4a b7 b6 b5 b4 b3 b2 b1 b0
     ::= x4b b3 b2 b1 b0       # minutes since epoch
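A date tagged x4a carries the full millisecond timestamp in 8 bytes. A date tagged x4b carries a 32-bit count of minutes, which is only possible when the timestamp is a whole number of minutes. The implementation: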
public void writeUTCDate(long time)
  throws IOException
{
  if (SIZE < _offset + 32)
    flushBuffer();
  int offset = _offset;
  byte []buffer = _buffer;
  // a whole number of minutes (60000 ms)? then try the compact form
  if (time % 60000L == 0) {
    // compact date ::= x4b b3 b2 b1 b0
    long minutes = time / 60000L;
    // only if the minute count fits in a signed 32-bit int
    if ((minutes >> 31) == 0 || (minutes >> 31) == -1) {
      buffer[offset++] = (byte) BC_DATE_MINUTE;
      buffer[offset++] = ((byte) (minutes >> 24));
      buffer[offset++] = ((byte) (minutes >> 16));
      buffer[offset++] = ((byte) (minutes >> 8));
      buffer[offset++] = ((byte) (minutes >> 0));
      _offset = offset;
      return;
    }
  }
  // otherwise: x4a (BC_DATE) + 64-bit milliseconds, big-endian
  buffer[offset++] = (byte) BC_DATE;
  buffer[offset++] = ((byte) (time >> 56));
  buffer[offset++] = ((byte) (time >> 48));
  buffer[offset++] = ((byte) (time >> 40));
  buffer[offset++] = ((byte) (time >> 32));
  buffer[offset++] = ((byte) (time >> 24));
  buffer[offset++] = ((byte) (time >> 16));
  buffer[offset++] = ((byte) (time >> 8));
  buffer[offset++] = ((byte) (time));
  _offset = offset;
}
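A worked example: 2020-01-01T00:00:00Z is 1577836800000 ms, which is exactly 26297280 minutes. So writeUTCDate uses the 5-byte x4b form. One millisecond later the value is no longer a whole minute and needs the 9-byte x4a form.

The reverse direction is sketched below. HessianDateSketch and readDate are hypothetical names, not part of the Hessian library.

```java
import java.time.Instant;

// Hypothetical reader for the two Hessian 2.0 date forms.
final class HessianDateSketch {
    private HessianDateSketch() {}

    static Instant readDate(byte[] buf, int pos) {
        int tag = buf[pos] & 0xff;
        if (tag == 0x4a) {
            // x4a: 64-bit big-endian milliseconds since the epoch
            long millis = 0;
            for (int i = 1; i <= 8; i++) {
                millis = (millis << 8) | (buf[pos + i] & 0xff);
            }
            return Instant.ofEpochMilli(millis);
        } else if (tag == 0x4b) {
            // x4b: 32-bit signed minutes since the epoch
            int minutes = ((buf[pos + 1] & 0xff) << 24)
                    | ((buf[pos + 2] & 0xff) << 16)
                    | ((buf[pos + 3] & 0xff) << 8)
                    | (buf[pos + 4] & 0xff);
            return Instant.ofEpochMilli(minutes * 60000L);
        }
        throw new IllegalArgumentException("not a date tag: 0x" + Integer.toHexString(tag));
    }
}
```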
3.3 Custom class serialization
Class definition: 'C' + class name + number of fields + field names.
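Object instance: 'O' + class-definition reference + field values. When the reference is below 16, the short form is used instead: a single byte (0x60 + reference) followed by the field values.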
class-def ::= 'C' string int string*
object ::= 'O' int value*
::= [x60-x6f] value*
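Note one more rule for objects. If the same object has already been written earlier in the stream, the second occurrence is written as a back-reference: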
ref ::= x51 int
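References apply only to lists, maps and objects. The int is the index of the earlier occurrence, numbered from 0 in the order those values were written.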
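The bookkeeping behind x51 can be sketched with a table that numbers each list, map or object the first time it is written. Some caveats:

- The sketch uses an IdentityHashMap, assuming references are tracked by object identity rather than equals(). The Java library appears to use an identity-keyed map as well.
- RefTableSketch is hypothetical.
- Its encodeRef only handles indices 0 to 63, which fit the one-octet int form.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical reference table: the first sighting of an object gets the next index,
// later sightings are written as "x51 int" instead of the full value.
final class RefTableSketch {
    private final Map<Object, Integer> refs = new IdentityHashMap<>();

    /** Returns the existing index, or -1 after registering obj as new. */
    int addRef(Object obj) {
        Integer existing = refs.get(obj);
        if (existing != null) {
            return existing;
        }
        refs.put(obj, refs.size());
        return -1;
    }

    /** Bytes for "ref ::= x51 int" when the index fits the one-octet int form. */
    static byte[] encodeRef(int index) {
        if (index < 0 || index > 0x3f) {
            throw new IllegalArgumentException("sketch handles indices 0..63 only");
        }
        return new byte[] {0x51, (byte) (0x90 + index)};
    }
}
```

With identity semantics, two lists that are equal but distinct are both written in full. Only a repeat of the very same instance becomes a reference.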
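public void writeObject(Object obj, AbstractHessianOutput out)
  throws IOException
{
  // already written earlier? addRef emits x51 + index and returns true
  if (out.addRef(obj)) {
    return;
  }
  Class<?> cl = obj.getClass();
  // known class: writes 'O' + ref (or x60 + ref) and returns ref >= 0;
  // new class: writes 'C' + class name and returns -1
  int ref = out.writeObjectBegin(cl.getName());
  if (ref >= 0) {
    // class definition already sent: write the field values only
    writeInstance(obj, out);
  }
  else if (ref == -1) {
    // finish the class definition: field count and field names
    writeDefinition20(out);
    // the class is now registered, so this call writes the instance header
    out.writeObjectBegin(cl.getName());
    // field values
    writeInstance(obj, out);
  }
  else {
    // any other return value: fall back to the Hessian 1.0 encoding
    writeObject10(obj, out);
  }
}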
Here is the serialization example from the Hessian specification:
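class Car implements java.io.Serializable {
  String color;
  String model;

  Car(String color, String model) {
    this.color = color;
    this.model = model;
  }
}
out.writeObject(new Car("red", "corvette"));
out.writeObject(new Car("green", "civic"));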
---
C                        # object definition (#0)
  x0b example.Car        # type is example.Car
  x92                    # two fields
  x05 color              # color field name
  x05 model              # model field name

O                        # object def (long form)
  x90                    # object definition #0
  x03 red                # color field value
  x08 corvette           # model field value

x60                      # object def #0 (short form)
  x05 green              # color field value
  x05 civic              # model field value
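The listing can be reproduced with a small hand-written encoder. One caveat: the specification writes the first instance in the long form ('O' x90) for illustration. By the rule in 3.3, though, a class reference below 16 can use the one-byte short form, so this sketch writes x60 before both instances.

CarEncodingSketch is hypothetical and only handles ASCII strings shorter than 32 characters. It produces 52 bytes, and the class definition takes the first 26.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical encoder for the Car example: one class definition, then the instances.
final class CarEncodingSketch {
    private CarEncodingSketch() {}

    // short string form x00-x1f: one length byte (in chars), then the data
    private static void shortString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        if (b.length > 0x1f) {
            throw new IllegalArgumentException("sketch handles strings under 32 chars");
        }
        out.write(b.length);
        out.write(b, 0, b.length);
    }

    /** Each element is {color, model}. */
    static byte[] encode(List<String[]> cars) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // class-def ::= 'C' string int string*
        out.write('C');
        shortString(out, "example.Car");
        out.write(0x90 + 2);            // field count 2 as a one-octet int (x92)
        shortString(out, "color");
        shortString(out, "model");
        for (String[] car : cars) {
            out.write(0x60 + 0);        // short form: instance of class definition #0
            shortString(out, car[0]);   // color
            shortString(out, car[1]);   // model
        }
        return out.toByteArray();
    }
}
```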
4. Collection serialization
4.1 List (array) serialization
The grammar:
list ::= x55 type value* 'Z' # variable-length list
::= 'V' type int value* # fixed-length list
::= x57 value* 'Z' # variable-length untyped list
::= x58 int value* # fixed-length untyped list
::= [x70-77] type value* # fixed-length typed list
::= [x78-7f] value* # fixed-length untyped list
The grammar is simple; there are six forms:
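variable-length typed list (x55, 'U'), terminated by 'Z'
fixed-length typed list ('V'), with the length given as an int
variable-length untyped list (x57, 'W'), terminated by 'Z'
fixed-length untyped list (x58, 'X'), with the length given as an int
fixed-length typed list of length 0-7, length stored in the tag (x70-x77)
fixed-length untyped list of length 0-7, length stored in the tag (x78-x7f)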
Here is how an int[] is encoded (the example from the specification):
V # fixed length, typed list
x04 [int # encoding of int[] type
x92 # length = 2
x90 # integer 0
x91 # integer 1
BasicSerializer
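case INTEGER_ARRAY:
{
  // already written? emit a reference instead
  if (out.addRef(obj))
    return;
  int []data = (int []) obj;
  // list header: tag, type "[int" and, for fixed-length lists, the length;
  // returns true if the list needs a trailing 'Z'
  boolean hasEnd = out.writeListBegin(data.length, "[int");
  // elements
  for (int i = 0; i < data.length; i++)
    out.writeInt(data[i]);
  // 'Z' terminator, only for variable-length lists
  if (hasEnd)
    out.writeListEnd();
  break;
}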
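writeListBegin has to choose among the six list forms. The following is a sketch of one reasonable policy, not the library's code:

- An unknown length (passed as a negative number here) gives the 'Z'-terminated forms.
- Lengths up to 7 go into the tag byte.
- Longer lists use 'V'/'X' with an explicit length.

With this policy the two-element int[] above would start with x72 rather than 'V' x92, one byte shorter. ListHeaderSketch is hypothetical. It writes type names as plain short strings and ignores the type-reference reuse a real writer may do.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: choose a list header among the six Hessian 2.0 list forms.
final class ListHeaderSketch {
    private ListHeaderSketch() {}

    /** length < 0 means "unknown", i.e. a variable-length list ending in 'Z'. */
    static byte[] header(int length, String type) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (length < 0) {
            out.write(type != null ? 0x55 : 0x57);            // 'U' typed / 'W' untyped
            if (type != null) writeType(out, type);
        } else if (length <= 7) {
            out.write((type != null ? 0x70 : 0x78) + length); // length inside the tag
            if (type != null) writeType(out, type);
        } else {
            out.write(type != null ? 'V' : 'X');              // x56 / x58
            if (type != null) writeType(out, type);
            writeCompactInt(out, length);                     // then the length as an int
        }
        return out.toByteArray();
    }

    // Type name as a short string (assumes ASCII, under 32 chars).
    private static void writeType(ByteArrayOutputStream out, String type) {
        byte[] b = type.getBytes(StandardCharsets.US_ASCII);
        out.write(b.length);
        out.write(b, 0, b.length);
    }

    // Only the one- and two-octet int forms, enough for lengths up to 2047.
    private static void writeCompactInt(ByteArrayOutputStream out, int v) {
        if (v <= 0x3f) {
            out.write(0x90 + v);
        } else if (v <= 0x7ff) {
            out.write(0xc8 + (v >> 8));
            out.write(v);
        } else {
            throw new IllegalArgumentException("sketch handles lengths up to 2047");
        }
    }
}
```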
4.2 Map serialization
The grammar:
map ::= 'M' type (value value)* 'Z'   # typed map: key, value pairs
    ::= 'H' (value value)* 'Z'        # untyped map
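This one is even simpler. A typed map starts with 'M', followed by the type, then alternating keys and values, and ends with 'Z'. An untyped map starts with 'H' and omits the type. In the code below, writeMapBegin(null) produces the untyped form. That happens for a plain HashMap, when Java type names are not being sent, or when the map is not Serializable: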
MapSerializer
public void writeObject(Object obj, AbstractHessianOutput out)
  throws IOException
{
  // already written? emit a reference instead
  if (out.addRef(obj))
    return;
  Map map = (Map) obj;
  Class cl = obj.getClass();
  // plain HashMap, Java types disabled, or not Serializable: untyped map ('H')
  if (cl.equals(HashMap.class)
      || ! _isSendJavaType
      || ! (obj instanceof java.io.Serializable))
    out.writeMapBegin(null);
  else
    out.writeMapBegin(obj.getClass().getName()); // typed map ('M') with the class name
  // keys and values alternate
  Iterator iter = map.entrySet().iterator();
  while (iter.hasNext()) {
    Map.Entry entry = (Map.Entry) iter.next();
    out.writeObject(entry.getKey());
    out.writeObject(entry.getValue());
  }
  // 'Z' terminator
  out.writeMapEnd();
}
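For a map that takes the untyped branch, the bytes are easy to predict. The sketch below encodes a Map<String, Integer> as 'H', then alternating short-string keys and one-octet ints, then 'Z'.

UntypedMapSketch is hypothetical. It is restricted to ASCII keys under 32 characters and values from -16 to 63. A LinkedHashMap in the usage keeps the entry order deterministic.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch: encode a small Map<String, Integer> as an untyped Hessian map.
final class UntypedMapSketch {
    private UntypedMapSketch() {}

    static byte[] encode(Map<String, Integer> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('H');                                   // untyped map, x48
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            byte[] key = e.getKey().getBytes(StandardCharsets.US_ASCII);
            if (key.length > 0x1f) {
                throw new IllegalArgumentException("key too long for the sketch");
            }
            out.write(key.length);                        // short string, x00-x1f
            out.write(key, 0, key.length);
            int v = e.getValue();
            if (v < -0x10 || v > 0x3f) {
                throw new IllegalArgumentException("value out of one-octet int range");
            }
            out.write(0x90 + v);                          // one-octet compact int
        }
        out.write('Z');                                   // terminator, x5a
        return out.toByteArray();
    }
}
```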
5. Summary
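Every Hessian value begins with a tag byte, and the range the tag falls in determines the type (and often the length or the value itself). The complete map: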
x00 - x1f # utf-8 string length 0-31
x20 - x2f # binary data length 0-15
x30 - x33 # utf-8 string length 0-1023
x34 - x37 # binary data length 0-1023
x38 - x3f # three-octet compact long (-x40000 to x3ffff)
x40 # reserved (expansion/escape)
x41 # 8-bit binary data non-final chunk ('A')
x42 # 8-bit binary data final chunk ('B')
x43 # object type definition ('C')
x44 # 64-bit IEEE encoded double ('D')
x45 # reserved
x46 # boolean false ('F')
x47 # reserved
x48 # untyped map ('H')
x49 # 32-bit signed integer ('I')
x4a # 64-bit UTC millisecond date
x4b # 32-bit UTC minute date
x4c # 64-bit signed long integer ('L')
x4d # map with type ('M')
x4e # null ('N')
x4f # object instance ('O')
x50 # reserved
x51 # reference to map/list/object - integer ('Q')
x52 # utf-8 string non-final chunk ('R')
x53 # utf-8 string final chunk ('S')
x54 # boolean true ('T')
x55 # variable-length list/vector ('U')
x56 # fixed-length list/vector ('V')
x57 # variable-length untyped list/vector ('W')
x58 # fixed-length untyped list/vector ('X')
x59 # long encoded as 32-bit int ('Y')
x5a # list/map terminator ('Z')
x5b # double 0.0
x5c # double 1.0
x5d # double represented as byte (-128.0 to 127.0)
x5e # double represented as short (-32768.0 to 32767.0)
x5f # double represented as float
x60 - x6f # object with direct type
x70 - x77 # fixed list with direct length
x78 - x7f # fixed untyped list with direct length
x80 - xbf # one-octet compact int (-x10 to x3f, x90 is 0)
xc0 - xcf # two-octet compact int (-x800 to x7ff)
xd0 - xd7 # three-octet compact int (-x40000 to x3ffff)
xd8 - xef # one-octet compact long (-x8 to xf, xe0 is 0)
xf0 - xff # two-octet compact long (-x800 to x7ff, xf8 is 0)
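Because the first byte alone identifies the type, a reader can dispatch on it with range checks. This is a sketch of that dispatch, following the table above. HessianTagSketch and its labels are hypothetical. A real reader, such as the library's Hessian2Input, would go on to consume the bytes that follow the tag.

```java
// Hypothetical helper: names the kind of value a Hessian 2.0 tag byte starts.
final class HessianTagSketch {
    private HessianTagSketch() {}

    static String classify(int tagByte) {
        int t = tagByte & 0xff;
        if (t <= 0x1f) return "string (short, length 0-31)";
        if (t <= 0x2f) return "binary (short, length 0-15)";
        if (t <= 0x33) return "string (length 0-1023)";
        if (t <= 0x37) return "binary (length 0-1023)";
        if (t <= 0x3f) return "long (3 octets)";
        if (t >= 0x60 && t <= 0x6f) return "object (direct class ref)";
        if (t >= 0x70 && t <= 0x77) return "typed list (direct length)";
        if (t >= 0x78 && t <= 0x7f) return "untyped list (direct length)";
        if (t >= 0x80 && t <= 0xbf) return "int (1 octet)";
        if (t >= 0xc0 && t <= 0xcf) return "int (2 octets)";
        if (t >= 0xd0 && t <= 0xd7) return "int (3 octets)";
        if (t >= 0xd8 && t <= 0xef) return "long (1 octet)";
        if (t >= 0xf0) return "long (2 octets)";
        // only the single-byte tags x40-x5f remain
        switch (t) {
            case 0x41: case 0x42: return "binary chunk";
            case 0x43: return "class definition";
            case 0x44: case 0x5b: case 0x5c: case 0x5d: case 0x5e: case 0x5f: return "double";
            case 0x46: case 0x54: return "boolean";
            case 0x48: case 0x4d: return "map";
            case 0x49: return "int (I + 4 octets)";
            case 0x4a: case 0x4b: return "date";
            case 0x4c: case 0x59: return "long";
            case 0x4e: return "null";
            case 0x4f: return "object";
            case 0x51: return "reference";
            case 0x52: case 0x53: return "string chunk";
            case 0x55: case 0x56: case 0x57: case 0x58: return "list";
            case 0x5a: return "list/map terminator";
            default: return "reserved";            // x40, x45, x47, x50
        }
    }
}
```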
This article covered how Hessian serializes data. How deserialization works is the subject of the next article.