LZW Encoding and Decoding Algorithm - CFANZ Programming Community

1. How LZW Encoding Works

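The idea behind LZW encoding is to keep extracting new strings from the character stream (informally, new "entries") and to represent each entry by a "code", that is, a codeword. Encoding the character stream then amounts to replacing it with codewords, producing a codeword stream; because a repeated string is replaced by a single codeword, the data is compressed. LZW encoding revolves around a translation table called the dictionary, and the LZW encoder performs the conversion between input and output by maintaining this dictionary. The encoder's input is a character stream, for example a string of 8-bit ASCII characters, and its output is a stream of n-bit codewords (for example, 12-bit codewords).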
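To make the "n-bit codeword" idea concrete, here is a minimal sketch of packing 12-bit codewords into bytes and reading them back. The helper names `pack_codes` and `unpack_code` and the most-significant-bit-first order are assumptions for illustration; the post's own `BitsOutput`/`output` routines are not shown and may order bits differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes each code as `bits` bits, most significant bit
   first. The caller must zero `out` and size it to at least (n*bits+7)/8
   bytes. Returns the number of bytes used. */
size_t pack_codes(const int *codes, size_t n, int bits, unsigned char *out)
{
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        assert(codes[i] >= 0 && codes[i] < (1 << bits)); /* code must fit */
        for (int b = bits - 1; b >= 0; b--) {
            if ((codes[i] >> b) & 1)
                out[bitpos / 8] |= (unsigned char)(0x80 >> (bitpos % 8));
            bitpos++;
        }
    }
    return (bitpos + 7) / 8;
}

/* Hypothetical helper: reads back the k-th `bits`-wide code. */
int unpack_code(const unsigned char *in, size_t k, int bits)
{
    int code = 0;
    size_t bitpos = k * (size_t)bits;
    for (int b = 0; b < bits; b++, bitpos++)
        code = (code << 1) | ((in[bitpos / 8] >> (7 - bitpos % 8)) & 1);
    return code;
}
```

With 12-bit codewords, a single character emitted as its own codeword costs 1.5 bytes instead of 1, so LZW only saves space once dictionary entries cover longer repeated strings.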

2. Encoding Algorithm Steps

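Step 1: Initialize the dictionary to contain every possible single character, and initialize the current prefix P to empty.
Step 2: Current character C = the next character in the character stream. If the stream is exhausted, output the codeword for P and stop.
Step 3: Check whether the string P + C is in the dictionary.
(1) If yes, extend P with C, i.e. P = P + C, and return to step 2.
(2) If no:
output the codeword W corresponding to the current prefix P;
add P + C to the dictionary;
set P = C and return to step 2.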
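The steps above can be sketched in C as follows. This is a simplified, self-contained version, not the post's implementation: the dictionary is a plain array of (prefix code, character) pairs searched linearly, codewords are returned in an `int` array instead of being bit-packed, and the names `lzw_encode_sketch`, `sk_find` and the 4096-entry limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CODE 4096   /* hypothetical limit: 12-bit codewords */

/* Entries from 256 up store a string as (code of prefix P, last character C);
   codes 0..255 implicitly stand for the single characters (step 1). */
static int sk_prefix[SKETCH_MAX_CODE];
static int sk_char[SKETCH_MAX_CODE];
static int sk_next;

/* Returns the code of P + C, or -1 if it is not in the dictionary. */
static int sk_find(int p, int c)
{
    if (p < 0)                   /* empty P: C alone is always present */
        return c;
    for (int i = 256; i < sk_next; i++)
        if (sk_prefix[i] == p && sk_char[i] == c)
            return i;
    return -1;
}

/* Encodes len bytes of in into out; returns the number of codewords. */
int lzw_encode_sketch(const unsigned char *in, size_t len, int *out)
{
    int n = 0;
    int p = -1;                  /* step 1: P is empty */
    assert(in != NULL && out != NULL);
    sk_next = 256;
    for (size_t i = 0; i < len; i++) {
        int c = in[i];           /* step 2: C = next character */
        int pc = sk_find(p, c);  /* step 3: is P + C in the dictionary? */
        if (pc >= 0) {
            p = pc;              /* (1) yes: P = P + C */
        } else {
            out[n++] = p;        /* (2) no: output the codeword W for P */
            if (sk_next < SKETCH_MAX_CODE) {
                sk_prefix[sk_next] = p;   /* add P + C to the dictionary */
                sk_char[sk_next] = c;
                sk_next++;
            }
            p = c;               /* P = C */
        }
    }
    if (p >= 0)
        out[n++] = p;            /* stream exhausted: output the codeword for P */
    return n;
}
```

For the input `ABABABA` this produces the codewords 65, 66, 256, 258 (`A`, `B`, `AB`, `ABA`). Codeword 258 is emitted right after the step that created it, which is exactly the case a decoder has to handle specially.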
Implementing the encoding algorithm

3. The LZWEncode Function

void LZWEncode( FILE *fp, BITFILE *bf){// encoder: reads fp, writes codewords to bf
   int character;
   int string_code;
   int index;
   unsigned long file_length;

   fseek( fp, 0, SEEK_END);
   file_length = ftell( fp);
   fseek( fp, 0, SEEK_SET);// get the length of the input file, then rewind to the start
   BitsOutput( bf, file_length, 4*8);// write the file length as a 32-bit header so the decoder knows when to stop
   InitDictionary();// initialize the dictionary with all single characters
   string_code = -1;// -1 = empty prefix P, so the first character is looked up on its own
   while( EOF!=(character=fgetc( fp))){
       // read the input one character C at a time
       index = InDictionary( character, string_code);// look up the string P+C in the dictionary
       // index is the code of P+C if it exists, otherwise negative
       if( 0<=index){   // string+character in dictionary
           string_code = index;// P = P+C
       }else{   // string+character not in dictionary
           output( bf, string_code);// output the codeword W for the current prefix P (not the character C)
           if( MAX_CODE > next_code){   // free space in dictionary
               // add string+character to dictionary
               AddToDictionary( character, string_code);// add P+C as a new entry
           }
           string_code = character;// P = C, then keep reading
       }
   }
   if( 0<=string_code){   // nothing to flush for an empty input file
       output( bf, string_code);// output the codeword for the final prefix P
   }
}
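`LZWEncode` relies on `InitDictionary`, `InDictionary`, `AddToDictionary`, `next_code` and `MAX_CODE`, which the post does not show. The sketch below is one hypothetical way to implement them as a tree: the `suffix` and `parent` fields match what `DecodeString` reads, while the `firstchild`/`nextsibling` links and the value `MAX_CODE = 4096` (12-bit codes) are assumptions.

```c
#include <assert.h>

#define MAX_CODE 4096   /* assumption: 12-bit codes, largest code 4095 */

struct dict_node {
    int suffix;       /* last character of the string */
    int parent;       /* code of the prefix string, -1 for a root */
    int firstchild;   /* first entry that extends this string, -1 if none */
    int nextsibling;  /* next entry with the same parent, -1 if none */
};

struct dict_node dictionary[MAX_CODE];
int next_code;

/* Hypothetical: one root per possible byte value. */
void InitDictionary(void)
{
    for (int i = 0; i < 256; i++) {
        dictionary[i].suffix = i;
        dictionary[i].parent = -1;
        dictionary[i].firstchild = -1;
        dictionary[i].nextsibling = -1;
    }
    next_code = 256;
}

/* Hypothetical: returns the code of string_code+character, or -1.
   string_code == -1 means the prefix is empty. */
int InDictionary(int character, int string_code)
{
    if (string_code < 0)
        return character;   /* single characters are always present */
    for (int sib = dictionary[string_code].firstchild; sib >= 0;
         sib = dictionary[sib].nextsibling)
        if (dictionary[sib].suffix == character)
            return sib;
    return -1;
}

/* Hypothetical: stores string_code+character as entry next_code. */
void AddToDictionary(int character, int string_code)
{
    if (string_code < 0)    /* no prefix yet: nothing to add */
        return;
    assert(next_code < MAX_CODE);
    dictionary[next_code].suffix = character;
    dictionary[next_code].parent = string_code;
    dictionary[next_code].firstchild = -1;
    /* link the new node at the head of the parent's child list */
    dictionary[next_code].nextsibling = dictionary[string_code].firstchild;
    dictionary[string_code].firstchild = next_code;
    next_code++;
}
```

Because each entry stores only its parent code and one character, a string of any length costs a single node, and looking up P + C only scans the children of P.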

4. LZW Decoding: Principle and Algorithm

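When LZW decoding starts, the decoding dictionary is identical to the encoder's initial dictionary: it contains all possible roots (single characters). The decoder rebuilds the remaining entries itself as it reads codewords, so the dictionary never has to be transmitted.
Decoding algorithm steps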

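Step 1: At the start of decoding, the dictionary contains all possible roots.
Step 2: CW := the first codeword in the codeword stream.
Step 3: Output string.CW (the string that codeword CW represents) to the character stream.
Step 4: Previous codeword PW := current codeword CW.
Step 5: Current codeword CW := the next codeword in the codeword stream.
Step 6: Check whether string.CW is in the dictionary.
(1) If yes, output string.CW to the character stream;
current prefix P := string.PW;
current character C := the first character of string.CW;
add the string P+C to the dictionary.
(2) If no, current prefix P := string.PW;
current character C := the first character of string.PW (CW can only be missing if it is the entry the encoder created in the previous step, which starts with string.PW);
output the string P+C to the character stream, then add it to the dictionary.
Step 7: Check whether there are more codewords to decode.
(1) If yes, return to step 4.
(2) If no, stop.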
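Steps 1 to 7 can be sketched as a self-contained C function that takes the codewords as an `int` array (no bit-level I/O). The names `lzw_decode_sketch` and `skd_emit`, the array-based dictionary and the 4096-entry limit are hypothetical; a real decoder must use the same limit as its encoder so that both dictionaries stay in step.

```c
#include <assert.h>
#include <stddef.h>

#define SKD_MAX_CODE 4096   /* hypothetical limit: 12-bit codewords */

static int skd_prefix[SKD_MAX_CODE];  /* code of the prefix string */
static int skd_char[SKD_MAX_CODE];    /* last character of the entry */

/* Writes string.code to out in forward order and returns its length. */
static size_t skd_emit(int code, unsigned char *out)
{
    unsigned char rev[SKD_MAX_CODE];  /* characters gathered last-to-first */
    size_t len = 0;
    while (code >= 256) {             /* walk up to the root */
        rev[len++] = (unsigned char)skd_char[code];
        code = skd_prefix[code];
    }
    rev[len++] = (unsigned char)code; /* a root code is the character itself */
    for (size_t i = 0; i < len; i++)
        out[i] = rev[len - 1 - i];
    return len;
}

/* Decodes n codewords into out (caller sizes it); returns bytes written. */
size_t lzw_decode_sketch(const int *codes, size_t n, unsigned char *out)
{
    int next = 256;                   /* step 1: only the 256 roots */
    size_t len;
    if (n == 0)
        return 0;
    assert(codes[0] >= 0 && codes[0] < 256);
    len = skd_emit(codes[0], out);    /* steps 2-3: output string.CW */
    for (size_t k = 1; k < n; k++) {
        int pw = codes[k - 1];        /* step 4: PW = CW */
        int cw = codes[k];            /* step 5: CW = next codeword */
        unsigned char *dst = out + len;
        unsigned char c;
        assert(cw >= 0 && cw <= next);  /* a valid stream never skips ahead */
        if (cw < next) {              /* step 6(1): string.CW is known */
            len += skd_emit(cw, dst);
            c = dst[0];               /* C = first character of string.CW */
        } else {                      /* step 6(2): CW is the entry being built */
            size_t pw_len = skd_emit(pw, dst);
            c = dst[0];               /* C = first character of string.PW */
            dst[pw_len] = c;          /* output P + C */
            len += pw_len + 1;
        }
        if (next < SKD_MAX_CODE) {    /* add P + C, where P = string.PW */
            skd_prefix[next] = pw;
            skd_char[next] = c;
            next++;
        }
    }
    return len;
}
```

Decoding the codewords 65, 66, 256, 258 gives back `ABABABA`. The last codeword, 258, arrives before the decoder has created entry 258, so it goes through step 6(2).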
Implementing the decoding algorithm

The LZWDecode function (see the comments for an explanation):

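void LZWDecode( BITFILE *bf, FILE *fp){
   int character;// first character of the most recently decoded string
   int new_code, last_code;// new_code is the current codeword CW, last_code the previous codeword PW
   int phrase_length;// length of the string decoded in this iteration
   unsigned long file_length = 0;
   file_length = BitsInput(bf, 4 * 8);// read the 32-bit original file length written by LZWEncode
   if (-1 == file_length)// treat a read error as an empty file
       file_length = 0;
   InitDictionary();// start from the same dictionary the encoder started from
   last_code = -1;// no previous codeword yet, so the first CW is simply output
   while (0 < file_length) {
       new_code = input(bf);
       if (new_code >= next_code) {// CW is not in the dictionary yet (step 6(2))
           // string.CW = string.PW + first character of string.PW
           d_stack[0] = character;// that first character becomes the last character
           phrase_length = DecodeString(1, last_code);// string.PW is stacked above it, in reverse
       }
       else {// CW is already in the dictionary (step 6(1))
           phrase_length = DecodeString(0, new_code);// decode string.CW directly
       }
       character = d_stack[phrase_length - 1];// remember C, the first character of the decoded string
       while (0 < phrase_length) {// d_stack holds the string in reverse; pop it into the file
           phrase_length--;
           fputc(d_stack[phrase_length], fp);
           file_length--;
       }
       if (0 <= last_code && MAX_CODE > next_code) {// there is a PW and room left in the dictionary
           AddToDictionary(character, last_code);// add string.PW + C
       }
       last_code = new_code;// PW = CW
   }
}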
The DecodeString function (see the comments for an explanation):

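int DecodeString( int start, int code){
   // walks from entry `code` up to its root, storing the characters in d_stack
   // from index `start` onward, so the string ends up in reverse order;
   // returns the resulting length, which LZWDecode uses as phrase_length
   int count;
   count = start;
   while (0 <= code) {
       d_stack[count] = dictionary[code].suffix;// d_stack is the stack used for decoding
       code = dictionary[code].parent;// move to the parent node (the prefix); roots have parent -1
       count++;
   }
   return count;// start + length of the decoded string
}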
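The reverse-order stack and the `d_stack[0] = character` trick in `LZWDecode` are easiest to see on a tiny dictionary. The sketch below uses hypothetical stand-ins (`demo_dict`, `demo_stack`, `demo_init`, `demo_decode`) for the post's `dictionary` and `d_stack` globals; `demo_decode_string` performs the same walk as `DecodeString`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the post's dictionary and d_stack globals. */
struct demo_node { int suffix; int parent; };
static struct demo_node demo_dict[257];
static int demo_stack[64];

/* Same walk as DecodeString: suffixes from `start` on, leaf to root. */
static int demo_decode_string(int start, int code)
{
    int count = start;
    while (code >= 0) {
        demo_stack[count++] = demo_dict[code].suffix;
        code = demo_dict[code].parent;
    }
    return count;
}

/* Roots 0..255 plus entry 256 = "AB" (parent 'A', suffix 'B'). */
void demo_init(void)
{
    for (int i = 0; i < 256; i++) {
        demo_dict[i].suffix = i;
        demo_dict[i].parent = -1;
    }
    demo_dict[256].suffix = 'B';
    demo_dict[256].parent = 'A';
}

/* special == 0: decode `code` normally (like DecodeString(0, new_code)).
   special != 0: mimic the not-yet-in-dictionary branch, where `code` plays
   last_code and `first` plays the saved `character`. */
void demo_decode(int code, int special, int first, char *out)
{
    int len, i = 0;
    assert(code >= 0 && code <= 256);
    if (special) {
        demo_stack[0] = first;               /* like d_stack[0] = character */
        len = demo_decode_string(1, code);   /* string.PW stacked above it */
    } else {
        len = demo_decode_string(0, code);   /* string.CW, reversed */
    }
    while (len > 0)                          /* popping restores forward order */
        out[i++] = (char)demo_stack[--len];
    out[i] = '\0';
}
```

With entry 256 = `AB`, a normal decode of 256 yields `AB`, while the special branch (with `last_code` = 256 and saved first character `A`) yields `ABA`, i.e. string.PW followed by its own first character.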

5. Encoding and Decoding Files

6. Compression Efficiency for Different File Formats

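Comparing file sizes before and after encoding shows that PNG, JFIF, JPG, JPF and DOCX files become larger after encoding, whereas TGA, RGB, YUV, DB and SRT files become smaller. A likely reason is that the former are already compressed formats (PNG uses DEFLATE, JPEG and JPEG 2000 use entropy coding, DOCX is a ZIP archive), so their bytes contain few repeated strings and most codewords end up representing a single byte while costing more than 8 bits. The latter are uncompressed or simply structured formats (for example raw image data and plain-text subtitles) with many repeated byte sequences, so LZW compresses them well.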
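One way to check this explanation is to compare the encoded size with the original size on data with and without repetition. The sketch below is hypothetical: it only counts codewords (no file I/O), assumes fixed 12-bit codewords, and ignores the 4-byte length header that `LZWEncode` writes. The names `rt_count_codes` and `lzw_size_ratio` are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define RATIO_MAX_CODE 4096   /* hypothetical: 12-bit codewords */
#define RATIO_CODE_BITS 12

static int rt_prefix[RATIO_MAX_CODE], rt_char[RATIO_MAX_CODE];

/* Counts the codewords LZW emits for `in`. Linear dictionary search,
   so this is only suitable for small test buffers. */
static size_t rt_count_codes(const unsigned char *in, size_t len)
{
    int next = 256, p = -1;
    size_t codes = 0;
    for (size_t i = 0; i < len; i++) {
        int c = in[i], found = -1;
        if (p < 0) {
            found = c;                     /* single characters always exist */
        } else {
            for (int k = 256; k < next; k++)
                if (rt_prefix[k] == p && rt_char[k] == c) { found = k; break; }
        }
        if (found >= 0) {                  /* P + C known: extend P */
            p = found;
            continue;
        }
        codes++;                           /* emit the codeword for P */
        if (next < RATIO_MAX_CODE) {       /* add P + C if there is room */
            rt_prefix[next] = p;
            rt_char[next] = c;
            next++;
        }
        p = c;
    }
    return codes + (p >= 0);               /* final codeword for P */
}

/* Encoded size / original size; a value above 1.0 means the data grew. */
double lzw_size_ratio(const unsigned char *in, size_t len)
{
    assert(len > 0);
    size_t bytes = (rt_count_codes(in, len) * RATIO_CODE_BITS + 7) / 8;
    return (double)bytes / (double)len;
}
```

On a buffer of one repeated byte the ratio falls far below 1, while on pseudo-random bytes, which resemble the output of an existing compressor, almost every codeword covers a single byte and the ratio rises above 1.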