Erlang's Type System and Static Analysis - CFANZ Programming Community

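The following commands were tested on Windows. Note: create the environment variables OTP_HOME and HOME, both pointing to the Erlang installation directory.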

%dialyzer --build_plt -r "%OTP_HOME%/lib/kernel-2.15/ebin"

% Build the PLT

dialyzer --build_plt -r "%OTP_HOME%/lib/erts-5.9/ebin"

dialyzer --add_to_plt --plt "%OTP_HOME%/.dialyzer_plt" -c "%OTP_HOME%/lib/kernel-2.15/ebin"

dialyzer --add_to_plt --plt "%OTP_HOME%/.dialyzer_plt" -c "%OTP_HOME%/lib/stdlib-1.18/ebin"

dialyzer --add_to_plt --plt "%OTP_HOME%/.dialyzer_plt" -c "%OTP_HOME%/lib/mnesia-4.6/ebin"

dialyzer --add_to_plt --plt "%OTP_HOME%/.dialyzer_plt" -c "%OTP_HOME%/lib/crypto-2.1/ebin"

dialyzer --add_to_plt --plt "%OTP_HOME%/.dialyzer_plt" -c "%OTP_HOME%/lib/sasl-2.2/ebin"

% dialyzer --build_plt --apps erts kernel stdlib mnesia crypto

% dialyzer --add_to_plt --apps crypto

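"Erlang is a dynamically typed language, so it cannot be statically analyzed, and the documentation it produces contains no type information to aid understanding." This is the common, widely held view, and it is regarded as one of Erlang's weaknesses for building large systems (large systems bring a stronger need for static analysis and a heavier reliance on documentation for communication).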

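However, Erlang is a mature system with more than 20 years of history, and it long ago developed its own type annotation system. These annotations are used not only to generate documentation but, more importantly, to statically analyze source code, so that tools can rule out some trivial and hidden errors. The Erlang/OTP source code and its documentation are the best examples of this. Appendix A of *Programming Erlang* explains how to use this system in detail.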

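It is worth stressing that behind the Erlang language stands an active community (the more important of the two), whose EEP (Erlang Enhancement Proposal) process keeps driving the evolution of the language. The latest result of this process is that R13 promoted the documentation-level @spec and @type annotations to language-level -spec and -type annotations. We can expect these features to keep evolving in future releases.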

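litaocheng's article "Erlang Types and Function Specifications" is rigorous in style, thorough in its discussion and covers the latest language features. It is required reading for any programmer who wants to write "serious Erlang programs".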

Erlang Types and Function Specifications

Author: litaocheng
Mail: litaocheng@gmail.com
Date: 2009.6.8
Copyright: This document has been placed in the public domain.
Contents:

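Overview
Purpose
Specification

Types and Their Syntax
Custom Type Definitions
Type Declarations in Records
Function Specifications

Static Analysis with Dialyzer

Building a PLT
Analyzing with Dialyzer

References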

Overview

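Erlang is a dynamic language: variables are bound at run time, which makes it hard to know the types of a function's arguments and return value. To make up for this, Erlang lets us define data types and function signatures with type and spec declarations. With this information we can statically check functions and their calls and thus find problems in the code. The same information also helps others understand a function's interface and can be used to generate documentation.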

Purpose

Defining custom data types
Defining function argument and return types
Static code analysis with dialyzer
Documentation generation with edoc

Specification

Types and Their Syntax

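A type describes a set of Erlang terms and is built from the predefined basic types (such as integer(), atom() and pid()). A predefined type stands for every term of that type; for example, atom() stands for all atoms.

A type composed of basic types and other user-defined types denotes the union of the sets of its members. For example:

atom() | 'bar' | integer() | 42

and:

atom() | integer()

have the same meaning, because 'bar' is already an atom and 42 is already an integer.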
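To make the equivalence concrete, here is a minimal sketch of a hypothetical module (union_demo) that declares both unions as types. A spec written with either one accepts exactly the same arguments; the module and function names are illustrative only.

```erlang
%% Hypothetical module: the two unions below describe the same set of terms.
-module(union_demo).
-export([kind/1, kind_compact/1]).

%% 'bar' and 42 are redundant: atom() and integer() already cover them.
-type verbose() :: atom() | 'bar' | integer() | 42.
-type compact() :: atom() | integer().

-spec kind(verbose()) -> atom | integer.
kind(X) when is_atom(X) -> atom;
kind(X) when is_integer(X) -> integer.

%% Same domain, written with the compact union.
-spec kind_compact(compact()) -> atom | integer.
kind_compact(X) -> kind(X).
```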

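Types form a hierarchy: the topmost type, any(), stands for every Erlang term, while the bottommost type, none(), is the empty type that contains no terms.

The predefined types and their syntax are as follows: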

Type :: any()            %% the top type, any Erlang term
      | none()           %% the bottom type, contains no terms
      | pid()
      | port()
      | reference()
      | []               %% nil
      | Atom
      | Binary
      | float()
      | Fun
      | Integer
      | List
      | Tuple
      | Union
      | UserDefined      %% see Custom Type Definitions

Union :: Type1 | Type2

Atom :: atom()
      | Erlang_Atom      %% 'foo', 'bar', ...

Binary :: binary()                                  %% <<_:_*8>>
        | <<>>
        | <<_:Erlang_Integer>>                      %% Base size
        | <<_:_*Erlang_Integer>>                    %% Unit size
        | <<_:Erlang_Integer, _:_*Erlang_Integer>>

Fun :: fun()                  %% any function
     | fun((...) -> Type)     %% any arity, only the return type is specified
     | fun(() -> Type)
     | fun((TList) -> Type)

Integer :: integer()
         | Erlang_Integer                     %% ..., -1, 0, 1, ... 42 ...
         | Erlang_Integer..Erlang_Integer     %% an integer range

List :: list(Type)                            %% proper list ([]-terminated)
      | improper_list(Type1, Type2)           %% Type1=contents, Type2=termination
      | maybe_improper_list(Type1, Type2)     %% Type1 and Type2 as above

Tuple :: tuple()      %% a tuple of any size with any elements
       | {}
       | {TList}

TList :: Type
       | Type, TList
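The following hypothetical module (grammar_demo) is a small sketch that exercises a few productions of the grammar: a binary with a base size and a unit size, an integer range, a fun type with a fixed argument list and a tuple type. All names are made up for illustration.

```erlang
%% Hypothetical module exercising several productions of the type grammar.
-module(grammar_demo).
-export([header/1, apply_twice/2, swap/1]).

%% Base size 16 bits, followed by any number of 8-bit units.
-type packet() :: <<_:16, _:_*8>>.
%% Integer range.
-type percent() :: 0..100.
%% Fun with exactly one integer argument.
-type int_fun() :: fun((integer()) -> integer()).

%% Returns the 16-bit length field at the start of the packet.
-spec header(packet()) -> non_neg_integer().
header(<<Len:16, _/binary>>) -> Len.

-spec apply_twice(int_fun(), integer()) -> integer().
apply_twice(F, X) -> F(F(X)).

-spec swap({percent(), atom()}) -> {atom(), percent()}.
swap({P, A}) -> {A, P}.
```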

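Because lists are used so often, list(T) can be abbreviated as [T], and [T, ...] denotes a non-empty proper list whose elements are of type T. The difference is that [T] may be empty, whereas [T, ...] contains at least one element.

'_' can be used to denote any type.

Note that list() denotes a list whose elements may be of any type and is equivalent to [_] or [any()], whereas [] denotes one single type: the empty list.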
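A minimal sketch contrasting [T, ...] and [T] in specs; the module list_demo and its functions are hypothetical.

```erlang
%% Hypothetical module contrasting non-empty and possibly empty list types.
-module(list_demo).
-export([first/1, count/1]).

%% [T, ...] guarantees at least one element, so this clause always matches.
-spec first([T, ...]) -> T.
first([H | _]) -> H.

%% [_] is the same as list() / [any()] and may be empty.
-spec count([_]) -> non_neg_integer().
count(L) -> length(L).
```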

For convenience, here is a list of the built-in types:

Built-in type                 Stands for
term()                        any()
bool()                        'false' | 'true'
byte()                        0..255
char()                        0..16#10ffff
non_neg_integer()             0..
pos_integer()                 1..
neg_integer()                 ..-1
number()                      integer() | float()
list()                        [any()]
maybe_improper_list()         maybe_improper_list(any(), any())
maybe_improper_list(T)        maybe_improper_list(T, any())
string()                      [char()]
nonempty_string()             [char(),...]
iolist()                      maybe_improper_list(byte() | binary() | iolist(), binary() | [])
module()                      atom()
mfa()                         {atom(),atom(),byte()}
node()                        atom()
timeout()                     'infinity' | non_neg_integer()
no_return()                   none()
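As a sketch of how the built-in types read in practice, the hypothetical module below uses timeout(), string() and boolean(). It assumes a newer Erlang/OTP release, where boolean() is used in place of the table's bool(), which has been deprecated.

```erlang
%% Hypothetical module using a few built-in types in specs.
-module(builtin_demo).
-export([wait_for/2, is_ascii/1]).

%% timeout() is 'infinity' | non_neg_integer(), exactly what 'after' accepts.
-spec wait_for(term(), timeout()) -> {ok, term()} | timeout.
wait_for(Tag, Timeout) ->
    receive
        {Tag, Msg} -> {ok, Msg}
    after Timeout -> timeout
    end.

%% string() is [char()]; boolean() is the newer name for bool().
-spec is_ascii(string()) -> boolean().
is_ascii(S) -> lists:all(fun(C) -> C < 128 end, S).
```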

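Type names must be unique, and the compiler can detect duplicate definitions. (Reprinter's note: in R13, when -type and -spec annotations are used, this check is performed at compile time; since annotations are still optional, however, nothing is checked when they are not used.)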

Note: there are a few other list-related built-in types, but their names are long and we rarely use them:

nonempty_maybe_improper_list(Type) :: nonempty_maybe_improper_list(Type, any())
nonempty_maybe_improper_list() :: nonempty_maybe_improper_list(any())

We can also use record notation to denote a type:

Record :: #Erlang_Atom{}
        | #Erlang_Atom{Fields}

As of R13B, type declarations inside record definitions are supported.

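Custom Type Definitions

As the previous section showed, the basic type syntax is an atom followed by a pair of parentheses. To define a new type, use the 'type' keyword:

-type my_type() :: Type.

Here my_type is the name of the custom type and must be an atom. Type is any of the types described in the previous section: a built-in type or a visible (already defined) user-defined type. Anything else causes a compile-time error.

Recursive type definitions, in which a type refers to itself, are not supported at the time of writing (R13B).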
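Newer Erlang/OTP releases do accept recursive types. As a sketch only, assuming such a newer release (it will not compile on R13B, as noted above), here is a hypothetical binary-tree type that refers to itself:

```erlang
%% Hypothetical module; requires a release that supports recursive types.
-module(tree_demo).
-export([leaves/1]).

%% tree() appears in its own definition.
-type tree() :: leaf | {node, tree(), tree()}.

%% Counts the leaves of a tree.
-spec leaves(tree()) -> pos_integer().
leaves(leaf) -> 1;
leaves({node, L, R}) -> leaves(L) + leaves(R).
```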

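Type definitions can also be parameterized by listing type variables inside the parentheses. Like Erlang variables, these parameters must start with an uppercase letter. A simple example:

-type orddict(Key, Val) :: [{Key, Val}].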
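A minimal sketch showing the parameterized orddict(Key, Val) type used in specs. The module orddict_demo and its store/find functions are hypothetical and only mimic the idea of a key-sorted list; they are not the standard orddict module.

```erlang
%% Hypothetical module using a parameterized type.
-module(orddict_demo).
-export([store/3, find/2]).

-type orddict(Key, Val) :: [{Key, Val}].
-export_type([orddict/2]).

%% Inserts or replaces Key, keeping the list sorted by key.
-spec store(K, V, orddict(K, V)) -> orddict(K, V).
store(K, V, [{K1, _} = E | Rest]) when K > K1 -> [E | store(K, V, Rest)];
store(K, V, [{K, _} | Rest]) -> [{K, V} | Rest];
store(K, V, L) -> [{K, V} | L].

%% Looks up Key; returns error when it is absent.
-spec find(K, orddict(K, V)) -> {ok, V} | error.
find(K, L) ->
    case lists:keyfind(K, 1, L) of
        {_, V} -> {ok, V};
        false -> error
    end.
```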

Type Declarations in Records

We can specify the types of record fields with the following syntax:

-record(rec, {field1 :: Type1, field2, field3 :: Type3}).

A field without a type declaration defaults to any(). For example, the record definition above is equivalent to:

-record(rec, {field1 :: Type1, field2 :: any(), field3 :: Type3}).

If a field has an initial value, the type declaration must come after it:

-record(rec, {field1 = [] :: Type1, field2, field3 = 42 :: Type3}).
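The record declarations above can be tried in a small hypothetical module such as rec_demo below, which uses a typed record with initial values both as a value and as a type in specs:

```erlang
%% Hypothetical module with a typed record that has initial values.
-module(rec_demo).
-export([new/0, bump/1]).

%% field2 has no declared type, so it is treated as any().
-record(rec, {field1 = [] :: [atom()],
              field2,
              field3 = 42 :: integer()}).

-spec new() -> #rec{}.
new() -> #rec{}.

%% Increments field3; the record type is used for argument and result.
-spec bump(#rec{}) -> #rec{}.
bump(R = #rec{field3 = N}) -> R#rec{field3 = N + 1}.
```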

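If the type of an initial value does not match the field's declared type, a compile-time error is produced. A field without an initial value defaults to 'undefined', and in R13B 'undefined' is implicitly added to that field's type, so the following two record definitions are equivalent:

-record(rec, {f1 = 42 :: integer(),
              f2      :: float(),
              f3      :: 'a' | 'b'}).

-record(rec, {f1 = 42 :: integer(),
              f2      :: 'undefined' | float(),
              f3      :: 'undefined' | 'a' | 'b'}).

For this reason, it is recommended to give fields initial values when defining a record.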
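Following that recommendation, here is a sketch that gives f3 an initial value and spells out 'undefined' explicitly for f2, which has none. Writing it out keeps the declared type accurate regardless of release; newer Erlang/OTP releases (to our knowledge, 19 and later) no longer add 'undefined' implicitly. The module name rec_undef_demo is hypothetical.

```erlang
%% Hypothetical module: explicit 'undefined' for a field with no initial value.
-module(rec_undef_demo).
-export([new/0, set_f2/2]).

%% f2 starts as 'undefined', so its type says so; f3 gets an initial value.
-record(rec, {f1 = 42 :: integer(),
              f2      :: 'undefined' | float(),
              f3 = a  :: 'a' | 'b'}).

-spec new() -> #rec{}.
new() -> #rec{}.

-spec set_f2(#rec{}, float()) -> #rec{}.
set_f2(R, F) -> R#rec{f2 = F}.
```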

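Once a record is defined, it can be used as a type:

#rec{}

When using a record type, we can also re-specify the type of a particular field:

#rec{some_field :: Type}

Fields that are not mentioned keep the types given in the record definition.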
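A minimal sketch of narrowing one field of a record type inside a spec. The person record and the person_demo module are hypothetical.

```erlang
%% Hypothetical module narrowing a record field type in a spec.
-module(person_demo).
-export([greet/1]).

-record(person, {name :: undefined | string(),
                 age = 0 :: non_neg_integer()}).

%% name is narrowed to string() for this function only;
%% age keeps the type from the record definition.
-spec greet(#person{name :: string()}) -> string().
greet(#person{name = Name}) -> "Hello, " ++ Name.
```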

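Function Specifications

Function specifications are written with the newly introduced 'spec' keyword, which replaces the old @spec comments. The syntax is:

-spec Module:Function(ArgType1, ..., ArgTypeN) -> ReturnType.

The number of argument types in the spec must match the function's arity, otherwise compilation fails.

Within the same module, this can be shortened to:

-spec Function(ArgType1, ..., ArgTypeN) -> ReturnType.

To help documentation generation, we can also give the arguments names:

-spec Function(ArgName1 :: Type1, ..., ArgNameN :: TypeN) -> RT.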
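For example, a spec with named arguments might look like this in a hypothetical area_demo module. The names Width and Height only serve documentation and need not match the variable names in the function clauses.

```erlang
%% Hypothetical module with named arguments in its spec.
-module(area_demo).
-export([area/2]).

%% Width and Height document the arguments; the clause uses W and H.
-spec area(Width :: number(), Height :: number()) -> number().
area(W, H) -> W * H.
```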

A spec can be overloaded, with the clauses separated by ';':

-spec foo(pos_integer()) -> pos_integer()
       ; (integer()) -> integer().
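A sketch of an overloaded spec whose clauses do not overlap, in a hypothetical double_demo module: integers map to integers and floats to floats.

```erlang
%% Hypothetical module with an overloaded spec.
-module(double_demo).
-export([double/1]).

%% One clause per argument type; the result type follows the argument type.
-spec double(integer()) -> integer();
            (float()) -> float().
double(X) -> X * 2.
```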

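A spec can also express a relationship between a function's input and output:

-spec id(X) -> X.

The spec above, however, places no restriction on what X may be. We can add guard-like constraints to it:

-spec id(X) -> X when is_subtype(X, tuple()).

This states that X is a tuple. Currently, is_subtype is the only supported guard.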
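In newer Erlang/OTP releases the same constraint is usually written as X :: tuple() rather than is_subtype(X, tuple()). A minimal sketch using that newer form, which assumes a release later than R13B; the module id_demo is hypothetical.

```erlang
%% Hypothetical module using the newer constraint syntax.
-module(id_demo).
-export([id/1]).

%% Equivalent in intent to: -spec id(X) -> X when is_subtype(X, tuple()).
-spec id(X) -> X when X :: tuple().
id(X) when is_tuple(X) -> X.
```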

Some functions never return normally, for example a server's main loop, or a function that exists only to throw an exception. For these we can use no_return() as the return type:

-spec my_error(term()) -> no_return().
my_error(Err) -> erlang:throw({error, Err}).
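The server-loop case can be sketched as follows. counter_demo is a hypothetical module whose loop/1 only ever calls itself, so its return type is no_return().

```erlang
%% Hypothetical module: a server loop that never returns.
-module(counter_demo).
-export([start/0, loop/1]).

%% loop/1 always recurses after handling a message, so it never returns.
-spec loop(non_neg_integer()) -> no_return().
loop(N) ->
    receive
        {incr, From} ->
            From ! {count, N + 1},
            loop(N + 1)
    end.

%% Starts the loop in a new process with the counter at 0.
-spec start() -> pid().
start() -> spawn(?MODULE, loop, [0]).
```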

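Static Analysis with Dialyzer

Once we have defined types and specs, we can use dialyzer to analyze the code statically and catch many trivial or hidden errors before the program is ever run.

Building a PLT

To analyze our application or module, we first build a PLT (Persistent Lookup Table) file. Its purpose is to speed up analysis; the PLT stores a large amount of type and function information.

First we build a PLT for commonly used libraries: erts, kernel, stdlib, mnesia, crypto and sasl. ERL_TOP is the Erlang installation directory. The library version numbers differ between Erlang releases; I am currently using R13B (erts 5.7.1):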

dialyzer --build_plt -r $ERL_TOP/lib/erts-5.7.1/ebin \
   $ERL_TOP/lib/kernel-2.13.1/ebin \
   $ERL_TOP/lib/stdlib-1.16.1/ebin \
   $ERL_TOP/lib/mnesia-4.4.9/ebin \
   $ERL_TOP/lib/crypto-1.6/ebin \
   $ERL_TOP/lib/sasl-2.1.6/ebin

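After a wait of more than ten minutes, a ~/.dialyzer_plt file is generated. When building a PLT, you can choose the name of the generated file with --output_plt.

At any time we can add an application to an existing PLT with dialyzer --add_to_plt --plt ~/.dialyzer_plt -c path_to_app, or remove one with dialyzer --remove_from_plt --plt ~/.dialyzer_plt -c path_to_app.

Example:

# Build the PLT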
dialyzer --build_plt -r /usr/local/lib/erlang/lib/erts-5.7.1/ebin \
   /usr/local/lib/erlang/lib/kernel-2.13.1/ebin \
   /usr/local/lib/erlang/lib/stdlib-1.16.1/ebin \
   /usr/local/lib/erlang/lib/mnesia-4.4.9/ebin \
   /usr/local/lib/erlang/lib/crypto-1.6/ebin \
   /usr/local/lib/erlang/lib/sasl-2.1.6/ebin

# Remove the crypto application from the PLT
dialyzer --remove_from_plt --plt ~/.dialyzer_plt -c /usr/local/lib/erlang/lib/crypto-1.6/ebin

# Add the crypto application to the PLT
dialyzer --add_to_plt --plt ~/.dialyzer_plt -c /usr/local/lib/erlang/lib/crypto-1.6/ebin
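The same PLT can also be built from Erlang through dialyzer:run/1. The sketch below assumes a reasonably recent Erlang/OTP release in which dialyzer:run/1 accepts the {analysis_type, plt_build}, {files_rec, Dirs} and {output_plt, File} options; the module plt_demo and its functions are hypothetical. The analysis types plt_add and plt_remove should play the role of --add_to_plt and --remove_from_plt.

```erlang
%% Hypothetical module: build a PLT via dialyzer:run/1 instead of the CLI.
-module(plt_demo).
-export([plt_opts/2, build/2]).

%% Options that build a PLT from the ebin directories of the given apps.
%% code:lib_dir/1 resolves each app's installed directory, so no
%% version numbers need to be hard-coded.
-spec plt_opts([atom()], file:filename()) -> [{atom(), term()}].
plt_opts(Apps, PltFile) ->
    Dirs = [filename:join(code:lib_dir(App), "ebin") || App <- Apps],
    [{analysis_type, plt_build},
     {files_rec, Dirs},
     {output_plt, PltFile}].

%% Slow, like the command-line build; returns dialyzer's warning list.
-spec build([atom()], file:filename()) -> [term()].
build(Apps, PltFile) ->
    dialyzer:run(plt_opts(Apps, PltFile)).
```

For instance, plt_demo:build([erts, kernel, stdlib, mnesia, crypto, sasl], "my.plt") would correspond to the command-line build above.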

Analyzing with Dialyzer

With the PLT in place, we can statically check the applications we write.

Suppose we write a simple module (spec/spec.erl):

-module(spec).
-compile([export_all]).
-vsn('0.1').

-spec index(any(), pos_integer(), [any()]) -> non_neg_integer().
index(Key, N, TupleList) ->
    index4(Key, N, TupleList, 0).

index4(_Key, _N, [], _Index) -> 0;
index4(Key, N, [H | _R], Index) when element(N, H) =:= Key -> Index;
index4(Key, N, [_H | R], Index) -> index4(Key, N, R, Index + 1).

% correct:
%-spec fa(non_neg_integer()) -> pos_integer().
% invalid:
-spec fa(N :: atom()) -> pos_integer().
fa(0) -> 1;
fa(1) -> 1;
fa(N) -> fa(N - 1) + fa(N - 2).

-spec some_fun() -> any().
some_fun() ->
    L = [{bar, 23}, {foo, 33}],
    lists:keydelete(1, foo, L).

Compile spec.erl:

erlc +debug_info spec.erl

Analyze it with dialyzer:

dialyzer -r ./spec

Output:

Checking whether the PLT /home/litao/.dialyzer_plt is up-to-date... yes
Proceeding with analysis...
spec.erl:15: Invalid type specification for function 'spec':fa/1. The success typing is (non_neg_integer()) -> pos_integer()
spec.erl:22: Function some_fun/0 has no local return
spec.erl:24: The call lists:keydelete(1,'foo',L::[{'bar',23} | {'foo',33},...]) will never return since it differs in argument position 2 from the success typing arguments: (any(),pos_integer(),maybe_improper_list())
done in 0m0.29s
done (warnings were emitted)

As we can see, the spec of fa/1 is wrong, so we correct it.

From:

-spec fa(N :: atom()) -> pos_integer().

to:

-spec fa(non_neg_integer()) -> pos_integer().

In some_fun, fix the argument order of lists:keydelete/3:

lists:keydelete(1, foo, L).

to:

lists:keydelete(foo, 1, L).
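Putting both fixes together, the corrected module might look like the sketch below. It uses the hypothetical name spec_fixed and exports its functions explicitly instead of using export_all; otherwise the code is the module shown earlier.

```erlang
%% Hypothetical corrected version of spec.erl.
-module(spec_fixed).
-export([index/3, fa/1, some_fun/0]).

-spec index(any(), pos_integer(), [any()]) -> non_neg_integer().
index(Key, N, TupleList) ->
    index4(Key, N, TupleList, 0).

index4(_Key, _N, [], _Index) -> 0;
index4(Key, N, [H | _R], Index) when element(N, H) =:= Key -> Index;
index4(Key, N, [_H | R], Index) -> index4(Key, N, R, Index + 1).

%% Fixed spec: matches the success typing reported by dialyzer.
-spec fa(non_neg_integer()) -> pos_integer().
fa(0) -> 1;
fa(1) -> 1;
fa(N) -> fa(N - 1) + fa(N - 2).

%% Fixed argument order: lists:keydelete(Key, N, TupleList).
-spec some_fun() -> any().
some_fun() ->
    L = [{bar, 23}, {foo, 33}],
    lists:keydelete(foo, 1, L).
```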

Recompile and rerun dialyzer; the analysis now passes:

litao@litao:~/erltest$ dialyzer -r ./spec
Checking whether the PLT /home/litao/.dialyzer_plt is up-to-date... yes
Proceeding with analysis... done in 0m0.28s
done (passed successfully)
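The analysis can also be driven from Erlang. This is a minimal sketch, assuming dialyzer:run/1 accepts {files_rec, Dirs} and {init_plt, File} as in recent Erlang/OTP releases; check_demo and its functions are hypothetical, and the returned warnings are formatted with dialyzer:format_warning/1.

```erlang
%% Hypothetical module: run the equivalent of `dialyzer -r Dir` from Erlang.
-module(check_demo).
-export([check_opts/2, check/2]).

%% Success-typing analysis of all .beam files under Dir, using PltFile.
-spec check_opts(file:filename(), file:filename()) -> [{atom(), term()}].
check_opts(Dir, PltFile) ->
    [{analysis_type, succ_typings},
     {files_rec, [Dir]},
     {init_plt, PltFile}].

%% Runs the analysis and returns the warnings as flat strings.
-spec check(file:filename(), file:filename()) -> [string()].
check(Dir, PltFile) ->
    Warnings = dialyzer:run(check_opts(Dir, PltFile)),
    [lists:flatten(dialyzer:format_warning(W)) || W <- Warnings].
```

An empty list from check_demo:check("./spec", Plt) would correspond to "passed successfully" above.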

References

[1] EEP 8, Types and function specifications (http://www.erlang.org/eeps/eep-0008.html)
[2] reStructuredText quick reference (http://docutils.sourceforge.net/docs/user/rst/quickref.html)
[3] dialyzer manual page (http://www.erlang.org/doc/man/dialyzer.html)