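Scenario
Several services run on one server. After running for a while (less than a day), one process was found to be using more than 50% of the memory, which made the server misbehave.
Solution
First, enable pprof profiling in the code: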
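// Requires the blank import: import _ "net/http/pprof"
// (it registers the /debug/pprof/ handlers on http.DefaultServeMux).
runtime.SetBlockProfileRate(1) // record every blocking event in the block profile
go func() {
	// A nil handler means http.DefaultServeMux, where pprof is registered.
	log.Println(http.ListenAndServe(":9080", nil))
}()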
The profiles are viewed through the web interface; here the server listens on port 9080.
Open http://ip:9080/debug/pprof/ in a browser; the page shows:
/debug/pprof/
Types of profiles available:
Count	Profile
4320	allocs
265	    block
0	    cmdline
110	    goroutine
4320	heap
0	    mutex
0	    profile
48	    threadcreate
0	    trace
full    goroutine stack dump
Profile Descriptions:
allocs: A sampling of all past memory allocations
block: Stack traces that led to blocking on synchronization primitives
cmdline: The command line invocation of the current program
goroutine: Stack traces of all current goroutines
heap: A sampling of memory allocations of live objects. You can specify the gc GET parameter to run GC before taking the heap sample.
mutex: Stack traces of holders of contended mutexes
profile: CPU profile. You can specify the duration in the seconds GET parameter. After you get the profile file, use the go tool pprof command to investigate the profile.
threadcreate: Stack traces that led to the creation of new OS threads
trace: A trace of execution of the current program. You can specify the duration in the seconds GET parameter. After you get the trace file, use the go tool trace command to investigate the trace.
When it comes to memory leaks, the first instinct is to look at heap. This is what it actually showed:
heap profile: 718: 76326576 [829493: 1114177328] @ heap/1048576
1: 75759616 [1: 75759616] @ 0x40f2f9 0x410dc9 0x4143aa 0x685cee 0x686045 0x6ad612 0x6ad345 0xab8dca 0xab8da3 0x9cf4fc 0x99a1e9 0x9cf6bb 0x667b4c 0x99bd85 0x9b9eb3 0x7dbf03 0x7d742d 0x471801
#	0x685ced	github.com/gogf/gf/container/gmap.(*StrAnyMap).doSetWithLockCheck+0x14d	C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/container/gmap/gmap_hash_str_any_map.go:220
#	0x686044	github.com/gogf/gf/container/gmap.(*StrAnyMap).GetOrSetFuncLock+0xa4	C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/container/gmap/gmap_hash_str_any_map.go:254
#	0x6ad611	github.com/gogf/gf/os/gmlock.(*Locker).getOrNewMutex+0x51		C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/os/gmlock/gmlock_locker.go:130
#	0x6ad344	github.com/gogf/gf/os/gmlock.(*Locker).Lock+0x44			C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/os/gmlock/gmlock_locker.go:33
#	0xab8dc9	github.com/gogf/gf/os/gmlock.Lock+0x3e9					C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/os/gmlock/gmlock.go:19
#	0xab8da2	avrtc/app/system/admin/internal/api.(*wssApi).Init.func1+0x3c2		C:/Users/DELL/Desktop/signal/app/system/admin/internal/api/websocket.go:200
#	0x9cf4fb	github.com/gogf/gf/net/ghttp.(*middleware).Next.func1.8+0x3b		C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/net/ghttp/ghttp_request_middleware.go:111
#	0x99a1e8	github.com/gogf/gf/net/ghttp.niceCallFunc+0x48				C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/net/ghttp/ghttp_func.go:46
#	0x9cf6ba	github.com/gogf/gf/net/ghttp.(*middleware).Next.func1+0x13a		C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/net/ghttp/ghttp_request_middleware.go:110
#	0x667b4b	github.com/gogf/gf/util/gutil.TryCatch+0x6b				C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/util/gutil/gutil.go:46
#	0x99bd84	github.com/gogf/gf/net/ghttp.(*middleware).Next+0x164			C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/net/ghttp/ghttp_request_middleware.go:47
#	0x9b9eb2	github.com/gogf/gf/net/ghttp.(*Server).ServeHTTP+0x652			C:/Users/DELL/go/pkg/mod/github.com/gogf/gf@v1.15.4/net/ghttp/ghttp_server_handler.go:122
#	0x7dbf02	net/http.serverHandler.ServeHTTP+0xa2					C:/Program Files/Go/src/net/http/server.go:2887
#	0x7d742c	net/http.(*conn).serve+0x8cc						C:/Program Files/Go/src/net/http/server.go:1952
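Almost every frame is framework- or system-level, which does little to locate the leak. To explain, here are an expert's points on what pprof's heap profile does and does not show:
- Memory profiling records heap allocations together with their call stacks, not the complete memory picture of the process. This is presumably why go pprof calls it heap rather than memory.
- Stack memory is released when the call returns, so it does not appear in the memory profile.
- Memory profiling is sample-based. By default (runtime.MemProfileRate is 512 KiB) it records on average one sample per 512 KiB of heap allocated, not every allocation.
- Because memory profiling is sample-based and tracks allocated memory rather than memory in use (for example, memory that has been allocated and looks used but is in fact no longer used, such as the leaked part), it cannot be used to measure the program's overall memory usage.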
In practice, memory leaks are more often caused by poorly managed goroutines, so next we look at the goroutine profile:
goroutine profile: total 13647
11274 @ 0x43abc5 0x4068cf 0x40650b 0xab87b5 0x471801
#	0xab87b4	avrtc/app/system/admin/internal/api.notifyFxn+0x54	C:/Users/DELL/Desktop/signal/app/system/admin/internal/api/websocket.go:156
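A note on reading this data: the total in the first line is the total number of goroutines in the process, and the number at the start of each entry is how many goroutines share that exact call stack. It is clear that 11274 goroutines are parked at websocket.go:156, that is, line 156 of that file. Even if we count only 2 KB per goroutine, that is about 22 MB, a number that cannot be ignored.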
Result
Once the line number is known, the rest is straightforward:
func notifyFxn(ws *ghttp.WebSocket, c chan string, lock string, done *bool) {
	for !*done {
		notify := <-c // websocket.go:156: the goroutine blocks here until a message arrives
		glog.Line().Debugf("Notify %s\n", notify)
		gmlock.Lock(lock)
		ws.WriteMessage(2, []byte(notify))
		gmlock.Unlock(lock)
	}
}
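Line 156 is exactly the channel receive notify := <-c, and that is where the goroutines block. Because *done is only checked at the top of each iteration, a goroutine parked on the receive never sees it change and never exits if nothing more is sent on c. With that, the problem is located.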
This is a small hands-on pprof case; more will be added as real scenarios come up.










