内容简介:最近写了一段代码,精简一下大概是这样:一个全局的变量,内容是双 buffer,然后每过一段固定的时间,从 db 里加载内容,更新到下一个缓冲区,更新完毕之后,切换索引。整个操作一气呵成,毫无破绽。然后程序竟然无征兆地崩溃了。
最近写了一段代码,精简一下大概是这样:
type config struct { fk2Opts map[string]string } type doublebuffer struct { c1 config c2 config currentIdx int64 l sync.RWMutex } var option doublebuffer func readConfig(key string) bool { option.l.RLock() defer option.l.RUnlock() idx := option.currentIdx if idx == 0 { _, ok1 := option.c1.fk2Opts[key] return ok1 } _, ok1 := option.c2.fk2Opts[key] return ok1 } func writeToConfig() { fk2Opts := map[string]string{} for i := 0; i < 10; i++ { fk2Opts[fmt.Sprint(i)] = fmt.Sprint(i) } option.l.Lock() defer option.l.Unlock() idx := option.currentIdx if idx == 0 { option.c1.fk2Opts = fk2Opts } else { option.c2.fk2Opts = fk2Opts } option.currentIdx = 1 - option.currentIdx }
一个全局的变量,内容是双 buffer,然后每过一段固定的时间,从 db 里加载内容,更新到下一个缓冲区,更新完毕之后,切换索引。整个操作一气呵成,毫无破绽。
然后程序竟然无征兆地崩溃了。
unexpected fault address 0x0 fatal error: fault [signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x484a0f] goroutine 5337 [running]: runtime.throw(0xdab912, 0x5) /usr/local/go1.9.2/src/runtime/panic.go:605 +0x95 fp=0xc4219109e8 sp=0xc4219109c8 pc=0x454485 runtime.sigpanic() /usr/local/go1.9.2/src/runtime/signal_unix.go:374 +0x227 fp=0xc421910a38 sp=0xc4219109e8 pc=0x46be87 runtime.aeshashbody() /usr/local/go1.9.2/src/runtime/asm_amd64.s:1127 +0x38f fp=0xc421910a40 sp=0xc421910a38 pc=0x484a0f runtime.mapaccess2(0xce03a0, 0xc420702ff0, 0xc421910b18, 0x0, 0xf) /usr/local/go1.9.2/src/runtime/hashmap.go:416 +0x85 fp=0xc421910ac8 sp=0xc421910a40 pc=0x42dd35 git.xiaojukeji.com/gulfstream/ufs/dict.GetFeatureOptionByFeatureKey(0x11c6e20, 0xc42068e6c0, 0x1, 0xefcec37200000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /tmp/go-build1536727247/src/git.xiaojukeji.com/gulfstream/ufs/dict/api.go:36 +0x3bd fp=0xc421910e28 sp=0xc421910ac8 pc=0xa9df1d
同时还有一些 GC 时崩溃的日志:
runtime: nelems=170 nalloc=21 previous allocCount=20 nfreed=65535 fatal error: sweep increased allocation count
惊慌地找到了 runtime throw 的堆栈,看到了 0x0,本能地觉得应该是一个 map 的并发读写的问题,于是在线下开始找系统中所有的 map DATA RACE。修复一波,重新上线~
竟然还是崩溃。。这下是得认真看一下了。先构造一个 map 的并发读写的例子:
package main import ( "time" ) var m = map[string]string{} func main() { go func() { for { m["a"] = "1" } }() go func() { for { _ = m["b"] } }() time.Sleep(time.Minute) }
go run -race map_race.go
这样一定可以先看到有 DATA RACE,然后程序崩溃:
DATA RACE:
WARNING: DATA RACE Read at 0x00c420096060 by goroutine 6: runtime.mapaccess1_faststr() /usr/local/go/src/runtime/hashmap_fast.go:172 +0x0 main.main.func2() /Users/xargin/map_race.go:17 +0x6b Previous write at 0x00c420096060 by goroutine 5: runtime.mapassign_faststr() /usr/local/go/src/runtime/hashmap_fast.go:694 +0x0 main.main.func1() /Users/xargin/map_race.go:12 +0x6f
然后崩溃:
fatal error: concurrent map read and map write goroutine 19 [running]: runtime.throw(0x10a0940, 0x21) /usr/local/go/src/runtime/panic.go:616 +0x81 fp=0xc42003ef00 sp=0xc42003eee0 pc=0x104f141 runtime.mapaccess1_faststr(0x108d3a0, 0xc420096060, 0x109d4a2, 0x1, 0x1112c40) /usr/local/go/src/runtime/hashmap_fast.go:181 +0x706 fp=0xc42003efa0 sp=0xc42003ef00 pc=0x1032546 main.main.func2() /Users/xargin/map_race.go:17 +0x6c fp=0xc42003efe0 sp=0xc42003efa0 pc=0x107e98c runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc42003efe8 sp=0xc42003efe0 pc=0x1077951 created by main.main /Users/xargin/map_race.go:15 +0x5b
显然我们遇到的不是一个 map 并发读写的错误,而是类似内存 corruption 的错误,仔细看一下报错时的堆栈对应的 Go 源代码:
hashmap.go:416
415 alg := t.key.alg 416 hash := alg.hash(key, uintptr(h.hash0)) 417 m := uintptr(1)<<h.B - 1 418 b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
asm_amd64.s
1126 // start with last (possibly overlapping) block 1127 MOVOU -128(AX)(CX*1), X8 <--- 崩溃在这里 1128 MOVOU -112(AX)(CX*1), X9 1129 MOVOU -96(AX)(CX*1), X10 1130 MOVOU -80(AX)(CX*1), X11 1131 MOVOU -64(AX)(CX*1), X12 1132 MOVOU -48(AX)(CX*1), X13 1133 MOVOU -32(AX)(CX*1), X14 1134 MOVOU -16(AX)(CX*1), X15
在计算 aes hash 的时候引用了一块无效的内存,然后就收到了操作系统发来的 SIGSEGV 信号,程序崩溃。
可这里的问题是,这里 aeshash 的对象是我们函数的输入参数,也就是 m[key] 里的这个 key,这个 string 所在的内存怎么可能为空呢?按照 Go 的设计理念,你不可能通过一个 string 类型的变量构造出一个访问非法内存地址的 case 的,所以这是一个 Go 本身的 bug。翻阅了半天 Go 的升级日志,最后从官方的日志中找到一个类似的 case:
package main import "runtime/debug" type T struct { // >= 16 bytes to avoid tiny alloc. a, b int } func main() { debug.SetGCPercent(1) for i := 0; i < 100000; i++ { m := make(map[*T]struct{}, 0) for j := 0; j < 20; j++ { // During the call to mapassign_fast64, the key argument // was incorrectly treated as a uint64. If the stack was // scanned during that call, the only pointer to k was // missed, leading to *k being collected prematurely. k := new(T) m[k] = struct{}{} } } }
可以看到,循环中的指针变量做 map 的 key 使用,会在 GC 时触发该 bug。在 1.9.2 上是必现的,
目前线上的构建系统中 1.9 版本的 Go 只有 1.9.2,所以使用该版本的同学如果不幸碰到了和我们类似的问题,可以考虑升一下级了。
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Programming From The Ground Up
Jonathan Bartlett / Bartlett Publishing / 2004-07-31 / USD 34.95
Programming from the Ground Up is an introduction to programming using assembly language on the Linux platform for x86 machines. It is a great book for novices who are just learning to program as wel......一起来看看 《Programming From The Ground Up》 这本书的介绍吧!