[BUG] Openim-rpc-msg Process Hangs
Introduction
In this article, we discuss a bug that causes the openim-rpc-msg process to hang under high load. The bug is specific to OpenIM Server version 3.8.3 and has been observed on Linux (amd64) systems.
Bug Description and Steps to Reproduce
The bug causes the openim-rpc-msg process to hang frequently, and the number of open file handles held by the process climbs steadily while it does. Eventually the process crashes with an error message indicating that the system has run out of available file descriptors.
To reproduce this bug, follow these steps:
- Deploy the OpenIM Server using the source code deployment method.
- Simulate high load by sending a large number of requests to the server (a minimal load-generation sketch follows this list).
- Observe the openim-rpc-msg process and note that it hangs frequently.
- Check the system logs for error messages indicating that the system has run out of available file descriptors.
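For the load-generation step, the following is a minimal sketch in Go. The endpoint URL, port, concurrency, and request count are illustrative assumptions rather than values from the original report; point it at whichever API route your deployment exposes and raise the numbers until the server is saturated. Run it only against a test deployment.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	const (
		target         = "http://127.0.0.1:10002/healthz" // hypothetical endpoint; substitute a real API route
		workers        = 200                              // concurrent clients (illustrative)
		requestsPerJob = 500                              // requests per client (illustrative)
	)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < requestsPerJob; i++ {
				resp, err := http.Get(target)
				if err != nil {
					continue // errors are expected once the server starts to degrade
				}
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	fmt.Println("load run finished")
}
```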
Panic Log
The panic log for this bug is as follows:

```
goroutine 1445472 [runnable]:
syscall.Syscall(0xc084343438?, 0x14?, 0xc084343400?, 0x47efc8?)
/usr/local/go/src/syscall/syscall_linux.go:69 +0x25 fp=0xc0843433c8 sp=0xc084343358 pc=0x484805
syscall.Close(0xc005c33f08?)
/usr/local/go/src/syscall/zsyscall_linux_amd64.go:320 +0x25 fp=0xc0843433f8 sp=0xc0843433c8 pc=0x481f85
syscall.NetlinkRIB.func1()
/usr/local/go/src/syscall/netlink_linux.go:65 +0x25 fp=0xc084343410 sp=0xc0843433f8 pc=0x47f025
runtime.deferreturn()
/usr/local/go/src/runtime/panic.go:477 +0x31 fp=0xc084343448 sp=0xc084343410 pc=0x43c171
syscall.NetlinkRIB(0xc084343670?, 0x45532f?)
/usr/local/go/src/syscall/netlink_linux.go:114 +0x774 fp=0xc084343600 sp=0xc084343448 pc=0x47eeb4
net.interfaceTable(0x0)
/usr/local/go/src/net/interface_linux.go:17 +0x31 fp=0xc084343738 sp=0xc084343600 pc=0x5d2c71
net.interfaceAddrTable(0x0)
/usr/local/go/src/net/interface_linux.go:135 +0xd2 fp=0xc0843437a8 sp=0xc084343738 pc=0x5d33b2
net.InterfaceAddrs()
/usr/local/go/src/net/interface.go:119 +0x19 fp=0xc0843437f0 sp=0xc0843437a8 pc=0x5d2279
github.com/openimsdk/open-im-server/v3/pkg/common/cmd/log.getLocalIP()
/home/jenkins/node18/workspace/openim-server/pkg/common/cmd/log/zap.go:329 +0x17 fp=0xc084343838 sp=0xc0843437f0 pc=0xac05f7
github.com/openimsdk/open-im-server/v3/pkg/common/cmd/log.(*ZapLogger).timeEncoder(0x47c5e5?, {0x25acea0?, 0x25adea0?, 0x25d4740?}, {0x19b9f20, 0xc01baf35d8})
/home/jenkins/node18/workspace/openim-server/pkg/common/cmd/log/zap.go:353 +0xb5 fp=0xc084343900 sp=0xc084343838 pc=0xac07b5
github.com/openimsdk/open-im-server/v3/pkg/common/cmd/log.(*ZapLogger).timeEncoder-fm({0x25acea0?, 0x4122c5?, 0x25d4740?}, {0x19b9f20?, 0xc01baf35d8?})
<autogenerated>:1 +0x45 fp=0xc084343940 sp=0xc084343900 pc=0xac4685
go.uber.org/zap/zapcore.consoleEncoder.EncodeEntry({0x2528300?}, {0xff, {0xc1ebc4370d2f3e55, 0x7caac564c1, 0x25d4740}, {0x0, 0x0}, {0xc06be700c0, 0x33}, {0x1, ...}, ...}, ...)
/home/jenkins/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/console_encoder.go:81 +0xbd fp=0xc084343a40 sp=0xc084343940 pc=0xa6fe3d
github.com/openimsdk/open-im-server/v3/pkg/common/cmd/log.(*alignEncoder).EncodeEntry(0xc00066ea00, {0xff, {0xc1ebc4370d2f3e55, 0x7caac564c1, 0x25d4740}, {0x0, 0x0}, {0xc06be700c0, 0x33}, {0x1, ...}, ...}, ...)
/home/jenkins/node18/workspace/openim-server/pkg/common/cmd/log/encoder.go:35 +0x10d fp=0xc084343b08 sp=0xc084343a40 pc=0xabe52d
go.uber.org/zap/zapcore.(*ioCore).Write(0xc0004dcb10, {0xff, {0xc1ebc4370d2f3e55, 0x7caac564c1, 0x25d4740}, {0x0, 0x0}, {0xc06be70080, 0x33}, {0x1, ...}, ...}, ...)
/home/jenkins/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/core.go:95 +0x7b fp=0xc084343bd8 sp=0xc084343b08 pc=0xa7125b
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc07f7f1ee0, {0xc07e2ec800, 0x4, 0x8})
/home/jenkins/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/entry.go:255 +0x1dc fp=0xc084343d68 sp=0xc084343bd8 pc=0xa7373c
go.uber.org/zap.(*SugaredLogger).log(0xc000112cf8, 0xff, {0xc06be70080?, 0x33?}, {0x0?, 0x0?, 0x0?}, {0xc07dea9100, 0x8, 0x8})
/home/jenkins/go/pkg/mod/go.uber.org/zap@v1.24.0/sugar.go:295 +0xec fp=0xc084343da8 sp=0xc084343d68 pc=0xab9a0c
go.uber.org/zap.(*SugaredLogger).Debugw(...)
/home/jenkins/go/pkg/mod/go.uber.org/zap@v1.24.0/sugar.go:204
github.com/openimsdk/open-im-server/v3/pkg/common/cmd/log.(*ZapLogger).Debug(0xc0004709a0, {0x19a9800?, 0xc0667b9d10?}, {0x177d9f9, 0x2a}, {0xc00843eec0?, 0x2607380?, 0x7f7f703a4008?})
/home/jenkins/node18/workspace/openim-server/pkg/common/cmd/log/zap.go:403 +0xe5 fp=0xc084343e28 sp=0xc084343da8 pc=0xac0e05
github.com/openimsdk/open-im-server/v3/pkg/common/cmd/log.ZDebug(...)
/home/jenkins/node18/workspace/openim-server/pkg/common/cmd/log/zap.go:99
github.com/openimsdk/open-im-server/v3/pkg/rpccache.(*ConversationLocalCache).GetConversation(0xc0019a4580, {0x19a9800?, 0xc0667b9d10}, {0xc024c1b420, 0xc}, {0xc04d265050, 0x23})
/home/jenkins/node18/workspace/openim-server/pkg/rpccache/conversation.go:84 +0x182 fp=0xc084343f10 sp=0xc084343e28 pc=0x12534a2
github.com/openimsdk/open-im-server/v3/pkg/rpccache.(*ConversationLocalCache).GetConversations.func1()
/home/jenkins/node18/workspace/openim-server/pkg/rpccache/conversation.go:119 +0x3f fp=0xc084343f78 sp=0xc084343f10 pc=0x125
```
**Q&A: openim-rpc-msg process hangs**
=====================================
**Q: What is the cause of the openim-rpc-msg process hanging?**
---------------------------------------------------------
A: The openim-rpc-msg process hangs because of a bug in OpenIM Server version 3.8.3. Under high load the process stalls frequently and the number of open file handles climbs sharply. The panic log above suggests why: the logger's time encoder calls getLocalIP(), which calls net.InterfaceAddrs(); on Linux that lookup opens and closes a netlink socket, so every log entry costs a round of socket system calls, and a heavy log volume churns through file descriptors.
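The sketch below illustrates that call pattern and one way to break it. It is not the actual OpenIM code: the function names and the caching approach are illustrative assumptions based on the stack trace above.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// perEntryLocalIP mirrors what the trace shows: a fresh interface lookup
// (and, on Linux, a short-lived netlink socket) on every call.
func perEntryLocalIP() string {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return ""
	}
	for _, addr := range addrs {
		if ipNet, ok := addr.(*net.IPNet); ok && !ipNet.IP.IsLoopback() && ipNet.IP.To4() != nil {
			return ipNet.IP.String()
		}
	}
	return ""
}

var (
	ipOnce  sync.Once
	localIP string
)

// cachedLocalIP resolves the address once and reuses it for every subsequent
// log entry, so logging no longer triggers a netlink socket per call.
func cachedLocalIP() string {
	ipOnce.Do(func() { localIP = perEntryLocalIP() })
	return localIP
}

func main() {
	fmt.Println(cachedLocalIP())
}
```

Caching the address once at startup (or refreshing it on a slow interval) keeps the hot logging path free of per-entry system calls.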
**Q: What are the symptoms of this bug?**
-----------------------------------------
A: The symptoms of this bug include:
* The openim-rpc-msg process hanging frequently
* A significant increase in the number of open file handles
* The system running out of available file descriptors
* The process crashing with an error message indicating that the system has run out of available file descriptors
**Q: How can I reproduce this bug?**
--------------------------------------
A: To reproduce this bug, follow these steps:
1. Deploy the OpenIM Server using the source code deployment method.
2. Simulate high load conditions by sending a large number of requests to the server.
3. Observe the openim-rpc-msg process and note that it hangs frequently.
4. Check the system logs for error messages indicating that the system has run out of available file descriptors.
**Q: What is the impact of this bug on the system?**
------------------------------------------------
A: The impact on the system is significant. The openim-rpc-msg process hangs frequently, its open file handle count climbs until the system runs out of available file descriptors, and the process then crashes.
**Q: How can I fix this bug?**
---------------------------
A: To fix this bug, you can try the following:
1. Upgrade to a newer version of the OpenIM Server.
2. Apply the latest patches to the OpenIM Server.
3. Increase the number of available file descriptors on the system (a minimal sketch of raising the process limit follows this list).
4. Optimize the system configuration to reduce the load on the openim-rpc-msg process.
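For option 3, a minimal sketch of raising the process's own descriptor limit on Linux is shown below. Treat it as a stopgap that buys headroom rather than a fix for the underlying descriptor churn; the same effect can also be achieved outside the process with ulimit or the service manager's limit settings.

```go
package main

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		log.Fatalf("getrlimit: %v", err)
	}
	rl.Cur = rl.Max // raise the soft limit up to the hard limit
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		log.Fatalf("setrlimit: %v", err)
	}
	fmt.Printf("file descriptor soft limit raised to %d\n", rl.Cur)
}
```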
**Q: What are the best practices for preventing this bug?**
---------------------------------------------------------
A: The best practices for preventing this bug include:
1. Regularly upgrading to the latest version of the OpenIM Server.
2. Applying the latest patches to the OpenIM Server.
3. Optimizing the system configuration to reduce the load on the openim-rpc-msg process.
4. Monitoring the system logs for error messages indicating that the system has run out of available file descriptors, and tracking the process's open file descriptor count (a minimal monitoring sketch follows this list).
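For the monitoring practice, a minimal sketch is shown below: it periodically counts the entries under /proc/self/fd and logs a warning when the count nears a threshold. The interval and threshold are illustrative assumptions, and the /proc path is Linux-specific.

```go
package main

import (
	"log"
	"os"
	"time"
)

func main() {
	const warnThreshold = 8000 // illustrative; tune to the configured descriptor limit

	for range time.Tick(30 * time.Second) {
		entries, err := os.ReadDir("/proc/self/fd")
		if err != nil {
			log.Printf("cannot read /proc/self/fd: %v", err)
			continue
		}
		if n := len(entries); n > warnThreshold {
			log.Printf("WARNING: %d open file descriptors, nearing the limit", n)
		}
	}
}
```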
**Q: What are the consequences of not fixing this bug?**
---------------------------------------------------
A: The consequences of not fixing this bug include:
1. The openim-rpc-msg process hanging frequently, resulting in a significant increase in the number of open file handles.
2. The system running out of available file descriptors, causing the process to crash.
3. Data loss or corruption due to the process crashing.
4. Downtime and loss of productivity due to the process crashing.