It has been observed that containers may occasionally enter a state where they:
- Start up
- Successfully connect to RabbitMQ
- Receive a prefetch count
- Then stall without processing any messages (a minimal consumer sketch illustrating this lifecycle follows below)
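Link itself is not a Python application; the sketch below is only a minimal pika-based consumer with assumed names (host "rabbitmq", queue "link-initialize", prefetch count of 10), meant to show where the stalled containers stop: they connect, the prefetch count is applied, a consumer is registered, and then nothing is processed.

```python
import pika

def handle(channel, method, properties, body):
    # Normal path: process the message, then acknowledge it.
    print(f"processing {len(body)} bytes")
    channel.basic_ack(delivery_tag=method.delivery_tag)

# Hypothetical connection details; the real containers use Link's own configuration.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()

# The stalled containers do get this far: the prefetch count is applied...
channel.basic_qos(prefetch_count=10)

# ...and a consumer is registered, but in the stalled state no message is
# ever processed from this point onward until the container is restarted.
channel.basic_consume(queue="link-initialize", on_message_callback=handle, auto_ack=False)
channel.start_consuming()
```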
In the linked query, you can see that four initialize containers were started as a result of KEDA scaling up in reaction to the increasing queue size.
However, these containers appear to stall immediately after initialization: they do not log any further activity or process any messages until they are restarted around 02:02. Notably, one of the containers is killed by KEDA once the queue size drops below the scaling threshold, because the other containers keep processing messages successfully.
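For context on how the scale-out and the later kill happen: KEDA's queue-length trigger feeds a Horizontal Pod Autoscaler that, roughly, targets a fixed number of messages per replica. The sketch below illustrates that calculation; the concrete numbers (100 messages per replica, a maximum of 4 replicas) are assumptions for illustration, not the actual trigger configuration.

```python
import math

def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 4) -> int:
    # HPA-style calculation behind KEDA's queue-length trigger:
    # ceil(current metric / target value), clamped to the replica bounds.
    desired = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, desired))

# A growing queue scales the deployment out to four initialize containers.
print(desired_replicas(400, 100))  # 4
# Once the healthy containers drain the queue below the threshold, the desired
# count drops and one container is removed, which is how a stalled container
# ends up being killed by the scale-down.
print(desired_replicas(250, 100))  # 3
```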
Key points from the data:
- The "Count" column in the query reflects how many log entries were recorded.
- The stalled containers have 17 log entries during their lifecycle.
We previously suspected that this issue might be caused by RabbitMQ quorum queues not replicating correctly across all nodes. In particular, we theorized that if a leader node shift occurred while the quorum was incomplete, Link might be unable to handle the inconsistency, resulting in the stalled behavior.
However, this theory now appears less likely.
During the timeframe of the issue, all RabbitMQ queues were fully replicated across all nodes. This suggests the root cause lies elsewhere.
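For completeness, this is the kind of check that rules the theory out. A sketch, assuming the RabbitMQ management API is reachable at a placeholder endpoint with placeholder credentials: it lists quorum queues whose set of online members is smaller than their declared membership, which is what an incomplete quorum would look like.

```python
import requests

# Hypothetical endpoint and credentials; substitute the actual cluster values.
API = "http://rabbitmq:15672/api/queues"
AUTH = ("guest", "guest")

def underreplicated_quorum_queues():
    """Quorum queues whose online members lag behind their declared members."""
    queues = requests.get(API, auth=AUTH, timeout=10).json()
    problems = []
    for q in queues:
        if q.get("type") != "quorum":
            continue
        members = q.get("members") or []
        online = q.get("online") or []
        if len(online) < len(members):
            problems.append((q["name"], q.get("leader"), len(online), len(members)))
    return problems

for name, leader, online, members in underreplicated_quorum_queues():
    print(f"{name}: leader={leader}, {online}/{members} members online")
```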
Other insights: https://bizbrains.aha.io/ideas/ideas/LINKOB-I-215