Status: Planned
Categories: Bug Report
Created by: Guest
Created on: May 26, 2025

Possible "zombie" state for containers connected to rabbitmq

It has been observed that containers may occasionally enter a state where they:

  • Start up

  • Successfully connect to RabbitMQ

  • Receive a prefetch count

  • Then stall without processing any messages
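For reference, the lifecycle above roughly corresponds to a consumer along the lines of the sketch below. This is a minimal, assumed example (Python with pika; the actual service, queue name, prefetch value, and client library are not specified in this report). In the zombie state, everything up to consuming completes successfully, but the delivery callback is never invoked.

```python
import pika

QUEUE_NAME = "work-queue"      # hypothetical queue name
PREFETCH_COUNT = 10            # hypothetical prefetch value


def process(body: bytes) -> None:
    # Placeholder for the real message handler.
    print(f"processing {len(body)} bytes")


def on_message(channel, method, properties, body):
    # In the zombie state this callback never fires, even though the
    # connection and prefetch negotiation below completed successfully.
    process(body)
    channel.basic_ack(delivery_tag=method.delivery_tag)


def main():
    # 1. Start up and connect to RabbitMQ (succeeds per the logs).
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="rabbitmq"))
    channel = connection.channel()

    # 2. Apply the prefetch count (also visible in the logs).
    channel.basic_qos(prefetch_count=PREFETCH_COUNT)

    # 3. Consume. Stalled containers stop here: no deliveries are
    #    processed and nothing further is logged until a restart.
    channel.basic_consume(queue=QUEUE_NAME, on_message_callback=on_message)
    channel.start_consuming()


if __name__ == "__main__":
    main()
```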

🔗 View logs in New Relic

In the linked query, you can see that four initialize containers were started as a result of KEDA scaling up in response to the increasing queue size.

However, these containers appear to stall immediately after initialization. They do not log any further activity or process any messages until they are restarted around 02:02. Notably, one of the containers was killed by KEDA once the queue size dropped below the scaling threshold, thanks to the remaining containers successfully processing messages.

Key points from the data:

  • The "Count" column in the query reflects how many log entries were recorded.

  • The stalled containers recorded 17 log entries during their lifecycle.

Previous Hypothesis

We previously suspected that this issue might be caused by RabbitMQ quorum queues not replicating correctly across all nodes. In particular, we theorized that if a leader node shift occurred while the quorum was incomplete, Link might be unable to handle the inconsistency, resulting in stalled behavior.

However, this theory now appears less likely.

Current Observation

During the timeframe of the issue, all RabbitMQ queues were fully replicated across all nodes. This suggests the root cause lies elsewhere.
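For completeness, the replication check behind this observation can be scripted against the RabbitMQ management HTTP API. The sketch below is an assumption about how such a check might look (Python with requests; the host, credentials, and the exact response fields such as `members`, `online`, and `leader` depend on the RabbitMQ version and setup) and is not taken from the report itself.

```python
import requests

# Hypothetical management API endpoint and credentials (%2F = default vhost "/").
MGMT_URL = "http://rabbitmq:15672/api/queues/%2F"
AUTH = ("guest", "guest")


def check_quorum_replication():
    """Report quorum queues whose members are not all online.

    Field names ("type", "members", "online", "leader") match the
    management API of recent RabbitMQ versions, but may differ.
    """
    queues = requests.get(MGMT_URL, auth=AUTH, timeout=10).json()
    for q in queues:
        if q.get("type") != "quorum":
            continue
        members = set(q.get("members", []))
        online = set(q.get("online", []))
        missing = members - online
        status = "fully replicated" if not missing else f"missing {sorted(missing)}"
        print(f"{q['name']}: leader={q.get('leader')} ({status})")


if __name__ == "__main__":
    check_quorum_replication()
```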
