We have 3 Looker instances on AWS behind an ELB whose health check is simply TCP:9999.
We had to roll the servers today, one at a time.
Once all the servers were back up and the ELB was sending requests to all 3 instances, we started noticing odd behavior.
The issue was that, from Looker’s point of view, there were TWO instances in the cluster, not 3. The “rogue” instance’s Looker Java process was running (so port 9999 was open and the TCP health check passed), and it was taking requests from the ELB.
It looks like we hit the classic “split brain” case: an isolated Looker instance that was serving traffic but was not part of the cluster.
How do we:
- programmatically detect this?
- prevent this from happening?
This is on version 5 of Looker.
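
To make the detection question concrete, here is a minimal sketch of the check I have in mind: compare the hosts the ELB considers in service with the hosts the cluster lists as members, and flag any host that is in the first set but not the second. The function names and the sample addresses are hypothetical, and the sketch deliberately does not assume any particular Looker API: how to get the cluster's own member list out of Looker is exactly the part I don't know. The ELB side could come from the AWS API.

```python
from typing import Iterable, Set


def find_rogue_nodes(elb_in_service: Iterable[str],
                     cluster_members: Iterable[str]) -> Set[str]:
    """Return hosts that receive ELB traffic but are not cluster members."""
    return set(elb_in_service) - set(cluster_members)


def is_split_brain(elb_in_service: Iterable[str],
                   cluster_members: Iterable[str]) -> bool:
    """True if at least one in-service host is missing from the cluster."""
    return bool(find_rogue_nodes(elb_in_service, cluster_members))


if __name__ == "__main__":
    # Hypothetical sample data: three hosts behind the ELB,
    # but the cluster only knows about two of them.
    elb_hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    members = ["10.0.0.1", "10.0.0.2"]

    rogue = find_rogue_nodes(elb_hosts, members)
    if rogue:
        print("Possible split brain, rogue hosts:", sorted(rogue))
    else:
        print("ELB and cluster membership agree")
```

Run periodically (for example from a monitoring job), a check like this would have caught our situation, since a TCP health check on port 9999 only proves the process is listening, not that it joined the cluster.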