HBase replication
Hi guys,
I need help troubleshooting HBase replication.
On the active master I get the following alert:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/host_scripts/alert_hbase_replication.py", line 123, in execute
raise Exception('The following region servers breached log/replication thresholds:\n'+'\n'.join(breached))
Exception: The following region servers breached log/replication thresholds:
Region server AAA has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=3908 TimeStampsOfLastShippedOp=Mon Jan 16 06:44:10 UTC 2023 Replication Lag=953231333
Region server BBB has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=2711 TimeStampsOfLastShippedOp=Fri Jan 27 07:31:21 UTC 2023 Replication Lag=979
Region server CCC has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=589 TimeStampsOfLastShippedOp=Mon Jan 16 06:46:26 UTC 2023 Replication Lag=953096639
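One way to read these numbers: the Replication Lag values for AAA and CCC (about 953 million) match the roughly 11-day gap between their TimeStampsOfLastShippedOp (Jan 16) and BBB's (Jan 27). That suggests the lag is in milliseconds, that AAA and CCC have not shipped anything since Jan 16, and that BBB is still shipping but has a large backlog. The sketch below parses the alert lines to make that triage repeatable. It assumes the exact alert text format shown above and treats Replication Lag as milliseconds; both are assumptions, not documented behavior of the Ambari script. The one-hour "stuck" cut-off is a hypothetical value.

```python
import re
from datetime import datetime, timedelta

# Alert lines copied from the Ambari output above (hostnames and peer ID anonymized).
ALERT = """\
Region server AAA has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=3908 TimeStampsOfLastShippedOp=Mon Jan 16 06:44:10 UTC 2023 Replication Lag=953231333
Region server BBB has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=2711 TimeStampsOfLastShippedOp=Fri Jan 27 07:31:21 UTC 2023 Replication Lag=979
Region server CCC has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=589 TimeStampsOfLastShippedOp=Mon Jan 16 06:46:26 UTC 2023 Replication Lag=953096639
"""

# ASSUMPTION: every breach line follows exactly the format shown in the alert.
LINE_RE = re.compile(
    r"Region server (?P<server>\S+) has breached SizeOfLogQueue limit of (?P<limit>\d+)\. "
    r"Has values of: PeerID=(?P<peer>\S+) SizeOfLogQueue=(?P<queue>\d+) "
    r"TimeStampsOfLastShippedOp=(?P<ts>.+?) Replication Lag=(?P<lag>\d+)"
)

# Hypothetical cut-off: a lag above this means the source has stopped shipping.
STUCK_AFTER = timedelta(hours=1)


def parse_alert(text):
    """Return one dict per region server line that matches LINE_RE."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # skip traceback lines and anything else
        rows.append({
            "server": m["server"],
            "peer": m["peer"],
            "limit": int(m["limit"]),
            "queue": int(m["queue"]),
            "last_shipped": datetime.strptime(m["ts"], "%a %b %d %H:%M:%S UTC %Y"),
            # ASSUMPTION: Replication Lag is reported in milliseconds.
            "lag": timedelta(milliseconds=int(m["lag"])),
        })
    return rows


if __name__ == "__main__":
    # Worst lag first, so stuck servers are listed before merely backlogged ones.
    for r in sorted(parse_alert(ALERT), key=lambda r: r["lag"], reverse=True):
        state = "stuck" if r["lag"] > STUCK_AFTER else "shipping, backlogged"
        print(f'{r["server"]}: queue={r["queue"]} (limit {r["limit"]}), '
              f'lag={r["lag"]}, {state}')
```

If the split holds, AAA and CCC point to a problem on the source side (for example, a replication source that stopped shipping) rather than only a throughput issue, while BBB looks more like a bandwidth or target-side bottleneck.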
Hi Andrey,
My knowledge of HBase is basic, so I won't be able to suggest much in this area. Is network I/O slow between your clusters? Also, are your slave (replication target) clusters down?
Hi Raymond, thanks for picking up the question so quickly. Yes, the slaves are up. As for a slow network, let me check the monitoring to see whether there are any bottlenecks. I'll update after the checks.
Hi Andrey,
Just to follow up on this, have you resolved your issue?