HBase replication

Andrey Stepanchak, 2023-01-27

Hi guys,

I need help with HBase replication troubleshooting.

On the active master I get the following alert:


Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/host_scripts/alert_hbase_replication.py", line 123, in execute
    raise Exception('The following region servers breached log/replication thresholds:\n'+'\n'.join(breached))
Exception: The following region servers breached log/replication thresholds:
Region server AAA has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=3908 TimeStampsOfLastShippedOp=Mon Jan 16 06:44:10 UTC 2023 Replication Lag=953231333
Region server BBB has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=2711 TimeStampsOfLastShippedOp=Fri Jan 27 07:31:21 UTC 2023 Replication Lag=979
Region server CCC has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=589 TimeStampsOfLastShippedOp=Mon Jan 16 06:46:26 UTC 2023 Replication Lag=953096639
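
To make the alert easier to read, here is a minimal Python sketch that parses the three "Region server ..." lines and converts the lag into a readable duration. It assumes that Replication Lag is reported in milliseconds; that is an assumption, though it is consistent with the timestamps in the alert. The names parse_alert, summarize and STALLED_AFTER are hypothetical helpers for illustration, not part of Ambari or HBase.

```python
import re
from datetime import timedelta

# Sample lines copied verbatim from the alert output above.
ALERT_LINES = [
    "Region server AAA has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=3908 TimeStampsOfLastShippedOp=Mon Jan 16 06:44:10 UTC 2023 Replication Lag=953231333",
    "Region server BBB has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=2711 TimeStampsOfLastShippedOp=Fri Jan 27 07:31:21 UTC 2023 Replication Lag=979",
    "Region server CCC has breached SizeOfLogQueue limit of 20. Has values of: PeerID=XXX SizeOfLogQueue=589 TimeStampsOfLastShippedOp=Mon Jan 16 06:46:26 UTC 2023 Replication Lag=953096639",
]

PATTERN = re.compile(
    r"Region server (?P<server>\S+) has breached SizeOfLogQueue limit of (?P<limit>\d+)\. "
    r"Has values of: PeerID=(?P<peer>\S+) SizeOfLogQueue=(?P<queue>\d+) "
    r"TimeStampsOfLastShippedOp=(?P<last_shipped>.+?) Replication Lag=(?P<lag_ms>\d+)"
)

# Hypothetical cut-off: treat a peer as stalled once the lag exceeds one hour.
STALLED_AFTER = timedelta(hours=1)


def parse_alert(line):
    """Return a dict describing one alert line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Assumption: the lag value is in milliseconds.
    lag = timedelta(milliseconds=int(m.group("lag_ms")))
    return {
        "server": m.group("server"),
        "peer": m.group("peer"),
        "limit": int(m.group("limit")),
        "queue": int(m.group("queue")),
        "last_shipped": m.group("last_shipped"),
        "lag": lag,
        "stalled": lag > STALLED_AFTER,
    }


def summarize(lines):
    """Parse every matching line and skip the rest."""
    parsed = (parse_alert(line) for line in lines)
    return [row for row in parsed if row is not None]


if __name__ == "__main__":
    for row in summarize(ALERT_LINES):
        status = "STALLED" if row["stalled"] else "shipping"
        print(f'{row["server"]}: queue={row["queue"]} lag={row["lag"]} '
              f'last shipped={row["last_shipped"]} [{status}]')
```

Read this way, AAA and CCC each show a lag of roughly 11 days and have not shipped anything since Jan 16, while BBB is still shipping with a lag under one second but carries a large log queue.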

Comments

Raymond, 2 years ago

Hi Andrey,

My knowledge of HBase is basic, so I won't be able to suggest much in this domain. Is the network I/O slow between your clusters? And are your slave (replication target) clusters down?


Andrey Stepanchak, 2 years ago

Hi Raymond, thanks for the quick pick-up on the question. Yes, the slaves are up. As for the slow network, let me check the monitoring and see if there are any bottlenecks. I will update after the checks.

Raymond, 2 years ago

Hi Andrey,

Just to follow up on this: have you resolved your issue?
