Saturday, August 8, 2015

Isilon : Rolling upgrade failure

During a code upgrade, a network interruption can drop your connection to the cluster. If this happens in the middle of the upgrade, reconnecting to the cluster does not return you to the step where the upgrade left off.

In these cases, you need to verify which nodes were upgraded completely, kill the upgrade process, and restart the upgrade. The restarted upgrade skips the nodes that already completed and proceeds with the nodes still on the previous version.

The following troubleshooting commands can help in this scenario.

isi update --rolling --manual                     - Initiate a rolling upgrade; the --manual option asks for confirmation before each node reboots
isi update --check-only                           - Pre-upgrade health check
cd /var/log
ls -l update*
cat update_handler_2015-07-23_14:57:10.txt        - View the upgrade log file
isi_for_array -s ps awux | grep update            - List the running update processes
isi_for_array -s killall -9 update                - Kill the currently running update process
isi_for_array date                                - Display the date on every node in the cluster
isi_readonly
isi readonly
isi auth error 54
isi_for_array -n2 killall -9 isi_upgrade_d        - Kill the upgrade process on node 2
isi_for_array -s ps awux | grep isi_for_array
ssh isceist01-2
isi_for_array -s uname -a | awk  '{print $4}'
isi update --rolling --manual
isi_for_array -s ps awux | grep upgrade_d
isi_for_array -s killall -9 isi_upgrade_d         - Kill the upgrade process on all nodes
isi_for_array -s ps awux | grep upgrade_d         - List the running processes that match upgrade_d
isi update --rolling --manual                     - Initiate the rolling upgrade again
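
The recovery order described above (check node versions, kill the stale upgrade process, confirm it is gone, restart the upgrade) could be wrapped in a small script. This is only a sketch: the run helper, the recover_rolling_upgrade function, and the DRY_RUN variable are hypothetical names made up for illustration, and the isi commands are the ones listed in this post. By default it only prints the commands, so you can review them before running anything on a cluster.

```shell
#!/bin/sh
# Sketch of the recovery sequence from this post.
# DRY_RUN=1 (the default) only prints each command.
# Set DRY_RUN=0 on a cluster node to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper that echoes a command line and
# executes it only when dry-run mode is switched off.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$*"
    fi
}

# recover_rolling_upgrade: hypothetical wrapper around the
# commands listed above, in the order the post describes.
recover_rolling_upgrade() {
    # 1. See which version each node reports.
    run "isi_for_array -s uname -a"
    # 2. Kill the stale upgrade daemon on all nodes.
    #    (The reader comment below reports that disabling the
    #    isi_upgrade_d service was also needed; that step is
    #    not included here.)
    run "isi_for_array -s killall -9 isi_upgrade_d"
    # 3. Check whether any upgrade process is still running.
    run "isi_for_array -s ps awux | grep upgrade_d"
    # 4. Restart the rolling upgrade.
    run "isi update --rolling --manual"
}

recover_rolling_upgrade
```

Running it as is prints the four commands prefixed with "+". Keeping the default as a dry run is a deliberate choice here, since killall -9 across the cluster is not something to trigger by accident.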

1 comment:

  1. This came in handy for me today; thanks for posting... I did need to run the following though to kill the isi_upgrade_d procs.
    isi services -a isi_upgrade_d disable
