Wednesday, August 19, 2015
Isilon InsightIQ code upgrade from 3.0 to 3.2
A direct upgrade from 3.0 to 3.2 is not supported; the code must be upgraded from 3.0 to 3.1, and then from 3.1 to 3.2.
Upgrade from 3.0 to 3.1
Install the dependencies:
1) Unzip the file
tar xvfz iiq_3.1_upgrade_dependencies.tar.gz
2) Install the offline dependencies
./install_dependencies
3) Run the upgrade, skipping the step that connects to the CentOS repositories since the dependencies are already installed
sudo yum upgrade --noplugins --disablerepo=* /home/administrator/isilon-insightiq-3.1.0.0078-1.x86_64.rpm
Upgrade from 3.1 to 3.2
For the 3.1 to 3.2 upgrade, EMC provides the code as a .sh installer (previous releases shipped as .rpm packages).
One issue we experienced: copying the file to the InsightIQ VM with WinSCP altered the md5sum during the transfer, which caused the upgrade to fail with errors such as "invalid checksum".
To copy the file, use FTP, or copy the file to the Isilon cluster first and then to InsightIQ over SCP.
1) Run the upgrade
sudo sh install-insightiq-3.2.1.0001.sh
2) Run the datastore upgrade
iiq_datastore_upgrade
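Because of the checksum problem described above, verify the md5sum on both ends before running the installer; a minimal sketch (the VM hostname and destination path are assumptions):
md5sum install-insightiq-3.2.1.0001.sh    # on the machine the installer was downloaded to
ssh administrator@insightiq-vm 'md5sum /home/administrator/install-insightiq-3.2.1.0001.sh'    # on the InsightIQ VM
If the two hashes do not match, re-copy the file using FTP or SCP via the cluster as described above.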
Saturday, August 8, 2015
Isilon : Rolling upgrade failure
During a code upgrade, a network interruption can drop the connection to the cluster. If this happens in the middle of the upgrade, reconnecting to the cluster does not resume the upgrade from the step where it left off.
In these cases, verify which nodes completed the upgrade, kill the running upgrade process, and restart the upgrade; the restarted upgrade skips the nodes that already completed and proceeds with the nodes still on the previous version.
Below are troubleshooting commands that can help in this scenario.
isi update --rolling --manual    Initiates a rolling upgrade; the manual option asks for confirmation before rebooting each node
isi update --check-only    Pre-upgrade health check
cd /var/log
ls -l update*
cat update_handler_2015-07-23_14:57:10.txt    View the upgrade log file
isi_for_array -s ps awux | grep update    List the running update processes
isi_for_array -s killall -9 update    Kill the currently running update process
isi_for_array date    Displays the date on every node in the cluster
isi_readonly
isi readonly
isi auth error 54
isi_for_array -n2 killall -9 isi_upgrade_d    Kill the upgrade process on node 2
isi_for_array -s ps awux | grep isi_for_array
ssh isceist01-2
isi_for_array -s uname -a | awk '{print $4}'    Show the OneFS version each node is running
isi update --rolling --manual
isi_for_array -s ps awux | grep upgrade_d
isi_for_array -s killall -9 isi_upgrade_d    Kill the upgrade process
isi_for_array -s ps awux | grep upgrade_d    List running processes, filtered for the upgrade daemon
isi update --rolling --manual    Initiate the rolling upgrade again
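Before restarting the upgrade, confirm which nodes are still on the old release; a minimal sketch built from the uname command above (the field positions assume the usual isi_for_array node-name prefix; adjust if your output differs):
isi_for_array -s 'uname -a' | awk '{print $1, $4}' | sort | uniq -c    # group nodes by OneFS build string
Nodes still showing the old build are the ones the restarted rolling upgrade needs to pick up.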
Tuesday, July 28, 2015
LDAP Performance Troubleshooting - Isilon
Recently we received complaints about an authentication issue where some users were unable to log in to the Isilon cluster because their login requests were timing out. It was an inconsistent issue: some users could log in to every individual node by IP address, by SSIP, and by the pool DNS names, while other users could log in to only a couple of nodes and not the rest.
No changes had been made on either side, Isilon cluster or LDAP server. Below are the troubleshooting steps we performed to isolate the location and root cause of the issue.
1) Logged in to each individual node separately by IP address, by SSIP, and by the pool DNS addresses.
2) Tried different LDAP users through all of the paths in step 1.
3) Checked with the LDAP team whether they were receiving the LDAP requests, by verifying the server logs as well as the Splunk log repositories.
4) Restarted the authentication services.
5) Verified whether any changes had been made on the Isilon cluster or the LDAP server around the time the issue started.
6) Verified whether the configuration on the Isilon cluster was consistent across all nodes, since logins to some nodes worked while others did not.
7) Listed all the physical components the request and response flow through on the network:
Isilon -> Nexus 5K -> Nexus 7K -> F5 load balancers -> Nexus 7K -> Fabric Interconnect -> ESXi hosts -> virtual LDAP machines, and the response path in reverse from the LDAP virtual machines back to the Isilon cluster.
8) Captured network traces on the Isilon cluster as well as on the Nexus 7K switch while running a couple of tests to observe the flow of the LDAP requests.
The commands for taking TCP dumps on the Isilon cluster, and other helpful troubleshooting commands, are listed at the end of this post.
9) Used Wireshark to review the tcpdump pcap captures.
10) Once the pcaps are open in Wireshark, use "Decode As" LDAP to narrow the output to LDAP frames.
11) "Decode As" can be applied to any protocol type (TCP, UDP, LDAP) to narrow the output to the preferred traffic for easier troubleshooting.
Other ways to filter:
Apply a display filter on the protocol of interest, for example an HTTPS filter matching "ABC" or an LDAP filter matching "ABC",
where ABC is the user ID or any other value you want to match.
Right-click on any frame and follow the TCP stream to check the complete flow that happened during that particular session.
The red color indicates requests from the Isilon cluster, and the green color represents responses from the server.
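The same filtering can be done from the command line with tshark; a minimal sketch (the capture filename matches the tcpdump outputs listed later in this post, the uid abcd is a placeholder, and the ldap.name field is per the Wireshark LDAP dissector and may vary by version):
tshark -r node2_ldap.pcap -Y 'ldap'    # show only frames decoded as LDAP (-Y on newer tshark, -R on older builds)
tshark -r node2_ldap.pcap -Y 'ldap.name contains "abcd"'    # narrow further to bind operations referencing a specific user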
12) We opened the captures taken on both the Isilon cluster and the 7K switch and compared the same session in both pcaps.
The 7K capture showed it received the responses from the LDAP servers, but with a lot of retransmissions and frames flagged in red, whereas the Isilon pcaps were missing all of the responses. The cluster simply waited about 100 seconds, then sent an unbind request, received the response, and closed the connection cleanly.
13) The F5 engineer verified and confirmed that all packets were being placed on the wire toward the 7K switch.
14) That isolated the Isilon cluster, the LDAP server, and the F5 from the suspect list, since both ends were trying to communicate and the responses made it all the way back to the 7K. That left two devices on the network path:
the 7K and 5K switches.
15) Logged in to the 7K switch, started shutting down the ports connecting to the 5K switch one at a time, and tested logins to the Isilon cluster after each change.
16) After testing three ports, the logins started working once the 4th port was shut down.
17) Verified the configuration and counters for that port interface and found CRC and other errors that were eating the packets.
Below are the commands that helped during the troubleshooting process. Use them as required.
Collecting TCP dumps:
tcpdump -i vlan0 -s 0 -w /ifs/data/Isilon_Support/node2/ssh_login.pcap host 10.10.10.10
tcpdump -i vlan0 -s 0 -w /ifs/data/Isilon_Support/node2/node2_ldap.pcap host 10.10.10.10
tcpdump -i vlan0 -s 0 -w /ifs/data/Isilon_Support/node2/node2_mapping.pcap &
isi_for_array 'tcpdump -s 0 -i lagg0 -w /ifs/data/Isilon_Support/$(date +%m%d%Y)/`hostname`.$(date +%m%d%Y_%H%M%S).lagg0.cap &'
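The last two captures run in the background; remember to stop them once enough data has been collected. A hedged pair of follow-up commands (assuming no other tcpdump sessions need to keep running):
isi_for_array -s 'pkill tcpdump'    # stop the background tcpdump on every node
ls -lh /ifs/data/Isilon_Support/$(date +%m%d%Y)/    # confirm the capture files and their sizes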
Verify active connections on Isilon cluster
isi_for_array netstat -an|grep 10.10.10.10
isi_for_array ifconfig |grep 10.10.10.10
Other commands
tail -f /var/log/lsassd.log Authentication log file
ps aux |grep lsass Current running processes
ifconfig -a
ls -lrth
isi auth ldap list    List LDAP servers configured on the Isilon cluster
isi auth mapping token --user=abce --zone=1 Verify mapping information for LDAP user
isi auth mapping token --user=abcd
isi auth mapping flush Flush the cache
isi auth mapping flush --all Flush the cache
isi_for_array -n3,4,5 isi auth mapping token --user=abcd
isi_for_array -n3 isi auth mapping token --user=abcd
ldapsearch -H ldap://10.10.10.10 -b 'ou=enterprise,o=abc,c=us'
ldapsearch -H ldap://10.10.10.10 -x -b "" -s base objectclass="*" supportedControl
ldapsearch -H 10.10.10.10 -x -b "" -s base objectclass="*" supportedControl
ldapsearch -H 10.10.10.10 -x -b ,ou=enterprise,o=abc,c=xyz
ldapsearch -H 10.10.10.10 -x -b ou=enterprise,o=abc,c=xyz
ldapsearch -H ldap://10.10.10.10 -x -b ou=enterprise,o=abc,c=xyz
ldapsearch -H ldap://10.10.10.10:2389 -x -b ou=enterprise,o=abc,c=xyz
/usr/likewise/bin/lwsm list
ldapsearch -H ldap://10.10.10.10:2389 -x -b ou=enterprise,o=abc,c=xyz
ldapsearch -x LLL -H ldap://10.10.10.10:2389 -b 'ou=enterprise,o=abc,c=xyz' -D abcd
date; isi auth mapping token --user=abcd
ls -l /ifs/data/Isilon_Support/node2/node2_mapping.pcap
ls -lh /ifs/data/Isilon_Support/node2/node2_mapping.pcap
ping -c 1000 10.10.10.10 -W 1
ping -c 1000 -W 1 10.10.10.10
ping -c 1000 10.10.10.10
isi services -a
isi_for_array "ps auxww | grep lsass | grep -v grep"
ldapsearch -x -h abc.xyz.com -p 2389 -D "abcd" -W -b "" -s base "objectclass =*"
ldapsearch -x -h abc.xyz.com -p 2389 -D "abcd" -W -b "" -s base "objectclass=*"
isi_for_array "ps auxww | grep lsass | grep -v grep"
isi_for_array "isi_classic auth ads cache flush --all" Flush the cache
isi_for_array "isi_classic auth mapping flush --all" Flush the cache
isi_for_array "killall lsassd -9" Kill the lsassd authentication deamon, which whill be automatically restarted by MCP master control process
ldapsearch -h abc.xyz.ldap.com -p 2389 -D "uid=abc,ou=def,ou=enterprise,o=hij,c=abd" -W -b "ou=enterprise,o=hij,c=abd"
ldapsearch -h abc.xyz.ldapserver.com -p 2389 -D "uid=abc,ou=def,ou=enterprise,o=hij,c=abd" -W -b "ou=enterprise,o=hij,c=abd"
isi auth ldap list -v
ldapsearch -h abc.xyz.ldapserver.com -p 2389 -D "uid=abc,ou=def,ou=enterprise,o=hij,c=abd" -W -b "ou=enterprise,o=hij,c=abd" "(&(objectClass=posixAccount)(uidNumber=1234))"
ldapsearch -h abc.xyz.com -p 2389 -D "uid=abc,ou=def,ou=enterprise,o=hij,c=abd" -W -b "ou=enterprise,o=hij,c=abd" "(&(objectClass=posixAccount)(uidNumber=1234))"
isi auth mapping dump| less
isi auth mapping dump| wc -l
isi auth mapping token --user=abc
isi auth mapping token --uid 1234
less /var/log/messages
tail /var/log/messages
isi auth log-level
isi auth log-level --set=debug Set the log level to debug
isi auth log-level --set=warning Set the log level to warning
ping -c 10 abc.xyz.ldap.com
traceroute abc.xyz.ldapserver.com
isi auth status
isi status
isi auth ldap view Primary
less /var/log/lsassd.log
isi auth mapping token abcdef
isi auth users view abcdef
less /var/log/lwiod.log
less /var/log/messages
less /var/log/lsassd.log
cd /etc/openldap
ls
less ldap.conf
less ldap.conf.default
less /ifs/.ifsvar/main_config_changes.log
less /var/log/lsassd.log
isi_for_array -s isi auth ldap.conf
isi auth status
isi auth ldap view --provider-name=Primary | grep "Group Filter:" | grep "User Filter:"
isi auth ldap view --provider-name=Primary
isi_for_array -s isi auth ldap view --provider-name=Primary | grep "User Filter:"
isi_for_array -s isi auth ldap view --provider-name=Primary | grep "Group Filter:"
isi_for_array -s isi auth ldap view --provider-name=Primary | grep "User Domain:"
isi_for_array -s /usr/likewise/bin/lwsm list
isi_for_array -s ps awux | grep lw
ifconfig
isi zone zones list
isi zone zones view system
isi_for_array -s isi zone zones view system
isi networks list pools
isi networks list pools -v
exit
isi status
isi networks list pools
isi networks list pools --name=pool1
mkdir /ifs/data/Isilon_Support/$(date +%m%d%Y)
isi_for_array 'tcpdump -s 0 -i lagg0 -w /ifs/data/Isilon_Support/$(date +%m%d%Y)/`hostname`.$(date +%m%d%Y_%H%M%S).lagg0.cap &'
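To reproduce the intermittent timeouts outside of a login attempt, the ldapsearch queries above can be timed in a loop; a minimal sketch (server IP, port, and base DN are the placeholders used throughout this post):
for i in $(seq 1 20); do
  /usr/bin/time -p ldapsearch -x -H ldap://10.10.10.10:2389 -b "ou=enterprise,o=abc,c=xyz" -s base "(objectclass=*)" dn > /dev/null || echo "query $i failed"
done
Occasional multi-second timings or failures point to the same packet loss seen in the captures.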
Friday, July 10, 2015
Isilon : Sync IQ scheduler memory leak issue
Current Isilon 7.x versions have a memory leak that causes the SyncIQ scheduler to exhaust its 512 MB maximum allocated memory and go into a hung state. In this state, no jobs initialize, whether incremental or full. Current code does not trigger any alert during this outage, so it goes unnoticed until someone checks manually.
To avoid running into this outage, follow the steps below.
Isilon has developed a script that monitors the scheduler and triggers email alerts once its memory utilization reaches a set threshold, so that the sync process can be restarted before it goes into the hung state.
Below are the commands to verify the memory usage manually.
# isi_for_array -s ps awxu | grep isi_migr_sched | grep -v grep | awk '{print $1 $6}'    This command gives the current memory usage of the sync scheduler across all nodes in the cluster
For example, if we want to be notified when memory reaches 470 MB: the script is available from EMC support; edit the threshold value in the script to 470 MB.
Once we receive the email, run the following commands to reset the memory.
isi sync settings modify --service=off
isi sync settings modify --service=on
These commands reset the scheduler's memory usage back to around 76 MB.
Note: the script from Isilon has to be started again every time a node gets rebooted.
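For a quick manual check against the same threshold, a minimal sketch (470 MB expressed as 481280 KB, matching the example above; this is not the EMC-provided script, and the $7 column assumes the isi_for_array node-name prefix shifts the ps RSS field by one — use $6 if your output has no prefix):
THRESH_KB=481280
isi_for_array -s 'ps awxu | grep isi_migr_sched | grep -v grep' | awk -v t="$THRESH_KB" '$7+0 > t {print $1, "isi_migr_sched is using about", int($7/1024), "MB"}'
Any node listed here is approaching the hung state, and the service off/on toggle above should be run before it reaches the 512 MB limit.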
** The permanent fix is expected in the Riptide release (8.0), which is expected in Q4.
Friday, June 12, 2015
VMAX : Storage provision for boot LUNs
Most clients prefer to create boot LUNs as thick devices. Below are the steps to create LUNs from thick storage (disk groups).
1) Verify Disk Group information
symdisk list -dskgrp_summary
Verify available space on disk group.
2) Create Thick LUN
There are two ways to create LUNs. We can use an existing LUN that is already in use as a reference and create new LUNs with the same configuration:
symconfigure -sid 123 -cmd "configure 2 devices copying dev 1234 overriding config=vdev;" preview/prepare/commit -nop -v
Here 1234 is the existing LUN; the command copies its configuration and creates 2 new LUNs.
OR
symconfigure -sid 123 -cmd "create dev
count=6, size=2322 cal, emulation=FBA, data_member_count=3, config=RAID-5,
disk_group=1;" commit -nop
3) Create Storage group and add dev to group
symaccess -sid 123 create -name Boot_LUN -type stor -dev 1234;
4) Create Port group
symaccess -sid 845 create -name Boot_LUN -type port -dirport 3E:0, 13E:0, 4E:0, 14E:0
5) Create Initiator group
Creating child and parent initiator groups allows the same child group to be nested under multiple parent groups.
a) Create Child group and set flags
Create the group and add the host WWNs to the child group:
symaccess -sid 123 create -name IG_Child -type init -wwn 20000012345678 ;
symaccess -sid 123 -name IG_Child -type init add -wwn 200000123456787
Add the C and SPC2 flags and consistent LUN:
symaccess -sid 123 -name IG_Child -type init set ig_flags on C,SPC2 -enable ;
symaccess -sid 123 -name IG_Child -type init set consistent_lun on ;
b) Create Parent group and set flags
symaccess -sid 123 create -name IG_Parent -type init ;
Enable C, SPC2
symaccess -sid 123 -name IG_Parent -type init set ig_flags on C,SPC2 -enable ;
Enable Consistent LUN
symaccess -sid 845 -name IG_Parent -type init set consistent_lun on ;
c) Add child IG groups to parent IG groups
symaccess -sid 845 add -name IG_Parent -type init -ig IG_Child;
6) Create Masking View
symaccess -sid 123 create view -name Boot_LUN -sg Boot_LUN -pg Boot_LUN -ig IG_Parent -lun 0 ;
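Once the masking view exists, a hedged verification step (sid and object names as used above):
symaccess -sid 123 show view Boot_LUN    # confirm the storage, port and initiator groups and the LUN mapping
symaccess -sid 123 list view    # list all masking views defined on the array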
Isilon - Clear CE log database
Sometimes the event log database fills up, which prevents the Isilon cluster from sending alerts either by email or by call home. Running the following commands frees up the log database and gets the alerts flowing again.
There is one more case where the CELOG database needs to be reset: sometimes quieting old alerts throws the error "event database not accessible", or the pre-upgrade health check returns a warning saying the event database is not accessible. This can be resolved by clearing and restarting the CELOG services and databases.
You can run all the commands at once, or one at a time if you prefer.
Create an <SR Number> directory under the Isilon_Support directory to store logs for further analysis or troubleshooting.
mkdir -p /ifs/.ifsvar/db/celog /ifs/data/Isilon_Support/sandbox /ifs/data/Isilon_Support/celog_backups ;
mkdir /ifs/data/Isilon_Support/<SR Number> ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_monitor.core $(pgrep isi_celog_monitor)' ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_coalescer.core $(pgrep isi_celog_coalescer)' ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_notification.core $(pgrep isi_celog_notifi)' ;
sleep 120 ;
isi services -a | grep celog ;
isi services -a celog_coalescer disable ;
isi services -a celog_monitor disable ;
isi services -a celog_notification disable ;
isi_for_array -sX 'pkill isi_celog_' ;
mv -vf /ifs/.ifsvar/db/celog/* /ifs/data/Isilon_Support/celog_backups/ ;
isi_for_array -sX 'rm -f /var/db/celog/*.db' ;
isi_for_array -sX 'rm -f /var/db/celog_master/*.db' ;
rm -f /ifs/.ifsvar/db/celog/*.db ;
isi services -a celog_coalescer enable ;
isi services -a celog_monitor enable ;
isi services -a celog_notification enable ;
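After the services are re-enabled, a couple of hedged checks that CELOG is healthy again:
isi services -a | grep celog    # all three celog services should show as enabled
isi events list | head    # the event list should be accessible and start populating again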
Sunday, June 7, 2015
Isilon - Useful links
The links below are just for reference. Credit to the authors.
Isilon integration with Avamar : https://splitbrained.wordpress.com/2014/02/19/isilon-avamar-ndmp/
Create Multi access zones on Isilon: https://storagenerd.wordpress.com/2013/02/01/how-to-setup-access-zones-for-multiple-active-directory-domains-isilon-7/
Multi access zone video demonstration: https://www.youtube.com/watch?v=hF3W8o-n-Oo
https://www.youtube.com/watch?v=R6XRJSp3mj4
Saturday, June 6, 2015
Isilon - Measuring cluster latency
CPU: isi statistics system --nodes -top
NET : isi statistics protocol --top --orderby=timeavg
ping/iperf
DISK: isi statistics drive -nall -top --long
MEMORY: isi statistics query -nall -stats=node.memory.used
Isilon: measuring IOPS for drive
Recommended max IOPS rates for Isilon drives are
100 for SATA drives
200 for SAS drives
STEC Mach8 SSD drives: 2600
Hitachi SSD drives: 4800
Measuring IOPS per drive:
requires root access
isi statistics query --stats=node.disk.xfers.rate.<drive #>
isi statistics query --nodes=all --stats=node.disk.xfers.rate.sum for all nodes
Measuring latency
isi statistics -nall --top --long
Disk latency
7200 RPM SATA = 8-10 ms
10000 RPM SAS = 3 ms
Infiniband latency: ~.050 milliseconds
Measuring CPU performance under load
isi statistics protocol
isi statistics protocol --orderby=timeavg --top
isi statistics system --top    To see greater detail about CPU processing
To see load averages:
sysctl vm.loadavg
sysctl vm.uptime
To display performance information for all nodes by drive:
isi statistics drive --nodes=all --orderby=timeinqueue
isi statistics client --remote-name=<IP_address>
Isilon: File locking
File locking
byte range locking: nfsv4, lock individual bytes of a file
Oplocks: SMB opportunistic locks; let the client cache file data and lock state locally at the SMB layer
NLM : network lock manager for older nfs versions nfs3/2
isi statistics heat
nfsstat -c    Client-side NFS statistics, such as error counts and locks
nfsstat -s    Server-side NFS statistics
Isilon - Reports
Monitoring:
Live data sources:
isi statistics gives a point-in-time look
SNMP data - tells whether the cluster is outside its normal ranges (cache, memory, etc.)
Cluster events - such as drive failures and boot failures
Exporting Historic data
isi_stats_d
InsightIQ history
CELOG database:
Logs
efs.gmp.group
isi statistics system
pstat
client
protocol
query
isi statistics system --nodes --top    Tells the most active node
Cluster-wide and protocol data: isi statistics pstat
isi statistics pstat --protocol=smb2 --top
nlm: network lock manager for nfs version 2 and 3
lsass_in and lsass_out - authentication traffic
isi statistics client    Displays the most active clients accessing the cluster for each protocol type
isi statistics client --orderby=ops --top
isi statistics protocol --totalby=proto,class,op,node --top
BDP: the bandwidth-delay product determines how much data can be in flight on the network. With the classic 64 KB TCP window (before RFC 1323 window scaling), maximum throughput = 64 KB / RTT (round-trip time), which can be compared against the NIC hardware speed.
Example: 64 KB / 0.5 ms ≈ 128 MB/s, roughly the line rate of a 1 Gbps NIC (125 MB/s).
Network performance measurements tools
iperf: measures TCP and UDP streaming throughput between endpoints
ping - measures RTT, latency on network wire
Wireshark/tshark: allows comprehensive examination of numerous protocols, with live or offline capture analysis
tcpdump: view packets using predefined filters, or capture everything for post-capture filtering (in Wireshark or tshark)
Iperf usage:
Client: iperf -c <iperf server IP> -t 20
Server: iperf -s
Workload I/O operations
Namespace operations: mkdir, getattr, readdir, lookup
To see stats for reads, writes, namespace reads, and namespace writes:
isi statistics protocol --classes=read,write,namespace_read,namespace_write
To see how many operations are queued up for each disk:
isi_for_array -s sysctl hw.iosched |grep total_inqueue
Friday, June 5, 2015
Isilon - Log analysis
CELOG coalescer Log file raw data
Each node has a set of logs
Cluster-wide log files, e.g. lsassd, snapshot, dedupe
/var is unique to each individual node
/var/log is either 500 MB or 2 GB depending on the node version
If the /var/log partition reaches 95 percent full, the node reboots every 30 seconds
Log file locations
/var/log on each node
ls /var/log
find /var/log -name "*celog*.log" -print
Log collection isi_gather_info
Logs from specific node isi_gather_info -n <node #>
isi_gather_info -f /var/crash -s 'isi_hw_status -i'
From GUI
Clustermanagement - > diagnostics -> gather
Generic logs: eg: /var/log/messages
Process-specific logs, e.g. /var/log/lsassd.log; any kind of authentication activity goes through lsassd.log
/var/log/isi_celog_coalescer.log
Log gather structure:
Isilon-1
Isilon-2
Isilon-3
local logs that are generic
base level files like any specific switches used
Isilon-1
varlog.tar
isi_hangdump.tar
isi_gather_info -noupload
isi_gather_info --noupload --group fs --nologs    Log group example using --group fs
Commands for log file filtering
ls
less
grep
cat
common useful options
ls -l
grep -v <expression> file
less -d <file>
cat -n <files>
cd ; ls
ls | less
ls > files.txt
ls>>files.txt
The grep utility
grep -v Diskless /tmp/isistats.txt |grep SSD
ls -l
wc -l
tail, head and grep
grep and cut
sort and uniq
ls -l isi_job_history
wc -l isi_job_history
Narrow scope
tail isi_job_history | head -1
grep ^03 isi_job_history | wc -l
Extract suspected relevant data
grep ^03 isi_job_history | cut -d' ' -f4 | cut -d'[' -f1 | sort | uniq -c
find . -name <filename> -print
grep -i error local/messages |grep -iv cleared | cut -d: -f2- |less
Isilon - System commands
sysctl -d
sysctl -d efs.gmp.group
sysctl -d efs.lbm.drive_space per-drive space statistics
sysctl -d
view log messages
less /var/log/messages
tail -f /var/log/messages
tail -n 50 /var/log/messages |grep group
isi devices | more
isi devices -h
isi devices --action smartfail -d 4
isi devices --action stopfail -d 4
Isilon - Sysctl commands
The sysctl interface is the equivalent of the registry in a Windows environment: it changes UNIX kernel runtime parameters. A sysctl command changes one node's kernel at a time, and the change does not survive a reboot; it is lost when the node restarts.
sysctl [option] <name> [=value]
sysctl -a list all sys controls
sysctl -d    Description of a sysctl
sysctl -Na
sysctl <sysctl_name>=value eg: sysctl kern.nswbuf=2048
set cluster wide isi_for_array -s sysctl <sysctl_name>=value
Temporary vs Persistent sysctl settings
Temporary sysctl changes are not applied to sysctl.conf configuration file
Persistent sysctl changes can be done by updating sysctl.conf configuration file on each node
isi_sysctl_cluster command updates each node and makes persistent
Setting persistent sysctl procedure
Back up conf file
touch /etc/mcp/override/sysctl.conf && cp /etc/mcp/override/sysctl.conf /etc/mcp/override/sysctl.conf.bkul
isi_sysctl_cluster sysctl_name=value
verify change cat /etc/mcp/override/sysctl.conf
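Putting the procedure together with the kern.nswbuf example used earlier in this post (shown for illustration only; substitute the sysctl and value you actually need):
touch /etc/mcp/override/sysctl.conf && cp /etc/mcp/override/sysctl.conf /etc/mcp/override/sysctl.conf.bkul
isi_sysctl_cluster kern.nswbuf=2048    # pushed to every node and written to the override file
cat /etc/mcp/override/sysctl.conf    # the new entry should now be present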
Monday, June 1, 2015
Cisco - Interface commands
show log nvram
show lacp inter e8/15
show cdp neigh int eth1/27
sh hardware internal errors mod 8 | diff
sh policy-map interface
PDUs
The 7K switches are dropping PDUs
Zero Space Reclaim - VMAX
Zero space reclaim is a non-impactful process that lets the array reuse space left behind after files are deleted. The process runs on the VMAX but is initiated from the ESXi host.
Procedure:
1. Identify the cluster and host that you want to run the reclaim on.
2. Create alert suppression for the host CI for the reclaim window.
3. Enable SSH and disable lockdown mode for that host by logging into the respective vCenter.
4. Get the list of datastores on that host by running the following command:
esxcli storage filesystem list | grep VMFS-5 | awk '{print $2}'
5. Initiate the reclaim process on a particular datastore by running the following command:
esxcli storage vmfs unmap -l datastore_name -n 1200
6. Wait for the prompt to come back, which means the reclaim process is complete.
7. Once the reclaim process on the datastore is complete, you can initiate the reclaim on a new datastore (step 5).
8. Once the reclaim process on the host is complete, disable SSH and enable lockdown mode for that host from vCenter.
Note: Do not initiate reclaim on more than 3 datastores at a time to avoid any performance impact.
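If several datastores need to be reclaimed, they can be run back to back in one SSH session; a minimal sketch with hypothetical datastore names (sequential, so only one unmap runs at a time, well within the three-datastore limit in the note above):
for ds in DATASTORE01 DATASTORE02 DATASTORE03; do
  esxcli storage vmfs unmap -l "$ds" -n 1200    # returns to the prompt when that datastore's reclaim completes
done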
Tuesday, May 26, 2015
Isilon - Set up email notifications for quota violation
Send email alert upon Isilon quota violation
isi quota quotas notifications create /ifs/TEST directory hard exceeded --action-email-address abc@example.com --holdoff 5m --email-template /ifs/home/admin/notification-templates/quota --action-email-owner yes
# isi quota quotas notifications view /ifs/TEST --type directory hard exceeded
Threshold: hard
Condition: exceeded
Schedule: -
Holdoff: 5m
ActionAlert: Yes
EmailOwner: Yes
NotifyAddress: abc@example.com
Email Template: /ifs/home/admin/notification-templates/quota
# isi quota quotas notifications list /ifs/TEST --type directory
Threshold Condition
---------------------
hard exceeded
advisory exceeded
---------------------
Total: 2
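The list output shows rules for both the hard and advisory thresholds; the advisory rule can be created the same way as the hard rule at the top of this post (a hedged example reusing the same flags and placeholder address):
isi quota quotas notifications create /ifs/TEST directory advisory exceeded --action-email-address abc@example.com --holdoff 5m --action-email-owner yes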
Thursday, February 5, 2015
Isilon - Replication issue
isi_classic sync job report -v policy_name
isi_classic sync pol run policy_name
isi_services -a
isi_mcp start
ps auxwww |grep migr
If the scheduler job has gone stale, follow the process below:
cp /ifs/.ifsvar/modules/tsm/config/source_record.xml /ifs/.ifsvar/modules/tsm/config/source_record.xml.bak
vi /ifs/.ifsvar/modules/tsm/config/source_record.xml
Change the entries to <pending-job> No Job Action </pending-job>
and <current-owner> No Job Owner </current-owner>
egrep 'pending-job |current-owner' /ifs/.ifsvar/modules/tsm/config/source_record.xml | less
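After editing the record, restart the SyncIQ service so the scheduler re-reads it, then confirm the scheduler processes come back; a hedged sequence built from commands used elsewhere in these notes:
isi sync settings modify --service=off
isi sync settings modify --service=on
ps auxwww | grep migr | grep -v grep    # the scheduler and related migr processes should be running again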
Nexus 7K switch Troubleshooting - Port failure
CLI command history
Show cli history 50
show logging last 50
dir log:
show interface ethernet 8/15
show interface ethernet 8/15 brief
show port-channel summary interface po326
show diagnostic result module 8
show hardware internal errors module 8 | diff
show running-config int eth 8/15
show running-config int po326 membership
show module internal event-history module 8
show vpc consistency-parameters global
show run int po 326 membership
show int status module 8
show port-channel database interface port-channel 326
show accounting log
Thursday, January 15, 2015
VMAX - Create similar devices from Disk group
Verify disk group information:
symdisk list -dskgrp_summary
symconfigure -sid 123 -cmd "configure 10 devices copying dev 1512 overriding disk_group=1;" preview -v -nop
Wednesday, January 14, 2015
VMAX - Auto Provisioning
1) List existing thin devices
symdev list -sid 123 -all -tdev -cyl
2) Create thin devices
symconfigure -sid 123 -cmd "create dev count=8, size=1100,config=tdev,emulation=FBA;" prepare
symconfigure -sid 123 -cmd "create dev count=8, size=1100,config=tdev,emulation=FBA;"commit
3) Verify newly created thin devices
symdev list -sid 123 -tdev -unbound
4) Create data devices
symconfigure -sid 123 -cmd "create dev count=8,size=1100,config=2-way-mir, disk=group=3,emmulation=FBA,attribute=datadev
symdisk list -dskgrp_summary
symdev list -sid 123 -datadev
5) Create thin pools
symconfigure -sid 123 -cmd "create pool p1 type=thin;" prepare/commit
symcfg -sid 123 list -pool -thin
symcfg -sid 123 show -pool p1 -type thin
symcfg -sid 123 show -pool p1 -type thin -detail
6) Add data devices to pool
symconfigure -sid 123 -cmd "add dev 255:256 to pool p1,type=thin,member_state=enable;" commit
7) Bind tdev to thin pool
symconfigure -sid 123 -cmd "bind tdev 24D:250 to pool p1;" prepare/commit
symcfg list -sid 123 -tdev -range 24D:250
symdev -sid 123 list -noport
8) Pre-allocate space on Tdev
symconfigure -sid 123 -cmd "start allocate on tdev 24D:250 start_cyl=0,size=100MB;" prepare/commit
9) Create a meta device when the provisioned capacity needs to be larger
symconfigure -sid 123 -cmd "form meta from dev 107, config=concatenated/striped; add dev 108 to meta 107;"
auto meta feature is disabled by default
10) Create Storage group
symaccess -sid 123 create -name SG -type storage
Add devices to storage group
symaccess -sid 123 -name SG -type storage add devs CD:F4
11) Create Port group
symaccess -sid 123 create -name PG -type port
Add FA ports to port group
symaccess -sid 123 -name PG -type port -dirport 7F:1 add
12) Create Initiator group
symaccess -sid 123 create -name IG -type initiator
Add wwns to initiator group
symaccess -sid 123 -name IG -type initiator -wwn 5000 add
13) Create Masking view
symaccess -sid 123 create view -name MV -sg SG -pg PG -ig IG
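To confirm the provisioning end to end, a hedged pair of checks reusing the names from the steps above:
symaccess -sid 123 show view MV    # verify the SG, PG, IG and device mapping in the masking view
symcfg -sid 123 show -pool p1 -type thin -detail    # verify the thin devices are bound to the pool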
VMAX - Basic Commands
1) Verify free space on Vmax
symconfigure list -freespace
2) To make device not ready
symdev not-ready <symdev>
Submission of Configuration changes in 3 different ways
1) Using -CMD
symconfigure -sid 123 -cmd "delete dev 1234;" prepare
2) Using -f
symconfigure -sid 123 -file myfile commit
myfile.txt - delete dev 1234;
3) Using <<EOF
symconfigure -sid 123 -noprompt prepare
<<EOF
dissolve meta dev 002;
EOF
Query to see status after command executed
symconfigure -sid 123 query
Abort the running process
symconfigure -sid 123 abort -session-id <sessionid>
Setting environment variables
symcli
set symcli-cli-access-PARALLEL
set symcli-wait-on-db=1
symcli -def
symcfg discover
symcfg list
Search Tdev Information
1) symdev list -cyl -sid 495 | find "TDEV"
Free space
symconfigure list -free -sid 123
Disk group information
symdisk list -sid 123 -dskgrp_summary
symconfigure verify -sid 123
symlmf query -type emclm -sid 123
Wednesday, January 7, 2015
Isilon - SNMP node information sending traps
On an Isilon cluster, the node with the lowest logical node number always sends the SNMP traps to the SNMP server. Hardware-related incidents are sent from the individual node concerned.
To find which node is acting as primary and sending traps
cat /var/log/isi_celog_coalescer.log |grep "CELOG master is" |tail -1
isi_nodes %{node}...%{internal_a}...%{internal_b} |grep <ip_address_from_last_command>
Monday, January 5, 2015
Isilon - Create a continuous (when-source-modified) replication policy
isi sync policies create --name=abc --action=sync --target-host=isilon.test.xxx.com --target-path=/ifs/isilon/xxx --source-root-path=/ifs/isilon/beta/xxx --description=xxx --schedule=When-Source-Modified
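To confirm the policy and optionally kick off the first replication by hand, a hedged follow-up (standard SyncIQ commands; policy name as above):
isi sync policies list    # the new policy abc should appear with its target and schedule
isi sync jobs start abc    # start the initial replication job manually if you do not want to wait for a source change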