Wednesday, August 19, 2015

Isilon InsightIQ code upgrade from 3.0 to 3.2



A direct upgrade from 3.0 to 3.2 is not supported. The code must be upgraded from 3.0 to 3.1 first, and then from 3.1 to 3.2.
Upgrade from 3.0 to 3.1
Install the dependencies
1) Unzip the file
tar xvfz iiq_3.1_upgrade_dependencies.tar.gz
2) Install the offline dependencies
./install_dependencies
3) Run the upgrade, skipping the step of connecting to the CentOS repositories since the dependencies are already installed
sudo yum upgrade --noplugins --disablerepo=* /home/administrator/isilon-insightiq-3.1.0.0078-1.x86_64.rpm
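To confirm the upgrade was applied, the installed package can be queried with rpm (a quick sanity check; this assumes the installed package name matches the RPM file name above):
rpm -qa | grep -i insightiq        Should now report the 3.1.0.0078 build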
Upgrade from 3.1 to 3.2
For the upgrade from 3.1 to 3.2, EMC provides the code in .sh format (previous releases were in .rpm format).
One issue we experienced when using WinSCP to copy the file to the InsightIQ VM is that the md5sum value changes
during the copy, which causes the upgrade to fail with errors such as "invalid checksum".
To copy the file, use either FTP, or copy the file to the Isilon cluster first and then to InsightIQ using scp.
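As an illustration, a minimal sketch of staging the installer through the cluster and verifying the checksum before running it (the paths and cluster address are placeholders; compare the md5sum output against the checksum published with the download):
scp install-insightiq-3.2.1.0001.sh root@<cluster_ip>:/ifs/data/Isilon_Support/
scp root@<cluster_ip>:/ifs/data/Isilon_Support/install-insightiq-3.2.1.0001.sh /home/administrator/
md5sum /home/administrator/install-insightiq-3.2.1.0001.sh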
1) Run the upgrade:  sudo sh install-insightiq-3.2.1.0001.sh
2) Run the datastore upgrade:  iiq_datastore_upgrade

Saturday, August 8, 2015

Isilon : Rolling upgrade failure

While performing a code upgrade, a network interruption can cause the connection to the cluster to be lost. If we are in the middle of the upgrade, reconnecting to the cluster does not return us to the step where the upgrade left off.

In these cases, the user needs to verify which nodes have been upgraded completely, kill the upgrade process, and restart the upgrade, which skips the nodes that already completed and proceeds with the nodes still on the previous version (see the recovery sketch after the command list below).

Below are troubleshooting commands that can help in this scenario.

isi update --rolling --manual                     Initiates a rolling upgrade; the manual option asks for confirmation before rebooting each node
isi update --check-only                           Pre-upgrade health check
cd /var/log
ls -l update*                                     List the upgrade log files
cat update_handler_2015-07-23_14:57:10.txt        View a specific upgrade log file
isi_for_array -s ps awux | grep update            List the running update processes
isi_for_array -s killall -9 update                Kill the currently running update process
isi_for_array date                                Displays the date on every node of the cluster
isi_readonly
isi readonly
isi auth error 54
isi_for_array -n2 killall -9 isi_upgrade_d        Kill the upgrade process on node 2 only
isi_for_array -s ps awux | grep isi_for_array
ssh isceist01-2
isi_for_array -s uname -a | awk '{print $4}'      Check the OS build string running on each node
isi update --rolling --manual
isi_for_array -s ps awux | grep upgrade_d
isi_for_array -s killall -9 isi_upgrade_d         Kill the upgrade process on all nodes
isi_for_array -s ps awux | grep upgrade_d         List the running processes filtered on upgrade_d
isi update --rolling --manual                     Initiate the rolling upgrade again
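Putting the above together, a minimal recovery sequence using only the commands listed above would be:
isi_for_array -s uname -a | awk '{print $4}'      Confirm which nodes already run the new build
isi_for_array -s ps awux | grep isi_upgrade_d     Find any stale upgrade processes
isi_for_array -s killall -9 isi_upgrade_d         Kill them on all nodes
isi update --rolling --manual                     Restart the rolling upgrade; nodes already upgraded are skipped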

Tuesday, July 28, 2015

LDAP Performance Troubleshooting - Isilon

Recently we received complaints about an authentication issue where some users were unable to log in to the Isilon cluster because their login requests were timing out. The behavior was inconsistent: some users could log in to every individual node by IP address as well as through the SSIP and the pool DNS names, while other users could log in to only a couple of nodes.
No changes had been made on either the Isilon cluster or the LDAP server. Below are the troubleshooting steps performed to isolate the location and root cause of the issue.

1) Logged in to each individual node separately using the IP address, the SSIP, and the pool DNS addresses.
2) Tried different LDAP users with all of the methods from step 1.
3) Checked with the LDAP team whether they were receiving the LDAP requests, by reviewing their logs as well as the Splunk log repositories.
4) Restarted the authentication services.
5) Verified whether any changes had been made on the Isilon cluster or the LDAP server around the time the issue started.
6) Verified whether the Isilon cluster configuration was consistent across all nodes, since logins to some nodes worked while others did not.
7) Listed all the physical components that the request and response flow through on the network:
Isilon -> Nexus 5K -> Nexus 7K -> F5 load balancers -> Nexus 7K -> Fabric Interconnect -> ESXi hosts -> virtual LDAP machines, and the response path from the LDAP virtual machines back to the Isilon cluster.
8) Captured network traces on the Isilon cluster as well as on the Nexus 7K switch while running a couple of tests to see the flow of the LDAP requests.
The commands for taking tcpdump captures on the Isilon cluster, along with other helpful troubleshooting commands, are provided at the end.

9) Used Wireshark to examine the tcpdump pcap captures.

10) Once the pcaps are opened in Wireshark, use "Decode As" LDAP to reduce the output to only the LDAP frames.

11) "Decode As" can be applied to any protocol (TCP, UDP, LDAP) to limit the output to the preferred format for easier troubleshooting.

Another way to filter is with display filters that match on a string in the HTTPS or LDAP traffic, for example:
HTTPS traffic containing "ABC"
LDAP traffic containing "ABC"

Where ABC is a user ID or any other search string; a few concrete filter examples are shown below.
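A few concrete Wireshark display filters along these lines (illustrative only; the user ID "abcd" and port 389 are placeholders):
ldap                                    Show only LDAP frames
ldap && frame contains "abcd"           LDAP frames that mention the user ID
tcp.port == 389                         All traffic on the standard LDAP port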

Right-click any frame and follow the TCP stream to see the complete flow of that particular session.

Red indicates the requests from the Isilon cluster, and green represents the responses from the server.

12) We opened the captures taken at both the Isilon cluster and the 7K switch and compared the same session in both pcaps.

The 7K capture showed that it received the responses from the LDAP servers but had a lot of retransmissions and frames marked in red, whereas the Isilon pcaps were missing all of the responses. The Isilon side simply waited 100 seconds before sending an unbind request, received the response, and closed the connection successfully.

13) The F5 engineer verified and confirmed that all packets were being placed on the wire toward the 7K switch.

14) That isolated the Isilon cluster, the LDAP server, and the F5 from the suspect list, since both ends were trying to communicate and the responses made it all the way back to the 7K. That left two devices on the network:

the 7K and 5K switches.

15) Logged in to the 7K switch and started shutting down, one at a time, the ports connecting to the 5K switch, testing logins to the Isilon cluster after each change.

16) After testing three ports, the logins started working once the 4th port was shut down.

17) Verified the configuration and counters for that port interface and found CRC and other errors that were eating the packets.


Below are commands that help with the troubleshooting process. Use as required.

Collecting tcpdumps:

tcpdump -i vlan0 -s 0 -w /ifs/data/Isilon_Support/node2/ssh_login.pcap host 10.10.10.10 
tcpdump -i vlan0 -s 0 -w /ifs/data/Isilon_Support/node2/node2_ldap.pcap host 10.10.10.10
tcpdump -i vlan0 -s 0 -w /ifs/data/Isilon_Support/node2/node2_mapping.pcap &

isi_for_array 'tcpdump -s 0 -i lagg0 -w /ifs/data/Isilon_Support/$(date +%m%d%Y)/`hostname`.$(date +%m%d%Y_%H%M%S).lagg0.cap &'
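The last command above leaves tcpdump running in the background on every node, so stop the captures once the tests are finished (a simple approach, assuming no other tcpdump sessions need to keep running):
isi_for_array -s 'killall tcpdump'
isi_for_array -s 'ls -lh /ifs/data/Isilon_Support/$(date +%m%d%Y)/'      Confirm the pcap files were written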

Verify active connections on Isilon cluster

isi_for_array netstat -an|grep 10.10.10.10
isi_for_array ifconfig |grep 10.10.10.10


Other commands

tail -f /var/log/lsassd.log       Authentication log file
ps aux |grep lsass                  Current running processes
ifconfig -a
ls -lrth
isi auth ldap list                     List the LDAP providers configured on the Isilon cluster
isi auth mapping token --user=abce --zone=1    Verify mapping information for LDAP user
isi auth mapping token --user=abcd
isi auth mapping flush                                         Flush the cache
isi auth mapping flush --all                                 Flush the cache
isi_for_array -n3,4,5 isi auth mapping token --user=abcd
isi_for_array -n3 isi auth mapping token --user=abcd

ldapsearch -H ldap://10.10.10.10 -b 'ou=enterprise,o=abc,c=us' 
ldapsearch -H ldap://10.10.10.10 -x -b "" -s base objectclass="*"  supportedControl
ldapsearch -H 10.10.10.10 -x -b "" -s base objectclass="*"  supportedControl
ldapsearch -H 10.10.10.10 -x -b ,ou=enterprise,o=abc,c=xyz
ldapsearch -H 10.10.10.10 -x -b ou=enterprise,o=abc,c=xyz
ldapsearch -H ldap://10.10.10.10 -x -b ou=enterprise,o=abc,c=xyz
ldapsearch -H ldap://10.10.10.10:2389 -x -b ou=enterprise,o=abc,c=xyz
/usr/likewise/bin/lwsm list
ldapsearch -H ldap://10.10.10.10:2389 -x -b ou=enterprise,o=abc,c=xyz
ldapsearch -x -LLL -H ldap://10.10.10.10:2389  -b 'ou=enterprise,o=abc,c=xyz' -D abcd

date; isi auth mapping token --user=abcd
ls -l /ifs/data/Isilon_Support/node2/node2_mapping.pcap
ls -lh /ifs/data/Isilon_Support/node2/node2_mapping.pcap

ping -c 1000 10.10.10.10 -W 1
ping -c 1000 -W 1 10.10.10.10 
ping -c 1000 10.10.10.10 
isi services -a
isi_for_array "ps auxww | grep lsass | grep -v grep"

ldapsearch -x -h abc.xyz.com -p 2389 -D "abcd" -W -b "" -s base "objectclass =*"
ldapsearch -x -h abc.xyz.com -p 2389 -D "abcd" -W -b "" -s base "objectclass=*"
isi_for_array "ps auxww | grep lsass | grep -v grep"
isi_for_array "isi_classic auth ads cache flush --all"    Flush the cache
isi_for_array "isi_classic auth mapping flush --all"      Flush the cache
isi_for_array "killall lsassd -9"                                     Kill the lsassd authentication deamon, which whill be automatically restarted by MCP master control process


ldapsearch -h abc.xyz.ldap.com -p 2389 -D "uid=abc,ou=def,ou=enterprise,o=hij,c=abd" -W  -b "ou=enterprise,o=hij,c=abd" 
ldapsearch -h abc.xyz.ldapserver.com -p 2389 -D "uid=abc,ou=def,ou=enterprise,o=hij,c=abd" -W  -b "ou=enterprise,o=hij,c=abd"  
isi auth ldap list -v
 ldapsearch -h abc.xyz.ldapserver.com -p 2389 -D "uid=abc,ou=def,ou=enterprise,o=hij,c=abd" -W  -b "ou=enterprise,o=hij,c=abd"  "(&(objectClass=posixAccount)(uidNumber=1234))"
 ldapsearch -h abc.xyz.com -p 2389 -D "uid=abc,ou=def,ou=enterprise,o=hij,c=abd" -W  -b "ou=enterprise,o=hij,c=abd"  "(&(objectClass=posixAccount)(uidNumber=1234))"

isi auth mapping dump| less
isi auth mapping dump| wc -l
isi auth mapping token --user=abc
isi auth mapping token --uid 1234
less /var/log/messages
tail /var/log/messages

isi auth log-level
isi auth log-level --set=debug                Set the log level to debug
isi auth log-level --set=warning             Set the log level to warning

ping -c 10 abc.xyz.ldap.com
traceroute abc.xyz.ldapserver.com
isi auth status
isi status
isi auth ldap view Primary
less /var/log/lsassd.log
isi auth mapping token abcdef
isi auth users view abcdef
less /var/log/lwiod.log
less /var/log/messages
less /var/log/lsassd.log

cd /etc/openldap
ls
less ldap.conf
less ldap.conf.default
less /ifs/.ifsvar/main_config_changes.log
less /var/log/lsassd.log
isi_for_array -s isi auth ldap.conf
isi auth status
isi auth ldap view --provider-name=Primary | grep "Group Filter:" | grep "User Filter:"
isi auth ldap view --provider-name=Primary 
isi_for_array -s isi auth ldap view --provider-name=Primary | grep "User Filter:"
isi_for_array -s isi auth ldap view --provider-name=Primary | grep "Group Filter:"
isi_for_array -s isi auth ldap view --provider-name=Primary | grep "User Domain:"
isi_for_array -s /usr/likewise/bin/lwsm list 
isi_for_array -s ps awux | grep lw

ifconfig
isi zone zones list
isi zone zones view system
isi_for_array -s isi zone zones view system
isi networks list pools
isi networks list pools -v
exit
isi status
isi networks list pools
isi networks list pools --name=pool1
mkdir /ifs/data/Isilon_Support/$(date +%m%d%Y)
isi_for_array 'tcpdump -s 0 -i lagg0 -w /ifs/data/Isilon_Support/$(date +%m%d%Y)/`hostname`.$(date +%m%d%Y_%H%M%S).lagg0.cap &'














Friday, July 10, 2015

Isilon : Sync IQ scheduler memory leak issue

Current Isilon 7.x versions have a memory leak issue that causes the SyncIQ scheduler to exhaust its allocated maximum of 512 MB of memory and go into a hung state. In this state no jobs can initialize, whether incremental or full. The current code does not trigger any alerts during this outage, so it goes unnoticed until someone verifies manually.

To avoid getting into this outage situation, follow the steps below:


Isilon has developed a script that monitors the sync scheduler and triggers email alerts once its memory utilization reaches a certain threshold, so that the sync process can be restarted before it goes into the hung state.

Below are the commands to verify the memory usage manually.



# isi_for_array -s ps awxu | grep isi_migr_sched | grep -v grep |awk '{print $1 $6}'    This command gives the current memory usage of the scheduler on every node in the cluster

For example, if we want to be notified when memory reaches 470 MB: the script is available from EMC support; edit the threshold value in the script to 470 MB. A minimal sketch of the same check is shown below.
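The EMC-provided script is not reproduced here, but a minimal sketch of the same idea (compare the memory column printed by the command above against a threshold and print a warning per node; 470 MB is just the example threshold, and any mailing/alerting step is left out) could look like:

#!/bin/sh
# Hypothetical sketch - not the EMC script.
# Warn for any node where the isi_migr_sched memory column ($6, in KB) exceeds the threshold.
THRESHOLD_KB=481280        # 470 MB expressed in KB
isi_for_array -s ps awxu | grep isi_migr_sched | grep -v grep | \
  awk -v t="$THRESHOLD_KB" '$6 > t {print $1, "isi_migr_sched memory", $6, "KB exceeds threshold"}'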

Once we receive the email,  run the following commands to reset the memory.

isi sync settings modify --service=off
isi sync settings modify --service=on

These commands restart the scheduler service and reset its memory usage to around 76 MB.


Note: The script from Isilon has to be re-executed every time the node is rebooted.


** The permanent fix is expected in the Riptide release (8.0), due in Q4.








Friday, June 12, 2015

VMAX : Storage provision for boot LUNs

Most clients prefer to create boot LUNs as thick devices. Below are the steps to create LUNs from thick storage (disk groups).

1) Verify Disk Group information

symdisk list -dskgrp_summary

Verify available space on disk group.

2) Create Thick LUN

There are two ways to create LUNs. We can use an existing LUN that is already in use as a reference to create new LUNs with the same configuration:

symconfigure -sid 123 -cmd "configure 2 devices copying dev 1234 overriding config=vdev;" preview/prepare/commit -nop -v

Here 1234 is the existing LUN; the same configuration is copied to create 2 new LUNs.

OR

symconfigure -sid 123 -cmd "create dev count=6, size=2322 cyl, emulation=FBA, data_member_count=3, config=RAID-5, disk_group=1;" commit -nop 


3) Create Storage group and add dev to group

symaccess -sid 123 create -name Boot_LUN -type stor -dev 1234;

4) Create Port group

symaccess -sid 845 create -name Boot_LUN -type port -dirport 3E:0, 13E:0, 4E:0, 14E:0

5) Create Initiator group

Creating child and parent initiator groups allows the user to nest the same child group under multiple parent groups.

     a) Create Child group and set flags

               Create group and add host wwn's to child group
    
          symaccess -sid 123 create -name IG_Child -type init -wwn    20000012345678  ;
          symaccess -sid 123 -name IG_Child -type init add -wwn   200000123456787

            Adding flags C,SPC2  & consistent lun

           symaccess -sid 123 -name IG_Child -type init set ig_flags on C,SPC2 -enable ;
           symaccess -sid 123 -name IG_Child -type init set  consistent_lun on ;


     b)  Create the parent group and set flags


            symaccess -sid 123 create -name IG_Parent -type init ;

            Enable C, SPC2

            symaccess -sid 123 -name IG_Parent -type init set ig_flags on C,SPC2 -enable ;

            Enable Consistent LUN

            symaccess -sid 845 -name IG_Parent -type init set consistent_lun on ;


     c) Add the child IG group to the parent IG group

            symaccess -sid 845 add -name IG_Parent -type init -ig IG_Child;


6)  Create Masking View

       Create Masking view  symaccess -sid 123 create view -name Boot_LUN  -sg Boot_LUN -pg Boot_LUN -ig IG_Parent -lun 0 ;
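Once the view is created it can be verified (standard symaccess syntax, using the names from this example):

       symaccess -sid 123 show view Boot_LUN          Confirm the SG, PG, and IG are tied together and the LUN address is 0
       symaccess -sid 123 list view                   List all masking views on the array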










Isilon - Clear CE log database

Sometimes the log database fills up, which prevents the Isilon cluster from sending alerts by email or call home. Running the following commands will clear the log database and the cluster will start sending alerts again.

There is one more case where we need to reset the CELOG database. Sometimes quieting old alerts throws the error "event database not accessible", or the pre-upgrade health check returns a warning saying the event database is not accessible. This can be resolved by clearing the CELOG databases and restarting the CELOG services.

You can run all the commands at once or one at a time.

Create an <SR number> directory under the Isilon_Support directory to store logs for further analysis or troubleshooting.


mkdir -p /ifs/.ifsvar/db/celog /ifs/data/Isilon_Support/sandbox /ifs/data/Isilon_Support/celog_backups ;
mkdir /ifs/data/Isilon_Support/<SR Number> ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_monitor.core $(pgrep isi_celog_monitor)' ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_coalescer.core $(pgrep isi_celog_coalescer)' ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_notification.core $(pgrep isi_celog_notifi)' ;sleep 120; 
isi services -a celog_coalescer disable ;
isi services -a celog_monitor disable ;
isi services -a celog_notification disable ;
isi_for_array -sX 'pkill isi_celog_';
mv -vf /ifs/.ifsvar/db/celog/* /ifs/data/Isilon_Support/celog_backups/ ;
isi_for_array -sX 'rm -f /var/db/celog/*.db' ;
isi_for_array -sX 'rm -f /var/db/celog_master/*.db' ;
rm -f /ifs/.ifsvar/db/celog/*.db ;
isi services -a celog_coalescer enable ;
isi services -a celog_monitor enable ;
isi services -a celog_notification enable ;
isi services -a | grep celog
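After the services are re-enabled, a quick check that event handling is back (command names are from OneFS 7.x and may vary slightly by version):
isi events list               Confirm the event database is readable again
isi events sendtest           Generate a test event to confirm notifications flow end to end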

Sunday, June 7, 2015

Isilon - Useful links

The links below are just for reference. Credit to the authors.

Isilon integration with Avamar : https://splitbrained.wordpress.com/2014/02/19/isilon-avamar-ndmp/

Create Multi access zones on Isilon: https://storagenerd.wordpress.com/2013/02/01/how-to-setup-access-zones-for-multiple-active-directory-domains-isilon-7/

Multi access zone video demonstrations: https://www.youtube.com/watch?v=hF3W8o-n-Oo
                                        https://www.youtube.com/watch?v=R6XRJSp3mj4



Saturday, June 6, 2015

Isilon - Measuring cluster latency

CPU:  isi statistics system --nodes --top
NET:  isi statistics protocol --top --orderby=timeavg
           ping/iperf
DISK: isi statistics drive -nall --top --long
MEMORY: isi statistics query -nall --stats=node.memory.used

Isilon: measuring IOPS for drive


Recommended maximum IOPS rates for Isilon drives are:
SATA drives: 100
SAS drives: 200
STEC Mach8 SSD drives: 2600
Hitachi SSD drives: 4800

Measuring IOPS per drive (requires root access):
isi statistics query --stats=node.disk.xfers.rate.<drive #>
isi statistics query --nodes=all --stats=node.disk.xfers.rate.sum   for all nodes

Measuring latency
isi statistics -nall --top --long

Disk latency
7200 RPM SATA = 8-10 ms
10000 RPM SAS = 3 ms

Infiniband latency: ~.050 milliseconds


Measuring CPU performance under load
isi statistics protocol
isi statistics protocol --orderby=TimeAvg --top
isi statistics system --top      to see greater detail about CPU processing
To see load averages:
sysctl vm.loadavg
sysctl vm.uptime

To display performance information for all nodes by drive:
isi statistics drive --nodes=all --orderby=timeinqueue

isi statistics client --remote-name=<IP_address>


Isilon: File locking

File locking

Byte-range locking: NFSv4; locks individual byte ranges of a file
Oplocks: SMB opportunistic locks; allow a lot of lock caching on the SMB side
NLM: Network Lock Manager, for older NFS versions (v2/v3)

isi statistics heat
nfsstat -c      Client-side NFS statistics such as error counts and locks
nfsstat -s      Server-side NFS statistics





Isilon - Reports

Monitoring:

Live data sources:

isi statistics - a point-in-time look
SNMP data - tells whether the cluster is outside its normal state (cache, memory)
Cluster events - such as drive failures and boot failures

Exporting Historic data

isi_stats_d
InsightIQ history
CELOG database:
Logs
efs.gmp.group

isi statistics system
                    pstat
                    client
                    protocol
                    query

isi statistics system --nodes --top      tells the most active node

Cluster-wide and protocol data:  isi statistics pstat
isi statistics pstat --protocol=smb2 --top

nlm: network lock manager  for nfs version 2 and 3
lsass_in and lsass_out   authentication

isi statistics client displays the most active clients accessing the cluster for each protocol type
isi statistics client --orderby=ops --top

isi statistics protocol --totalby=proto,class,op,node --top


BDP: the bandwidth-delay product calculates how much bandwidth is usable on the network (RFC 1323).
64 KB / RTT (round-trip time) = the throughput ceiling, to be compared against the NIC hardware speed
e.g. 64 KB / 0.5 milliseconds ≈ 125 MB/s
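As a rough worked example: 64 KB / 0.5 ms = 64,000 bytes / 0.0005 s ≈ 128 MB/s, which is roughly the 125 MB/s line rate of a 1 GbE NIC. So with a 0.5 ms RTT a single stream with a 64 KB window can just about saturate gigabit Ethernet; with a longer RTT it cannot, which is where RFC 1323 window scaling comes in.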

Network  performance measurements tools

iperf: measures TCP and UDP streaming throughput between endpoints
ping: measures RTT/latency on the network wire
Wireshark/TShark: allow comprehensive examination of numerous protocols, with live or offline capture analysis
tcpdump: provides the ability to view packets using predefined filters, or to capture everything for post-capture filtering (in Wireshark or TShark)

Iperf usage:
Server: iperf -s
Client: iperf -c <iperf server IP> -t 20


Workload I/O operations
Namespace operations: mkdir, getattr, readdir, lookup
To see stats for reads, writes, namespace reads, and namespace writes:
isi statistics protocol --classes=read,write,namespace_read,namespace_write
To see how many operations are queued up for each disk:
isi_for_array -s sysctl hw.iosched | grep total_inqueue








Friday, June 5, 2015

Isilon - Log analysis

CELOG coalescer log files contain the raw data
Each node has a set of logs
Cluster-wide log files, e.g. lsassd, snapshot, dedupe
/var is unique to each individual node
/var/log is either 500 MB or 2 GB depending on the node version
If the /var/log partition reaches 95 percent full, the node gets rebooted every 30 seconds

Log file locations
/var/log on each node
ls /var/log
find /var/log -name "*celog*.log" -print

Log collection isi_gather_info
Logs from specific node isi_gather_info -n <node #>
isi_gather_info -f /var/crash -s 'isi_hw_status -i'


From GUI

Cluster Management -> Diagnostics -> Gather

Generic logs, e.g. /var/log/messages
Process-specific logs, e.g. /var/log/lsassd.log (any kind of authentication goes through lsassd.log)
/var/log/isi_celog_coalescer.log

Log gather structure:
Isilon-1
Isilon-2
Isilon-3
local  logs that are generic
base level files  like any specific switches used


Isilon-1
varlog.tar
isi_hangdump.tar


isi_gather_info --noupload
isi_gather_info --noupload --group fs --nologs   Example of gathering only a log group (--group fs)


Commands for log file filtering
ls
less
grep
cat

common useful options

ls -l
grep -v <expression> file
less -d <file>
cat -n <files>
cd ; ls
ls | less
ls > files.txt
ls>>files.txt

The grep utility
grep -v Diskless /tmp/isistats.txt |grep SSD

ls -l
wc -l
tail, head and grep
grep and cut
sort and uniq

ls -l isi_job_history
wc -l isi_job_history

Narrow scope
tail isi_job_history | head -1
grep ^03 isi_job_history | wc -l


Extract suspected relevant data
grep ^03 isi_job_history | cut -d' ' -f4 | cut -d'[' -f1 | sort | uniq -c
find . -name <filename> -print
grep -i error local/messages  |grep -iv cleared | cut -d: -f2- |less






Isilon - System commands

sysctl -d
sysctl -d efs.gmp.group
sysctl -d efs.lbm.drive_space  per-drive space statistics
sysctl -d

view log messages

less /var/log/messages
tail -f /var/log/messages
tail -n 50 /var/log/messages |grep group

isi device more
isi device -h
isi devices --action smartfail -d 4
isi devices --action stopfail -d 4

Isilon - Sysctl commands

sysctl commands are the equivalent of the registry in a Windows environment; they change UNIX kernel runtime parameters. A sysctl command changes one node's kernel at a time, and the change does not survive a reboot: it will be lost after the node restarts.

sysctl [option] <name> [=value]
sysctl -a      List all sysctls
sysctl -d      Description of a sysctl
sysctl -Na
sysctl <sysctl_name>=value      e.g. sysctl kern.nswbuf=2048
Set cluster-wide:  isi_for_array -s sysctl <sysctl_name>=value


Temporary vs Persistent sysctl settings
Temporary sysctl changes are not written to the sysctl.conf configuration file
Persistent sysctl changes can be made by updating the sysctl.conf configuration file on each node
The isi_sysctl_cluster command updates each node and makes the change persistent

Setting persistent sysctl procedure

Back up the conf file:
touch /etc/mcp/override/sysctl.conf && cp /etc/mcp/override/sysctl.conf /etc/mcp/override/sysctl.conf.bkup
isi_sysctl_cluster sysctl_name=value
Verify the change:  cat /etc/mcp/override/sysctl.conf
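For example, to make the kern.nswbuf setting shown earlier persistent across the cluster (2048 is just the illustration value reused from above):
isi_sysctl_cluster kern.nswbuf=2048
cat /etc/mcp/override/sysctl.conf          The new entry should now appear in the override file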

Monday, June 1, 2015

Cisco - Interface commands

show log nvram
show lacp inter e8/15
show cdp neigh int eth1/27

sh hardware internal errors mod 8 | diff
sh policy-map interface

PDUs
The 7K switches are dropping PDUs


Zero Space Reclaim - VMAX


Zero space reclaim is a non-impactful process that allows space left behind after deleting files to be reused. The process runs on the VMAX and is initiated from the ESXi host.

Procedure:
1. Identify the cluster and host that you want to run the reclaim on.
2. Create an alert suppression for the host CI for the reclaim window.
3. Enable SSH and disable lockdown mode for that host by logging into the respective vCenter.
4. Get the list of datastores on that host by running the following command:
esxcli storage filesystem list |grep VMFS-5 |awk '{print $2}'
5. Initiate the reclaim process on a particular datastore by running the following command:
esxcli storage vmfs unmap -l datastore_name -n 1200
6. Wait for the prompt to come back, which means the reclaim process is complete.
7. Once the reclaim process on that datastore is complete, you can initiate the reclaim on a new datastore (step 5); a sketch of a sequential loop is shown after the note below.
8. Once the reclaim process on the host is complete, disable SSH and enable lockdown mode for that host from vCenter.
 
Note: Do not initiate reclaim on more than 3 datastores at a time to avoid any performance impact.
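A minimal sketch that walks the VMFS-5 datastores one at a time (so the limit in the note is never exceeded), run from the ESXi shell and reusing the reclaim-unit size of 1200 from step 5:

# Hypothetical sketch - reclaim each VMFS-5 datastore sequentially
for ds in $(esxcli storage filesystem list | grep VMFS-5 | awk '{print $2}'); do
    echo "Reclaiming $ds"
    esxcli storage vmfs unmap -l "$ds" -n 1200      # blocks until this datastore is done
done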


Tuesday, May 26, 2015

Isilon - Set up email notifications for quota violation


Send email alert upon Isilon quota violation

isi quota quotas notifications create /ifs/TEST directory hard exceeded --action-email-address abc@example.com --holdoff 5m --email-template /ifs/home/admin/notification-templates/quota --action-email-owner yes
 
# isi quota quotas notifications view /ifs/TEST --type directory hard exceeded
  Threshold: hard
  Condition: exceeded
  Schedule: -
  Holdoff: 5m
  ActionAlert: Yes
  EmailOwner: Yes
NotifyAddress: abc@example.com
Email Template: /ifs/home/admin/notification-templates/quota
 
# isi quota quotas notifications list /ifs/TEST --type directory
Threshold Condition
---------------------
hard exceeded
advisory exceeded
---------------------
Total: 2
 

Thursday, February 5, 2015

Isilon - Replication issue

isi_classic sync job report -v policy_name
isi_classic sync pol run policy_name
isi_services -a
isi_mcp start
ps auxwww |grep migr

If the scheduler job has gone stale, follow the process below:

cp /ifs/.ifsvar/modules/tsm/config/source_record.xml /ifs/.ifsvar/modules/tsm/config/source_record.xml.bak

vi /ifs/.ifsvar/modules/tsm/config/source_record.xml

Change the entries to:
<pending-job> No Job Action </pending-job>
<current-owner> No Job Owner </current-owner>

egrep 'pending-job|current-owner' /ifs/.ifsvar/modules/tsm/config/source_record.xml | less

Isilon - Verify services running on Isilon

isi services -a

Nexus 7K switch Troubleshooting - Port failure

CLI command history

show cli history 50
show logging last 50
dir log:

show interface ethernet 8/15
show interface ethernet 8/15 brief
show port-channel summary interface po326
show diagnostic result module 8
show hardware internal errors module 8 | diff
show running-config int eth 8/15
show running-config int po326 membership
show module internal event-history module 8
show vpc consistency-parameters global
show run int po 326 mem
show int status module 8
show port-channel database interface port-channel 326
show accounting log




Thursday, January 15, 2015

VMAX - Create similar devices from Disk group

Verify disk group information:
symdisk list -dskgrp_summary

symconfigure -sid 123 -cmd "configure 10 devices copying dev 1512 overriding disk_group=1;" preview -v -nop

Wednesday, January 14, 2015

VMAX - Auto Provisioning

1) List existing thin devices
    symdev list -sid 123 -all -tdev -cyl

2) Create thin devices
    symconfigure -sid 123 -cmd "create dev count=8, size=1100,config=tdev,emulation=FBA;" prepare
    symconfigure -sid 123 -cmd "create dev count=8, size=1100,config=tdev,emulation=FBA;" commit

3) Verify newly created thin devices
     symdev list -sid 123 -tdev -unbound

4) Create data devices
    symconfigure -sid 123 -cmd "create dev count=8, size=1100, config=2-way-mir, disk_group=3, emulation=FBA, attribute=datadev;" prepare/commit

     symdisk list -dskgrp_summary
     symdev list -sid 123 -datadev

5) Create thin pools
    symconfigure -sid 123 -cmd "create pool p1, type=thin;" prepare/commit

     symcfg -sid 123 list -pool -thin
     symcfg -sid 123 show -pool p1 -type thin
     symcfg -sid 123 show -pool p1 -type thin -detail

6) Add data devices to pool
    symconfigure -sid 123 -cmd "add dev 255:256 to pool p1,type=thin,member_state=enable;" commit

7) Bind tdev to thin pool
    symconfigure -sid 123 -cmd "bind tdev 24D:250 to pool p1;" prepare/commit

     symcfg list -sid 123 -tdev -range 24D:250
     symdev -sid 123 list -noport

8)  Pre-allocate space on Tdev
     symconfigure -sid 123 -cmd "start allocate on tdev 24D:250 start_cyl=0,size=100MB;" prepare/commit

9) Create Meta when provisioning data is larger in space
    symconfigure -sid 123 -cmd "form meta from dev 107, config=concatenated/striped; add dev 108 to meta 107;"

    auto meta feature is disabled by default

10) Create Storage group
      symaccess -sid 123 create -name SG -type storage

       Add devices to storage group
       symaccess -sid 123 -name SG -type storage add devs CD:F4

11) Create Port group
      symaccess -sid 123 create -name PG -type port
    
       Add FA ports to port group
       symaccess -sid 123 -name PG -type port add -dirport 7F:1

12) Create Initiator group
       symaccess -sid 123 create -name IG -type initiator
      
      Add wwns to initiator group
      symaccess -sid 123 -name IG -type initiator add -wwn 5000


13) Create Masking view
       symaccess -sid 123 create view -name MV  -sg SG -pg PG -ig IG











VMAX - Basic Commands

1) Verify free space on Vmax
     symconfigure list -freespace

2) To make device not ready
     symdev not-ready <symdev>

Submission of Configuration changes in 3 different ways

1) Using -CMD
    symconfigure -sid 123 -cmd "delete dev 1234;" prepare
2)  Using  -f
    symconfigure -sid 123 -file myfile commit
    myfile.txt  -  delete dev 1234;

3) Using <<EOF
    symconfigure -sid 123 -noprompt prepare <<EOF
dissolve meta dev 002;
EOF


Query to see status after command executed

  symconfigure -sid 123 query
 
  Abort the running process
  symconfigure -sid 123 abort -session-id <sessionid>


 Setting environment variables

symcli
set symcli-cli-access-PARALLEL
set symcli-wait-on-db=1

symcli -def
symcfg discover
symcfg list
 

 Search Tdev Information

1) symdev list -cyl -sid 495 | find "TDEV"

Free space

symconfigure list -free -sid 123

Disk group information

symdisk list -sid 123 -dskgrp_summary

symconfigure verify -sid 123

symlmf query -type emclm -sid 123







Wednesday, January 7, 2015

Isilon - SNMP node information sending traps

On an Isilon cluster, the node with the lowest logical node number always sends the SNMP traps to the SNMP server. Hardware-related incidents are sent from the individual node.

To find which node is acting as primary and sending traps

cat /var/log/isi_celog_coalescer.log | grep "CELOG master is" | tail -1

isi_nodes %{node}...%{internal_a}...%{internal_b} |grep <ip_address_from_last_command>



Monday, January 5, 2015

Isilon - Create Replication session policies Continuous

isi sync policies create --name=abc --action=sync --target-host=isilon.test.xxx.com --target-path=/ifs/isilon/xxx --source-root-path=/ifs/isilon/beta/xxx --description=xxx --schedule=When-Source-Modified
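To confirm the policy after creating it (standard OneFS 7.x SyncIQ commands; "abc" is the policy name from the example above):
isi sync policies list
isi sync policies view abc           View the policy details, including the when-source-modified schedule
isi sync jobs list                   Check for a running job once the source is modified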