Friday, June 12, 2015

VMAX : Storage provisioning for boot LUNs

Most clients prefer to create boot LUNs as thick devices. Below are the steps to create LUNs from thick storage (disk groups).

1) Verify Disk Group information

symdisk list -sid 123 -dskgrp_summary

Verify the available space in the disk group.

2) Create Thick LUN

There are two ways to create LUNs. We can use an existing LUN that is already in use as a reference and create new LUNs with the same configuration:

symconfigure -sid 123 -cmd "configure 2 devices copying dev 1234 overriding config=vdev;" preview/prepare/commit -nop -v

1234 is an existing LUN; the command above copies its configuration and creates two new LUNs. The preview, prepare, and commit actions validate the syntax, check resource availability, and execute the change, respectively.

OR

symconfigure -sid 123 -cmd "create dev count=6, size=2322 cyl, emulation=FBA, data_member_count=3, config=RAID-5, disk_group=1;" commit -nop
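
To confirm the new devices came out with the intended configuration, the details of a device can be displayed afterwards (using 1234 as an example device number):

symdev -sid 123 show 1234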


3) Create Storage group and add devices to the group

symaccess -sid 123 create -name Boot_LUN -type stor -dev 1234;
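
Additional boot devices can be added to the same group later if needed; the general form (device 1235 here is just an illustration) is:

symaccess -sid 123 -name Boot_LUN -type stor add devs 1235 ;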

4) Create Port group

symaccess -sid 123 create -name Boot_LUN -type port -dirport 3E:0,13E:0,4E:0,14E:0

5) Create Initiator group

Creating child and parent initiator groups allows the user to reuse the same child group nested under multiple parent groups (see the sketch after step c below).

     a) Create Child group and set flags

               Create the group and add the host WWNs to the child group
    
          symaccess -sid 123 create -name IG_Child -type init -wwn 20000012345678 ;
          symaccess -sid 123 -name IG_Child -type init add -wwn 200000123456787

            Add the C and SPC2 flags, and enable consistent LUN addressing:

           symaccess -sid 123 -name IG_Child -type init set ig_flags on C,SPC2 -enable ;
           symaccess -sid 123 -name IG_Child -type init set  consistent_lun on ;


     b) Create Parent and set flags


            symaccess -sid 123 create -name IG_Parent -type init ;

            Enable C, SPC2

            symaccess -sid 123 -name IG_Parent -type init set ig_flags on C,SPC2 -enable ;

            Enable Consistent LUN

            symaccess -sid 123 -name IG_Parent -type init set consistent_lun on ;


     c) Add child IG groups to parent IG groups

            symaccess -sid 123 add -name IG_Parent -type init -ig IG_Child ;
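
Because the child group is its own object, it can be nested under additional parent groups as well; a minimal sketch (IG_Parent2 is a hypothetical second parent group) would be:

            symaccess -sid 123 create -name IG_Parent2 -type init ;
            symaccess -sid 123 add -name IG_Parent2 -type init -ig IG_Child ;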


6)  Create Masking View

       Create the masking view:

       symaccess -sid 123 create view -name Boot_LUN -sg Boot_LUN -pg Boot_LUN -ig IG_Parent -lun 0 ;
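
To verify that the view ties the intended groups together, it can be displayed afterwards (show view is standard symaccess syntax):

       symaccess -sid 123 show view Boot_LUN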

Isilon - Clear CE log database

Sometimes the log files fill up, which prevents the Isilon cluster from sending alerts via email or call home. Running the following commands frees up the log database and gets alerts flowing again.

There is another case where the CE log database needs to be reset. Sometimes quieting old alerts throws the error "event database not accessible", or the pre-health check during a code upgrade returns a warning that the event database is not accessible. This can be resolved by clearing the CE log databases and restarting the CE log services.

You can run all the commands at once, or one at a time if you prefer.

Create an <SR number> directory under the Isilon_Support directory to store logs for further analysis or troubleshooting:


mkdir -p /ifs/.ifsvar/db/celog /ifs/data/Isilon_Support/sandbox /ifs/data/Isilon_Support/celog_backups ;
mkdir /ifs/data/Isilon_Support/<SR Number> ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_monitor.core $(pgrep isi_celog_monitor)' ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_coalescer.core $(pgrep isi_celog_coalescer)' ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_notification.core $(pgrep isi_celog_notifi)' ;sleep 120; 
isi services -a celog_coalescer disable ;
isi services -a celog_monitor disable ;
isi services -a celog_notification disable ;
isi_for_array -sX 'pkill isi_celog_';
mv -vf /ifs/.ifsvar/db/celog/* /ifs/data/Isilon_Support/celog_backups/ ;
isi_for_array -sX 'rm -f /var/db/celog/*.db' ;
isi_for_array -sX 'rm -f /var/db/celog_master/*.db' ;
rm -f /ifs/.ifsvar/db/celog/*.db ;
isi services -a celog_coalescer enable ;
isi services -a celog_monitor enable ;
isi services -a celog_notification enable ;
isi services -a | grep celog
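
Once the services are re-enabled, a quick sanity check that the event database is reachable again (isi events list is the standard command on OneFS 7.x) might be:

isi events list | head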

Sunday, June 7, 2015

Isilon - Useful links

The links below are just for reference. Credit to the authors.

Isilon integration with Avamar : https://splitbrained.wordpress.com/2014/02/19/isilon-avamar-ndmp/

Create Multi access zones on Isilon: https://storagenerd.wordpress.com/2013/02/01/how-to-setup-access-zones-for-multiple-active-directory-domains-isilon-7/

Multi access zone video demonstrations: https://www.youtube.com/watch?v=hF3W8o-n-Oo
https://www.youtube.com/watch?v=R6XRJSp3mj4



Saturday, June 6, 2015

Isilon - Measuring cluster latency

CPU: isi statistics system --nodes --top
NET: isi statistics protocol --top --orderby=timeavg
     ping/iperf
DISK: isi statistics drive -nall --top --long
MEMORY: isi statistics query -nall --stats=node.memory.used

Isilon - Measuring IOPS per drive


Recommended maximum IOPS rates for Isilon drives:
SATA drives: 100
SAS drives: 200
STEC Mach8 SSD drives: 2600
Hitachi SSD drives: 4800

Measuring IOPS per drive (requires root access):
isi statistics query --stats=node.disk.xfers.rate.<drive #>
isi statistics query --nodes=all --stats=node.disk.xfers.rate.sum    (for all nodes)
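
For example, to watch the transfer rate of drive 5 on node 2 (the drive and node numbers are arbitrary illustrations of the pattern above):

isi statistics query --nodes=2 --stats=node.disk.xfers.rate.5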

Measuring latency:
isi statistics drive -nall --top --long

Typical disk latency:
7200 RPM SATA: 8-10 ms
10000 RPM SAS: 3 ms

InfiniBand latency: ~0.05 ms


Measuring CPU performance under load:
isi statistics protocol
isi statistics protocol --orderby=timeavg --top
isi statistics system --top    (for greater detail on CPU processing)
To see load averages:
sysctl vm.loadavg
sysctl vm.uptime

To display performance information for all nodes by drive:
isi statistics drive --nodes=all --orderby=timeinqueue

isi statistics client --remote-name=<IP_address>    (statistics for a specific remote client)


Isilon: File locking

File locking

Byte-range locking: NFSv4; locks individual byte ranges of a file
Oplocks: SMB opportunistic locks; clients can cache a lot of locking work at the SMB layer
NLM: Network Lock Manager, for older NFS versions (v2/v3)

isi statistics heat
nfsstat -c    client-side NFS statistics (e.g., error counts, locks)
nfsstat -s    server-side NFS statistics





Isilon - Reports

Monitoring:

Live data sources:

isi statistics gives a point-in-time look
SNMP data tells whether the cluster is outside its normal state (cache, memory)
Cluster events, such as drive failures or boot failures

Exporting Historic data

isi_stats_d
InsightIQ history
CELOG database:
Logs
efs.gmp.group

isi statistics system
                    pstat
                    client
                    protocol
                    query

isi statistics system --nodes --top    tells you the most active node

Cluster-wide and protocol data: isi statistics pstat
isi statistics pstat --protocol=smb2 --top

nlm: Network Lock Manager, for NFS versions 2 and 3
lsass_in and lsass_out: authentication traffic

isi statistics client displays the most active clients accessing the cluster, for each protocol type
isi statistics client --orderby=ops --top

isi statistics protocol --totalby=proto,class,op,node --top


BDP: the bandwidth-delay product determines how much of the network's bandwidth can actually be used. With the classic 64 KB TCP window (before RFC 1323 window scaling), the maximum achievable throughput is:
64 KB / RTT (round-trip time); compare the result to the NIC hardware speed
e.g. 64 KB / 0.5 ms ≈ 125 MB/s
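
Spelling out the arithmetic behind that example: 64 KB = 65,536 bytes, and 65,536 bytes / 0.0005 s = 131,072,000 bytes/s, which is roughly 125 MB/s.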

Network  performance measurements tools

iperf: measures TCP and UDP streaming throughput between endpoints
ping: measures RTT (latency) on the network wire
Wireshark/TShark: allow comprehensive examination of numerous protocols, with live or offline capture analysis
tcpdump: provides the ability to view packets using predefined filters, or to capture everything for post-capture filtering (in Wireshark or TShark)

Iperf usage:
Client: iperf -c <iperf server IP> -t 20
Server: iperf -s
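
A single stream may not saturate a fast link; iperf's standard -P option runs parallel client streams, for example:

Client: iperf -c <iperf server IP> -t 20 -P 4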


Workload I/O operations
Namespace operations: mkdir, getattr, readdir, lookup
To see stats for reads, writes, namespace reads, and namespace writes:
isi statistics protocol --classes=read,write,namespace_read,namespace_write
To see how many operations are queued up for each disk:
isi_for_array -s sysctl hw.iosched | grep total_inqueue








Friday, June 5, 2015

Isilon - Log analysis

CELOG coalescer log files contain the raw data.
Each node has its own set of logs, plus there are cluster-wide log files (e.g., lsassd, snapshot, dedupe).
/var is unique to each individual node.
/var/log is either 500 MB or 2 GB depending on the node version.
If the /var/log partition reaches 95 percent full, the node reboots every 30 seconds.
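
Given that reboot behavior, it is worth keeping an eye on /var/log usage across the cluster before it gets near the threshold (plain FreeBSD df, run through isi_for_array):

isi_for_array -s 'df -h /var/log'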

Log file locations
/var/log on each node
ls /var/log
find /var/log -name "*celog*.log" -print

Log collection: isi_gather_info
Logs from a specific node: isi_gather_info -n <node #>
isi_gather_info -f /var/crash -s 'isi_hw_status -i'


From GUI

Cluster Management -> Diagnostics -> Gather

Generic logs: e.g., /var/log/messages
Process-specific logs: e.g., /var/log/lsassd.log (any kind of authentication goes through lsassd)
/var/log/isi_celog_coalescer.log

Log gather structure:
Isilon-1
Isilon-2
Isilon-3
local (logs that are generic)
base-level files (e.g., any specific switches used)

Within Isilon-1:
varlog.tar
isi_hangdump.tar


isi_gather_info --noupload
isi_gather_info --noupload --group fs --nologs    (log-group example: --group fs)


Commands for log file filtering
ls
less
grep
cat

common useful options

ls -l
grep -v <expression> file
less -d <file>
cat -n <files>
cd ; ls
ls | less
ls > files.txt
ls>>files.txt

The grep utility
grep -v Diskless /tmp/isistats.txt |grep SSD

ls -l
wc -l
tail, head and grep
grep and cut
sort and uniq

ls -l isi_job_history
wc -l isi_job_history

Narrow scope
tail isi_job_history | head -1
grep ^03 isi_job_history | wc -l


Extract suspected relevant data
grep ^03 isi_job_history | cut -d' ' -f4 | cut -d'[' -f1 | sort | uniq -c
find . -name <filename> -print
grep -i error local/messages | grep -iv cleared | cut -d: -f2- | less






Isilon - System commands

sysctl -d    (shows the description of a sysctl)
sysctl -d efs.gmp.group
sysctl -d efs.lbm.drive_space    (per-drive space statistics)

View log messages:

less /var/log/messages
tail -f /var/log/messages
tail -n 50 /var/log/messages |grep group

isi devices | more
isi devices -h
isi devices --action smartfail -d 4
isi devices --action stopfail -d 4

Isilon - Sysctl commands

sysctl commands are the equivalent of the registry in a Windows environment: they change Unix kernel runtime parameters. A sysctl command changes one node's kernel at a time, and the change does not survive a reboot.

sysctl [option] <name>[=value]
sysctl -a    (list all sysctls)
sysctl -d    (description of a sysctl)
sysctl -Na   (list sysctl names only)
sysctl <sysctl_name>=<value>    e.g.: sysctl kern.nswbuf=2048
Set cluster-wide: isi_for_array -s sysctl <sysctl_name>=<value>


Temporary vs. persistent sysctl settings
Temporary sysctl changes are not written to the sysctl.conf configuration file.
Persistent sysctl changes are made by updating the sysctl.conf configuration file on each node.
The isi_sysctl_cluster command updates every node and makes the change persistent.

Setting a persistent sysctl:

Back up the conf file:
touch /etc/mcp/override/sysctl.conf && cp /etc/mcp/override/sysctl.conf /etc/mcp/override/sysctl.conf.bkup
isi_sysctl_cluster <sysctl_name>=<value>
Verify the change: cat /etc/mcp/override/sysctl.conf
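
Reusing the kern.nswbuf=2048 value from the example above (the value itself is just an illustration), a persistent cluster-wide change would look like:

isi_sysctl_cluster kern.nswbuf=2048
grep nswbuf /etc/mcp/override/sysctl.conf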

Monday, June 1, 2015

Cisco - Interface commands

show log nvram
show lacp inter e8/15
show cdp neigh int eth1/27

sh hardware internal errors mod 8 | diff
sh policy-map interface

PDUs: check whether the 7Ks are dropping PDUs


Zero Space Reclaim - VMAX


Zero space reclaim is a non-impactful process that lets the array reuse space left behind after files are deleted. The process runs on the VMAX but is initiated from the ESXi host.

Procedure:
1. Identify the cluster and host that you want to run the reclaim on.
2. Create an alert suppression for the host CI for the reclaim window.
3. Enable SSH and disable lockdown mode for the host by logging into the respective vCenter.
4. Get the list of datastores on the host by running the following command:
esxcli storage filesystem list | grep VMFS-5 | awk '{print $2}'
5. Initiate the reclaim process on a particular datastore by running the following command:
esxcli storage vmfs unmap -l datastore_name -n 1200
6. Wait for the prompt to come back, which means the reclaim process is complete.
7. Once the reclaim on that datastore is complete, you can initiate the reclaim on the next datastore (step 5).
8. Once the reclaim on the host is complete, disable SSH and enable lockdown mode for the host from vCenter.
 
Note: Do not initiate reclaim on more than 3 datastores at a time, to avoid any performance impact.
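
Steps 4 and 5 can be rolled into a simple serial loop in the ESXi shell; a minimal sketch (deliberately one datastore at a time, with the -n 1200 reclaim-unit count taken from the example above, and assuming datastore names without spaces):

for ds in $(esxcli storage filesystem list | grep VMFS-5 | awk '{print $2}'); do
    echo "Reclaiming $ds"                        # show progress per datastore
    esxcli storage vmfs unmap -l "$ds" -n 1200   # runs serially; prompt returns when done
done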