Friday, June 12, 2015

VMAX : Storage provisioning for boot LUNs

Most clients prefer to create boot LUNs as thick devices. Below are the steps to create LUNs from thick storage (disk groups).

1) Verify Disk Group information

symdisk list -sid 123 -dskgrp_summary

Verify the available space in the disk group.

2) Create Thick LUN

There are two ways to create LUNs. We can use an existing LUN that is already in use as a reference and create new LUNs with the same configuration:

symconfigure -sid 123 -cmd "configure 2 devices copying dev 1234 overriding config=vdev;" preview/prepare/commit -nop -v

1234 is an existing LUN; the command above copies its configuration and creates two new LUNs. The preview, prepare, and commit actions validate the syntax, check resource availability, and execute the change, respectively.

OR

symconfigure -sid 123 -cmd "create dev count=6, size=2322 cyl, emulation=FBA, data_member_count=3, config=RAID-5, disk_group=1;" commit -nop
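
To confirm the new devices came out with the intended configuration, the details of a device can be displayed afterwards (using 1234 as an example device number):

symdev -sid 123 show 1234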


3) Create Storage group and add devices to the group

symaccess -sid 123 create -name Boot_LUN -type stor -dev 1234;
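
Additional boot devices can be added to the same group later if needed; the general form (device 1235 here is just an illustration) is:

symaccess -sid 123 -name Boot_LUN -type stor add devs 1235 ;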

4) Create Port group

symaccess -sid 123 create -name Boot_LUN -type port -dirport 3E:0,13E:0,4E:0,14E:0

5) Create Initiator group

Creating child and parent initiator groups allows the user to reuse the same child group nested under multiple parent groups (see the sketch after step c below).

     a) Create Child group and set flags

               Create the group and add the host WWNs to the child group
    
          symaccess -sid 123 create -name IG_Child -type init -wwn 20000012345678 ;
          symaccess -sid 123 -name IG_Child -type init add -wwn 200000123456787

            Add the C and SPC2 flags, and enable consistent LUN addressing:

           symaccess -sid 123 -name IG_Child -type init set ig_flags on C,SPC2 -enable ;
           symaccess -sid 123 -name IG_Child -type init set  consistent_lun on ;


     b) Create Parent and set flags


            symaccess -sid 123 create -name IG_Parent -type init ;

            Enable C, SPC2

            symaccess -sid 123 -name IG_Parent -type init set ig_flags on C,SPC2 -enable ;

            Enable Consistent LUN

            symaccess -sid 123 -name IG_Parent -type init set consistent_lun on ;


     c) Add child IG groups to parent IG groups

            symaccess -sid 123 add -name IG_Parent -type init -ig IG_Child ;
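
Because the child group is its own object, it can be nested under additional parent groups as well; a minimal sketch (IG_Parent2 is a hypothetical second parent group) would be:

            symaccess -sid 123 create -name IG_Parent2 -type init ;
            symaccess -sid 123 add -name IG_Parent2 -type init -ig IG_Child ;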


6)  Create Masking View

       Create the masking view:

       symaccess -sid 123 create view -name Boot_LUN -sg Boot_LUN -pg Boot_LUN -ig IG_Parent -lun 0 ;
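
To verify that the view ties the intended groups together, it can be displayed afterwards (show view is standard symaccess syntax):

       symaccess -sid 123 show view Boot_LUN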

Isilon - Clear CE log database

Sometimes the log files fill up, which prevents the Isilon cluster from sending alerts via email or call home. Running the following commands frees up the log database and gets alerts flowing again.

There is another case where the CE log database needs to be reset. Sometimes quieting old alerts throws the error "event database not accessible", or the pre-health check during a code upgrade returns a warning that the event database is not accessible. This can be resolved by clearing the CE log databases and restarting the CE log services.

You can run all the commands at once, or one at a time if you prefer.

Create an <SR number> directory under the Isilon_Support directory to store logs for further analysis or troubleshooting:


mkdir -p /ifs/.ifsvar/db/celog /ifs/data/Isilon_Support/sandbox /ifs/data/Isilon_Support/celog_backups ;
mkdir /ifs/data/Isilon_Support/<SR Number> ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_monitor.core $(pgrep isi_celog_monitor)' ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_coalescer.core $(pgrep isi_celog_coalescer)' ;
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR Number>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_notification.core $(pgrep isi_celog_notifi)' ;sleep 120; 
isi services -a celog_coalescer disable ;
isi services -a celog_monitor disable ;
isi services -a celog_notification disable ;
isi_for_array -sX 'pkill isi_celog_';
mv -vf /ifs/.ifsvar/db/celog/* /ifs/data/Isilon_Support/celog_backups/ ;
isi_for_array -sX 'rm -f /var/db/celog/*.db' ;
isi_for_array -sX 'rm -f /var/db/celog_master/*.db' ;
rm -f /ifs/.ifsvar/db/celog/*.db ;
isi services -a celog_coalescer enable ;
isi services -a celog_monitor enable ;
isi services -a celog_notification enable ;
isi services -a | grep celog
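
Once the services are re-enabled, a quick sanity check that the event database is reachable again (isi events list is the standard command on OneFS 7.x) might be:

isi events list | head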

Sunday, June 7, 2015

Isilon - Useful links

The links below are just for reference. Credit to the authors.

Isilon integration with Avamar : https://splitbrained.wordpress.com/2014/02/19/isilon-avamar-ndmp/

Create Multi access zones on Isilon: https://storagenerd.wordpress.com/2013/02/01/how-to-setup-access-zones-for-multiple-active-directory-domains-isilon-7/

Multi access zone video demonstrations: https://www.youtube.com/watch?v=hF3W8o-n-Oo
https://www.youtube.com/watch?v=R6XRJSp3mj4



Saturday, June 6, 2015

Isilon - Measuring cluster latency

CPU: isi statistics system --nodes --top
NET: isi statistics protocol --top --orderby=timeavg
     ping/iperf
DISK: isi statistics drive -nall --top --long
MEMORY: isi statistics query -nall --stats=node.memory.used

Isilon - Measuring IOPS per drive


Recommended maximum IOPS rates for Isilon drives:
SATA drives: 100
SAS drives: 200
STEC Mach8 SSD drives: 2600
Hitachi SSD drives: 4800

Measuring IOPS per drive (requires root access):
isi statistics query --stats=node.disk.xfers.rate.<drive #>
isi statistics query --nodes=all --stats=node.disk.xfers.rate.sum    (for all nodes)
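
For example, to watch the transfer rate of drive 5 on node 2 (the drive and node numbers are arbitrary illustrations of the pattern above):

isi statistics query --nodes=2 --stats=node.disk.xfers.rate.5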

Measuring latency:
isi statistics drive -nall --top --long

Typical disk latency:
7200 RPM SATA: 8-10 ms
10000 RPM SAS: 3 ms

InfiniBand latency: ~0.05 ms


Measuring CPU performance under load:
isi statistics protocol
isi statistics protocol --orderby=timeavg --top
isi statistics system --top    (for greater detail on CPU processing)
To see load averages:
sysctl vm.loadavg
sysctl vm.uptime

To display performance information for all nodes by drive:
isi statistics drive --nodes=all --orderby=timeinqueue

isi statistics client --remote-name=<IP_address>    (statistics for a specific remote client)


Isilon: File locking

File locking

Byte-range locking: NFSv4; locks individual byte ranges of a file
Oplocks: SMB opportunistic locks; clients can cache a lot of locking work at the SMB layer
NLM: Network Lock Manager, for older NFS versions (v2/v3)

isi statistics heat
nfsstat -c    client-side NFS statistics (e.g., error counts, locks)
nfsstat -s    server-side NFS statistics





Isilon - Reports

Monitoring:

Live data sources:

isi statistics gives a point-in-time look
SNMP data tells whether the cluster is outside its normal state (cache, memory)
Cluster events, such as drive failures or boot failures

Exporting Historic data

isi_stats_d
InsightIQ history
CELOG database:
Logs
efs.gmp.group

isi statistics system
                    pstat
                    client
                    protocol
                    query

isi statistics system --nodes --top    tells you the most active node

Cluster-wide and protocol data: isi statistics pstat
isi statistics pstat --protocol=smb2 --top

nlm: Network Lock Manager, for NFS versions 2 and 3
lsass_in and lsass_out: authentication traffic

isi statistics client displays the most active clients accessing the cluster, for each protocol type
isi statistics client --orderby=ops --top

isi statistics protocol --totalby=proto,class,op,node --top


BDP: the bandwidth-delay product determines how much of the network's bandwidth can actually be used. With the classic 64 KB TCP window (before RFC 1323 window scaling), the maximum achievable throughput is:
64 KB / RTT (round-trip time); compare the result to the NIC hardware speed
e.g. 64 KB / 0.5 ms ≈ 125 MB/s
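
Spelling out the arithmetic behind that example: 64 KB = 65,536 bytes, and 65,536 bytes / 0.0005 s = 131,072,000 bytes/s, which is roughly 125 MB/s.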

Network  performance measurements tools

iperf: measures TCP and UDP streaming throughput between endpoints
ping: measures RTT (latency) on the network wire
Wireshark/TShark: allow comprehensive examination of numerous protocols, with live or offline capture analysis
tcpdump: provides the ability to view packets using predefined filters, or to capture everything for post-capture filtering (in Wireshark or TShark)

Iperf usage:
Client: iperf -c <iperf server IP> -t 20
Server: iperf -s
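
A single stream may not saturate a fast link; iperf's standard -P option runs parallel client streams, for example:

Client: iperf -c <iperf server IP> -t 20 -P 4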


Workload I/O operations
Namespace operations: mkdir, getattr, readdir, lookup
To see stats for reads, writes, namespace reads, and namespace writes:
isi statistics protocol --classes=read,write,namespace_read,namespace_write
To see how many operations are queued up for each disk:
isi_for_array -s sysctl hw.iosched | grep total_inqueue








Friday, June 5, 2015

Isilon - Log analysis

CELOG coalescer log files contain the raw data.
Each node has its own set of logs, plus there are cluster-wide log files (e.g., lsassd, snapshot, dedupe).
/var is unique to each individual node.
/var/log is either 500 MB or 2 GB depending on the node version.
If the /var/log partition reaches 95 percent full, the node reboots every 30 seconds.
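
Given that reboot behavior, it is worth keeping an eye on /var/log usage across the cluster before it gets near the threshold (plain FreeBSD df, run through isi_for_array):

isi_for_array -s 'df -h /var/log'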

Log file locations
/var/log on each node
ls /var/log
find /var/log -name "*celog*.log" -print

Log collection: isi_gather_info
Logs from a specific node: isi_gather_info -n <node #>
isi_gather_info -f /var/crash -s 'isi_hw_status -i'


From GUI

Cluster Management -> Diagnostics -> Gather

Generic logs: e.g., /var/log/messages
Process-specific logs: e.g., /var/log/lsassd.log (any kind of authentication goes through lsassd)
/var/log/isi_celog_coalescer.log

Log gather structure:
Isilon-1
Isilon-2
Isilon-3
local (logs that are generic)
base-level files (e.g., any specific switches used)

Within Isilon-1:
varlog.tar
isi_hangdump.tar


isi_gather_info --noupload
isi_gather_info --noupload --group fs --nologs    (log-group example: --group fs)


Commands for log file filtering
ls
less
grep
cat

common useful options

ls -l
grep -v <expression> file
less -d <file>
cat -n <files>
cd ; ls
ls | less
ls > files.txt
ls>>files.txt

The grep utility
grep -v Diskless /tmp/isistats.txt |grep SSD

ls -l
wc -l
tail, head and grep
grep and cut
sort and uniq

ls -l isi_job_history
wc -l isi_job_history

Narrow scope
tail isi_job_history | head -1
grep ^03 isi_job_history | wc -l


Extract suspected relevant data
grep ^03 isi_job_history | cut -d' ' -f4 | cut -d'[' -f1 | sort | uniq -c
find . -name <filename> -print
grep -i error local/messages | grep -iv cleared | cut -d: -f2- | less






Isilon - System commands

sysctl -d    (shows the description of a sysctl)
sysctl -d efs.gmp.group
sysctl -d efs.lbm.drive_space    (per-drive space statistics)

View log messages:

less /var/log/messages
tail -f /var/log/messages
tail -n 50 /var/log/messages |grep group

isi devices | more
isi devices -h
isi devices --action smartfail -d 4
isi devices --action stopfail -d 4

Isilon - Sysctl commands

sysctl commands are the equivalent of the registry in a Windows environment: they change Unix kernel runtime parameters. A sysctl command changes one node's kernel at a time, and the change does not survive a reboot.

sysctl [option] <name>[=value]
sysctl -a    (list all sysctls)
sysctl -d    (description of a sysctl)
sysctl -Na   (list sysctl names only)
sysctl <sysctl_name>=<value>    e.g.: sysctl kern.nswbuf=2048
Set cluster-wide: isi_for_array -s sysctl <sysctl_name>=<value>


Temporary vs. persistent sysctl settings
Temporary sysctl changes are not written to the sysctl.conf configuration file.
Persistent sysctl changes are made by updating the sysctl.conf configuration file on each node.
The isi_sysctl_cluster command updates every node and makes the change persistent.

Setting a persistent sysctl:

Back up the conf file:
touch /etc/mcp/override/sysctl.conf && cp /etc/mcp/override/sysctl.conf /etc/mcp/override/sysctl.conf.bkup
isi_sysctl_cluster <sysctl_name>=<value>
Verify the change: cat /etc/mcp/override/sysctl.conf
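
Reusing the kern.nswbuf=2048 value from the example above (the value itself is just an illustration), a persistent cluster-wide change would look like:

isi_sysctl_cluster kern.nswbuf=2048
grep nswbuf /etc/mcp/override/sysctl.conf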

Monday, June 1, 2015

Cisco - Interface commands

show log nvram
show lacp inter e8/15
show cdp neigh int eth1/27

sh hardware internal errors mod 8 | diff
sh policy-map interface

PDUs: check whether the 7Ks are dropping PDUs


Zero Space Reclaim - VMAX


Zero space reclaim is a non-impactful process that lets the array reuse space left behind after files are deleted. The process runs on the VMAX but is initiated from the ESXi host.

Procedure:
1. Identify the cluster and host that you want to run the reclaim on.
2. Create an alert suppression for the host CI for the reclaim window.
3. Enable SSH and disable lockdown mode for the host by logging into the respective vCenter.
4. Get the list of datastores on the host by running the following command:
esxcli storage filesystem list | grep VMFS-5 | awk '{print $2}'
5. Initiate the reclaim process on a particular datastore by running the following command:
esxcli storage vmfs unmap -l datastore_name -n 1200
6. Wait for the prompt to come back, which means the reclaim process is complete.
7. Once the reclaim on that datastore is complete, you can initiate the reclaim on the next datastore (step 5).
8. Once the reclaim on the host is complete, disable SSH and enable lockdown mode for the host from vCenter.
 
Note: Do not initiate reclaim on more than 3 datastores at a time, to avoid any performance impact.
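
Steps 4 and 5 can be rolled into a simple serial loop in the ESXi shell; a minimal sketch (deliberately one datastore at a time, with the -n 1200 reclaim-unit count taken from the example above, and assuming datastore names without spaces):

for ds in $(esxcli storage filesystem list | grep VMFS-5 | awk '{print $2}'); do
    echo "Reclaiming $ds"                        # show progress per datastore
    esxcli storage vmfs unmap -l "$ds" -n 1200   # runs serially; prompt returns when done
done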