2014-04-18

slp_srvreg issue

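When backing up the AIX OS [Note 1] through TSM, a warning is logged saying that slp_srvreg cannot be written. There are roughly three ways to resolve it:
Option 1: apply the fix for this APAR
Option 2: stop the slp_srvreg service
Option 3: exclude that file from the TSM backup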

[Note 1] AIX OS version: 7.1 TL1 SP5

The steps below use the second option:
step1: write a small script, vi /etc/rc.local
###############################################
#!/bin/sh

/usr/bin/ps -ef | grep -v grep | grep slp_srvreg > /dev/null

if [ $? -eq 0 ]; then
   /usr/sbin/slp_srvreg -k
fi

exit 0
###############################################

step2: chmod 700 /etc/rc.local

step3: add the following line to /etc/inittab: rc.local:2:once:/etc/rc.local > /dev/console 2>&1
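
The inittab change in step3 can be made repeatable by checking for the entry before appending it. The following is a minimal sketch, not a tested AIX procedure: has_itab_entry and add_rc_local_entry are hypothetical helper names, and the file path is passed as an argument so the logic can be tried on a copy of the file first. On AIX itself, the lsitab and mkitab commands are the usual way to query and add entries in the live /etc/inittab.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an inittab-format file already has an
# entry whose first colon-separated field equals the given identifier.
has_itab_entry() {
    file=$1
    id=$2
    awk -F: -v id="$id" '
        $1 == id { found = 1 }
        END      { exit found ? 0 : 1 }
    ' "$file"
}

# Hypothetical helper: append the entry from step3 only when it is missing,
# so running it twice does not create a duplicate line.
add_rc_local_entry() {
    file=$1
    if ! has_itab_entry "$file" rc.local; then
        echo 'rc.local:2:once:/etc/rc.local > /dev/console 2>&1' >> "$file"
    fi
}
```

For example, add_rc_local_entry /tmp/inittab.copy can be run against a copy to confirm the result before touching the real file.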

2014-03-24

Dynamically adjusting LPAR resources

When dynamically adjusting LPAR resources, make sure the assigned CPU and memory values do not exceed the Maximum values.
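
The Maximum rule can be checked before a change is requested. This is a small sketch only: within_max is a hypothetical helper, and the numbers are made-up example values, not taken from any real profile. The comparison goes through awk so that fractional processing units such as 0.5 are handled as well as whole numbers.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the requested value does not exceed
# the Maximum. awk does the comparison so decimal values work too.
within_max() {
    awk -v r="$1" -v m="$2" 'BEGIN { exit (r + 0 <= m + 0) ? 0 : 1 }'
}

# Made-up example: request 8192 MB against a Maximum of 16384 MB.
if within_max 8192 16384; then
    echo "memory request OK"
else
    echo "memory request exceeds Maximum"
fi
```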


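In addition, if the LPAR's RMC connection has a problem, dynamic resource changes only take effect after the LPAR is reactivated. Check for the problem as follows: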

As root on the HMC or IVM server, you can execute the rmcdomainstatus command as follows:

# /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc

You should get a list of all the partitions that the HMC or IVM server can reach on the public network on port 657. The output should look like this:

Management Domain Status: Managed Nodes
  O a  0xc8bc2c9647c1cef3  0003  9.2.5.241
  I a  0x96586cb4b5fc641c  0002  9.2.5.33
  S S  0x5c88fb81dad9f609  0001  9.2.5.65

If you run the rmcdomainstatus command on a Managed Node (i.e. a partition), a list similar to the following should be displayed.

Management Domain Status: Management Control Points
   I A  0xef889c809d9617c7 0001  9.57.24.139

Each line of output represents the status of a cluster node, relative to the node upon which the command is executed.

I. The first token of the node status line is either S, I, i, O, X, or Z.

S

Indicates the line is the status of a peer node itself (when run on IVM, this will indicate the IVM partition itself).
I

Indicates that the partition is "Up" as determined by the RMC heartbeat mechanism (i.e. an active RMC connection exists).
i

Indicates that the partition is Pending Up. Communication has been established, but the initial handshake between two RMC daemons has not been completed. If this indicator is present upon successive executions of the rmcdomainstatus command, then message authentication is most likely failing. Authentication problems will occur when the MCP or partition identity do not match each other’s trusted host list. To list the current identity or identities for the IVM server or HMC and the logical partition run the following command on both:
/usr/sbin/rsct/bin/ctsvhbal
To list the trusted host list on the MCP or partition, run:
/usr/sbin/rsct/bin/ctsthl -l
On the IVM or HMC, there is an entry for the partition. On the partition, there is an entry for the IVM or HMC. The HOST_IDENTITY value must match one of the identities listed in the respective ctsvhbal command output.
O

Indicates that the RMC connection is "Down", as determined by the RMC heartbeat mechanism. Either the partition is not active or the RMC daemon on the specified node is not connecting properly. Ensure that the partition can communicate with the HMC or IVM server on the public network and that port 657 is not being blocked on the public network firewall.
X

Indicates that a communication problem has been discovered and the RMC daemon has suspended communications with the RMC daemon that is on the specified node. This is typically the result of a configuration problem in the network, such that small heartbeat packets can be exchanged between the RMC daemon and the RMC daemon that is on the specified node, but larger data packets cannot. This is usually the result of a difference in MTU sizes in the network adapters of the nodes.
Z

Indicates that the RMC daemon has suspended communications with the RMC daemon that is on the specified node because the up/down state of the node is changing too rapidly. This is typically the result of more than one node having the same node ID. (See part III. for instructions on correcting.)
II. The second token of the node status line is either S, A, R, a, or r.

S

Indicates the line is the status of a peer node itself (when run on IVM, this will indicate the IVM partition itself).
A

Indicates that there are no messages queued to the specified node.
R

Indicates that messages are queued to the specified node. This may be caused by a network that is operating under a heavy load or possibly a full /var filesystem.
a

Has the same meaning as A, but the specified node is executing a version of the RMC daemon that is at a lower code level than the local RMC daemon.
r

Has the same meaning as R, but the specified node is executing a version of the RMC daemon that is at a lower code level than the local RMC daemon.
III. The third token of the status line is the ID of the specified node.

The node ID is a 64-bit number that is created when RSCT is installed. It is derived using a True Random Number Generator and is used to uniquely identify a node to the RMC subsystem. The node ID is maintained in the /var/ct/cfg/ct_node_id file. A backup copy is maintained in the /etc/ct_node_id file. If this value is not unique among all systems where RSCT is installed and managed by the same HMC or IVM server, you can generate a new cluster identifier with the following command.
/usr/sbin/rsct/install/bin/recfgct
Note: This command will affect any cluster software that uses RSCT such as CSM or GPFS.
IV. The fourth token of the status line is an internal node number that is used by the RMC daemon.

V. If the list is a list of Peer Nodes or Managed Nodes, the fifth token is the name of the node as known to the RMC subsystem.

On POWER5 and POWER6 partitions, the RMC connection is IP based, so this lists the IP address of the partition.
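
The first-token rules above can be applied mechanically to the command output. The following is a minimal sketch, assuming the five-column layout shown in the samples; rmc_problem_nodes is a hypothetical helper name. It prints the address and first token of every node that is neither Up (I) nor the local node (S).

```shell
#!/bin/sh
# Hypothetical helper: read rmcdomainstatus output on stdin and print
# "<address> <first token>" for each node that is not I (Up) or S (self).
# Node lines are recognised by a third field that starts with 0x.
rmc_problem_nodes() {
    awk '
        NF == 5 && $3 ~ /^0x/ && $1 != "I" && $1 != "S" {
            print $5, $1
        }
    '
}

# Sample input copied from the Managed Nodes listing above.
rmc_problem_nodes <<'EOF'
Management Domain Status: Managed Nodes
  O a  0xc8bc2c9647c1cef3  0003  9.2.5.241
  I a  0x96586cb4b5fc641c  0002  9.2.5.33
  S S  0x5c88fb81dad9f609  0001  9.2.5.65
EOF
# prints: 9.2.5.241 O
```

On a live system the same function would be fed from the real command, for example /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc | rmc_problem_nodes.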

2014-03-11

Differences between ent0, en0, and et0

ent0:

The notation ent0 specifies the hardware adapter. It has nothing to do with the TCP/IP address. The parameters associated with ent0 can be displayed as follows:

# lsattr -El ent0

It shows the parameters related to the card.

It shows adapter_names, alt_addr, auto_recovery, backup_adapter, hash_mode, mode, netaddr, noloss_failover, num_retries, retry_time, use_alt_addr, use_jumbo_frame.

------------------------------------------------------------------------------------------------------------

en0:

en0 represents the interface associated with hardware adapter ent0. The notation en0 is used for Standard Ethernet (inet). The TCP/IP address is associated with this interface.

The parameters associated with en0 can be displayed as follows:

# lsattr -El en0

It shows all the parameters of the interface en0 on the adapter ent0.

It shows alias4, alias6, arp, authority, broadcast, mtu=1500, netaddr, netaddr6, netmask, prefixlen, remmtu, rfc1323, security, state, tcp_mssdflt, tcp_nodelay, tcp_recvspace, tcp_sendspace.

The parameters are the same as for et0 except for the MTU (Maximum Transmission Unit) value, which is 1500 as per the standard Ethernet protocol.

------------------------------------------------------------------------------------------------------------

et0:

et0 represents the interface associated with hardware adapter ent0. The notation et0 is used for IEEE 802.3 Ethernet (inet). If you are using the standard Ethernet protocol, et0 will not have a TCP/IP address.

The parameters associated with et0 can be displayed as follows:

# lsattr -El et0

It shows all the parameters of the interface et0 on the adapter ent0.

It shows alias4, alias6, arp, authority, broadcast, mtu=1492, netaddr, netaddr6, netmask, prefixlen, remmtu, rfc1323, security, state, tcp_mssdflt, tcp_nodelay, tcp_recvspace, tcp_sendspace.

Note that the MTU shown here is 1492, as per the IEEE 802.3 standard. All other parameters are the same as for en0. Also, the netaddr and netmask fields will be empty for et0.
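
The 8-byte gap between the two MTU values matches the size of the LLC/SNAP header that IEEE 802.3 framing carries inside the frame payload. The sketch below pulls the mtu value out of lsattr-style text and computes the difference. lsattr_value is a hypothetical helper, and the sample lines are made up in the usual attribute / value / description layout rather than copied from a real system. The helper only works for attributes that actually have a value; an empty field such as netaddr on et0 would return the first word of the description instead.

```shell
#!/bin/sh
# Hypothetical helper: print the value column for one attribute from
# "lsattr -El <interface>" style output read on stdin.
# Limitation: assumes the attribute has a non-empty value.
lsattr_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Made-up sample lines, one per interface.
en0_mtu=$(printf 'mtu 1500 Maximum IP Packet Size for This Device True\n' | lsattr_value mtu)
et0_mtu=$(printf 'mtu 1492 Maximum IP Packet Size for This Device True\n' | lsattr_value mtu)

echo "difference: $((en0_mtu - et0_mtu)) bytes"
# prints: difference: 8 bytes
```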

2014-03-05

EXP510, DS4800, SAN, Switch, Power720: cabling diagram and power-on / power-off sequence


1. Power-on sequence

          SAN Switch or Switch -> EXP510 -> DS4800(註1) -> HMC -> Power720

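          Note 1: Looking up the DS4800 IP address

1. Connect the console cable and look up the USB serial port number
2. When opening the PuTTY connection, press Ctrl + Break to switch until the characters display correctly, then press Esc
3. login/password: shellUsr / w3oo&w4
4. Enter netCfgSet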




2. Power-off sequence

   LPAR -> Power720 -> HMC -> DS4800 -> EXP510 -> SAN Switch or Switch


Cabling diagram

vCenter Site Recovery Manager

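  • Everything is organized per site; each site must have one vCenter and one SRM
  • VR
    • Replicates files over the network (file-level), but the data is actually sent at block level; RPO (asynchronous) = 15 min to 24 hr
    • Transfers asynchronously
    • Regarding state: if the guest OS is Windows (Windows 2003 or later), VMware Tools invokes VSS to save it; Linux has no equivalent
    • A planned migration shuts down the guest OS first, so that its state is written back to disk.
    • Only the files of powered-on guest OSes are copied.
    • Snapshots taken on the Protected side are merged when replicated to the Recovery side
    • The managed IP must point to the vCenter Server
    • The first replication to the Recovery side is a full copy; replication is neither compressed nor encrypted. Later replications use ESXi's built-in CBT (Changed Block Tracking).
  • Array-based
    • Replicates by LUN (block-level)
    • Synchronous or asynchronous transfer (synchronous may not be possible to a remote site)
    • The SRA versions should ideally match, and the storage should ideally be from the same vendor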
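  • SRM 5.1 (control only): when failing over to the other site, a Recovery Plan (user-definable) sets the startup priority
    • Linked mode is not required
    • The SRM API uses SOAP, i.e. XML is used to exchange data with other platforms
    • Can be installed on a virtual machine
    • Must be installed on a 64-bit Windows OS
    • For array-based replication, the storage vendor must provide an SRA (Storage Replication Adapter)
    • Supports DB2, SQL Server, Oracle
    • Does not support Storage vMotion or Storage DRS
  • New in SRM 5.5
    • Supports Storage vMotion
    • Can retain the previous 24 replication states of a VM
    • Supports Flash cache (SSD)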
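  • vMotion requires network latency <= 5 ms; the source and destination port groups must be the same
  • Metro vMotion requires network latency <= 10 ms
  • placeholder: the virtual machine's snapshot and configuration
  • VMkernel cannot be mapped; only port groups can be mapped
  • If a network port group is renamed, the mapping must be re-created
  • Protection Group: a set of guest OSes that need to fail over from the Protection site to the Recovery site
    • Once created, it is used by a later Recovery Plan (automated disaster failover)
  • Recovery plan
    • Is created at the Recovery Site
    • On recovery, you can define which network to connect to, to avoid conflicts with the production environment; the small dr-ip-customizer.exe tool can change IPs in bulk
    • The boot order can be changed. After one group powers on, the next group starts immediately; you cannot set a delay before the second group starts, only dependencies
    • Make sure VMware Tools is installed in every guest OS
    • For Microsoft guest OSes (Windows 2000, 2003, XP), installing the User Profile Hive Cleanup Service is recommended to speed up shutdown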

2014-01-24

Adjusting DB2 user space

Administrator
ITM@Admin


In the DB2 GUI Control Center -> right-click the DB whose capacity you want to increase
Manage Storage -> Add Automatic Storage -> add a location
db2 connect to database_name
db2 alter database add storage on 'd:\'
db2 connect reset
db2 connect to database_name 
db2 ALTER TABLESPACE tablespace_name REBALANCE

2014-01-12

ITM CEC Base agent issue

After the ITM CEC Base agent is restarted, ITM cannot display CEC data. The steps to resolve this are as follows:

STEPS TO RECYCLE THE SHARED MEMORY:
1. Stop all the processes which use the SPMI shared library
(xmservd, filtd, xmperf, 3dmon, ptxrlog, harmd, topas, any PSSP process)
if they are currently running on your AIX LPAR.


2. Stop the ITM System p agents (PX, PK, PH) that you have running on
this system.


3. Find out whether any zombie processes are running on your machine.
a. Run
# ps -ef | grep Provider
Kill each one of them with "kill <pid>"

b. Run
# ps -ef | grep kpkagent
# ps -ef | grep kpxagent
# ps -ef | grep kphagent
Kill each one of them with "kill <pid>"

c. Check if there are any defunct processes
# ps -ef | grep defunct


4. Kill the processes that are using the shared memory. List them with:
genld -l | grep -p Spmi

5. Run "ipcs -m" and check for any segment "KEY" that begins
with '0x78', as listed below:
T ID KEY
m 0 0xc76283cc
m 1 0x78002323
If there are any such segments starting with 0x78, make sure the process
which uses those shared segments is stopped (or terminate it with the kill
command).


6. Then run "ipcrm -m <ID #>" to clear the shared memory segments.

7. Run "slibclean"

8. Restart system P agents - PH PX PK
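
Steps 5 and 6 can be sketched as a small filter that picks out the segment IDs to remove. This is an illustration only: spmi_segment_ids is a hypothetical helper name, and it assumes the T / ID / KEY column order shown in step 5. It only prints IDs; nothing is removed.

```shell
#!/bin/sh
# Hypothetical helper: read "ipcs -m" output on stdin and print the ID of
# each shared memory segment (type m) whose KEY starts with 0x78.
spmi_segment_ids() {
    awk '$1 == "m" && $3 ~ /^0x78/ { print $2 }'
}

# Sample input copied from step 5.
spmi_segment_ids <<'EOF'
T ID KEY
m 0 0xc76283cc
m 1 0x78002323
EOF
# prints: 1
```

Once the processes from steps 1 to 4 are stopped, each printed ID would be passed to ipcrm -m as in step 6, for example ipcs -m | spmi_segment_ids | while read id; do ipcrm -m "$id"; done.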