Dynamic Reconfiguration on Sun F15K
With the Sun Fire 15K it is possible to add and remove boards to and from domains without needing to shut down the OS running on that domain. This process is known as Dynamic Reconfiguration (DR). What follows is an overview of DR on the F15K.
During the unconfigure operation on system boards with permanent memory (i.e. memory that can't be paged out, e.g. kernel or PROM memory), the OS has to be briefly paused while this memory is transferred to another board (this process is known as quiescence). A safe device is one that does not access memory or interrupt the system while the OS is in quiescence; an unsafe device may do either. Unsafe devices are listed in /platform/SUNW,Sun-Fire-15000/kernel/drv/drv.conf against the unsupported-io-drivers statement. DR reads this list and, if it finds an active driver that matches a listed device, aborts the operation. To allow DR to work in these circumstances you must:
* Kill any processes using the device
* Unload the driver (use modunload)
* Disconnect the cables
An attachment point is a way of representing a board in the F15K. It can be given as a physical address or a logical address. A physical address is in the format:
* /devices/pseudo/dr@0:SBx (for a system board)
* /devices/pseudo/dr@0:IOx (for an I/O board)
The logical representation is SBx or IOx, where x is the expander (also known as slot) number, 0-17 in the case of an F15K. The command cfgadm -l shows the attachment points.
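The mapping between the two forms can be sketched as a small shell function. The name ap_physical is hypothetical (it is not a Solaris command); it only builds the path string from the format given above and rejects IDs outside the 0-17 range.

```shell
#!/bin/sh
# Hypothetical helper: print the physical attachment point for a logical ID.
# Accepts SB0-SB17 and IO0-IO17, matching the F15K expander range.
ap_physical() {
    case "$1" in
        SB[0-9]|SB1[0-7]|IO[0-9]|IO1[0-7])
            printf '/devices/pseudo/dr@0:%s\n' "$1"
            ;;
        *)
            printf 'invalid attachment point: %s\n' "$1" >&2
            return 1
            ;;
    esac
}
```

For example, ap_physical IO4 prints the physical address of the I/O board in expander 4, which can then be compared with what cfgadm -l reports on the domain.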
The F15K has a couple of internal networks. The I1 network is the System Controller (SC) to domain network. This is used for controlling the domains from the SC. The I2 network is the SC to SC network and is used to keep the two system controllers in sync so that either one can become the main (controlling) SC.
There are four DR operations: connect, configure, unconfigure and disconnect. When a board is added to a domain it is connected, then configured. When it is removed it is unconfigured, then disconnected. The cfgadm command performs these operations.
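The ordering of the four operations can be sketched as a dry-run helper. The function name dr_plan is an assumption made for illustration; it only prints the cfgadm commands in the order described above and does not run them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the cfgadm steps for adding or
# removing a board, in the order the DR operations must happen.
dr_plan() {
    action=$1
    ap=$2
    case "$action" in
        add)    steps="connect configure" ;;
        remove) steps="unconfigure disconnect" ;;
        *)
            echo "usage: dr_plan add|remove <attachment-point>" >&2
            return 2
            ;;
    esac
    for step in $steps; do
        echo "cfgadm -v -c $step $ap"
    done
}
```

Running dr_plan remove SB2 lists the unconfigure step followed by the disconnect step. Printing rather than executing keeps the sketch safe to try on any machine.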
DR on I/O boards
Before I/O boards can be added or removed, all devices must be closed, any filesystems unmounted, etc.
DR on memory
Before removing a board, non-permanent memory is flushed to swap and permanent memory is copied to another board. (NOTE: On systems running Sun Cluster 3, boards with permanent memory allocated to them cannot take part in DR operations.) To check for permanent memory use:
* cfgadm -a -l -v | grep permanent - shows where the permanent memory is located.
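To reduce that output to just the board names, a filter along the following lines may help. This is a minimal sketch: perm_mem_boards is a hypothetical name, and it assumes each matching line starts with an attachment point such as SB0::memory. Check the real cfgadm output on your system before relying on it.

```shell
#!/bin/sh
# Hypothetical filter: read `cfgadm -a -l -v` output on stdin and print
# the boards whose memory line mentions "permanent".
# Assumes the first field looks like "SB0::memory" (an assumption).
perm_mem_boards() {
    awk '/permanent/ { split($1, parts, "::"); print parts[1] }' | sort -u
}

# Intended use on a domain:
#   cfgadm -a -l -v | perm_mem_boards
```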
==== DR from the Domain ====
Removing a CPU/Memory Board
1. Use cfgadm -l to check the attachment point
2. Use pbind to unbind processes from the CPUs on the board (see below for details)
3. cfgadm -v -c disconnect SBx to remove the board from the domain
pbind -q queries bound processes; pbind -b processor_id pid binds a process to a processor; pbind -u pid unbinds a process so it can use any processor.
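The unbinding step can be sketched as a filter that turns pbind -q output into the matching pbind -u commands. The function name unbind_cmds is hypothetical, the processor IDs of the board must be supplied by the caller, and the sketch assumes pbind -q prints lines of the form "process id PID: CPU". That format may differ between Solaris releases, so treat it as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: given processor IDs as arguments and `pbind -q`
# output on stdin, print a "pbind -u PID" command for each process
# bound to one of those processors. Nothing is executed.
# Assumed input format (an assumption): "process id 1234: 5"
unbind_cmds() {
    cpus=" $* "
    awk -v cpus="$cpus" '
        $1 == "process" && $2 == "id" {
            pid = $3
            sub(/:$/, "", pid)              # strip the trailing colon
            if (index(cpus, " " $4 " "))    # is this CPU on the board?
                print "pbind -u " pid
        }'
}

# Intended use on a domain, for a board holding CPUs 4 and 5:
#   pbind -q | unbind_cmds 4 5
```

Review the printed commands before running them, since unbinding changes where those processes are allowed to run.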
Removing an I/O Board
1. Check the board status with cfgadm -a -s "select=class(sbd)"
2. Stop I/O activity to the board, e.g. umount filesystems on disks
3. Detach any unsafe devices using modunload
4. cfgadm -v -c disconnect IOx
(You can also unconfigure a board which means that it is still part of the domain but the OS can't use it)
Adding a board
cfgadm -v -c configure SBx (or IOx)
(You can also connect a board, which makes it part of the domain but the OS can't use it)
==== From the System Controller ====
* showplatform - shows configured domains and what status they are in
* showboards - shows the status of boards and what domain they are connected to
* showdevices SBx - shows the devices that are on the specified board
* addboard - adds a board to a domain, e.g. addboard -d A -r 2 -t 600 SB1 adds system board 1 to domain A; if the attach fails, 2 retries are performed with a wait of 600 seconds between retries
* deleteboard - removes a board from a domain, e.g. deleteboard -r 2 -t 900 SB2
* moveboard - detaches a board from one domain and attaches it to another, e.g. moveboard -d B -r 2 -t 300 SB5
* rcfgadm - remote cfgadm, performs the same functions as cfgadm in the same way but from the system controller, e.g. rcfgadm -d C -l
* scdrhelp - a Java GUI that can be used to look up error messages
1. showplatform - displays the domain ID, e.g. A, B, etc.
2. showboards -d A - shows the boards connected to the specified domain
3. showdevices -v -d A - shows the devices assigned to the domain
4. Use addboard/deleteboard/moveboard - as required
Replacing a System Board
1. deleteboard SB2
2. poweroff SB2
3. swap the board
4. poweron SB2
5. addboard -d mydomain SB2
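The replacement steps above can be sketched as a dry-run checklist generator for any domain and board. The function name replace_board_plan is hypothetical; it prints the System Controller commands from the list in order, with a comment line marking where the physical swap happens, and runs nothing.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the board replacement sequence
# for a given domain and board. Commands are printed, not executed.
replace_board_plan() {
    domain=$1
    board=$2
    if [ -z "$domain" ] || [ -z "$board" ]; then
        echo "usage: replace_board_plan <domain> <board>" >&2
        return 2
    fi
    cat <<EOF
deleteboard $board
poweroff $board
# physically swap $board here
poweron $board
addboard -d $domain $board
EOF
}
```

For example, replace_board_plan A SB2 prints the five steps for system board 2 in domain A.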