Version 1.03.00 - 16 August 2006
================================

ccsd
 * Don't grab random config from network; require initial cluster.conf.
 * Fix infinite loop causing hangs in other daemons. bz#194361

cman
 * Allow zero votes.

cman-kernel
 * Don't try to delete AUTODELETE barriers in timer context as we can't
   get the semaphore that protects the structures. bz#177577
 * If quit_threads gets set via an incoming message, don't try to carry
   on. bz#164535
 * Don't try to print local node stuff if 'us' is NULL. bz#189605
 * Clear comms sequence number for a node when it leaves the cluster.
   Otherwise we ignore messages when it tries to join again, which
   causes cluster mayhem. bz#187777
 * If we get a Master-HELLO and we are not the master for this
   transition then kick off a new one to resolve the ambiguity.
   bz#194491

dlm-kernel
 * Don't try to unlock a lock if there's no LKB. bz#188525
 * We need to allocate space for 5 ints, rather than 4, when sending a
   query reply. bz#173811
 * Add printk's for the error conditions so we have some idea what
   happened before a gfs panic.
 * Fix kernel oops when passing LKF_CANCEL to dlm_ls_unlock_wait.
   bz#201325

fenced
 * If there are no devices defined within a node's method, that method
   should be considered failed. bz#190661

gfs_fsck
 * Improve logging. bz#156009
 * Repair damaged and corrupt resource groups and resource group index
   entries that previously caused gfs_fsck to abort. bz#179069

gfs-kernel
 * Allow cman nodeid to be used with CDPN. bz#198381
 * When releasing a glock with the GL_NOCACHE flag set, care was not
   taken to ensure that only one holder for the glock remained. This
   was corrupting the glock and preventing further access to it.
   FLOCKS use this GL_NOCACHE flag. bz#191222
 * F_GETLK was broken: it always returned zero conflicts for local
   plocks, and a bogus pid was being returned for local locks. Added a
   new pid field to the gfs posix lock to store and return the actual
   pid.
   bz#198303

gnbd
 * Make gnbd work with device mapper multipath.
 * Fix gnbd_monitor so that it will correctly restart multiple devices
   per server.

gulm
 * Retry if initial connection attempts fail. bz#183507

rgmanager
 * Work around for bz#193128
 * Enable self-watchdog support (adds a second clurgmgrd process).
   bz#193247
 * Apply patch to fix bz#193128 from Navid Sheikhol-Eslami.
 * Allow failover if owner dies while stopping a service. bz#193255
 * Fix various clustat related usability & performance problems.
   bz#185952, bz#175010, bz#182454, bz#190234, bz#190408, bz#192999
 * Port clumanager's 'lock' operation to rgmanager. bz#175010
 * Various internal performance improvements for large numbers of
   services.
 * Implement crude NFS lock reclaim broadcast / reclaim notifications.
 * Mark services with autostart=0 as 'disabled' instead of 'stopped'.
 * Add patch from Josef Whiter to implement no-failback option for a
   given FO domain. bz#189841

Version 1.02.00 - 10 April 2006
===============================

dlm-kernel: Allow DLM to start if the node gets a different nodeid.
dlm-kernel: Add WARNING printk when cman calls emergency_shutdown.
dlm-kernel: The in_recovery semaphore wasn't being released in the
  corner case where a grant message is ignored for a lock being
  unlocked.
dlm-kernel: Remove an assertion that triggers unnecessarily in rare
  cases of overlapping and invalid master lookups.
dlm-kernel: Don't close existing connection if a double-connect is
  attempted - just ignore the last one.
dlm-kernel: Fix a race where an attempt to unlock a lock in the
  completion AST routine could crash on SMP.
dlm-kernel: Fix transient hangs that could be caused by incorrect
  handling of locks granted due to ALTMODE. bz#178738
dlm-kernel: Allow any old user to create the default lockspace. You
  need udev running and dlm built with ./configure --have_udev.
dlm-kernel: Only release a lockspace if all users have closed it.
  bz#177934
cman-kernel: Fix cman master confusion during recovery.
  bz#158592
cman-kernel: Add printk to assert failure when a nodeid lookup fails.
cman-kernel: Give an interface "max-retries" attempts to get fixed
  after an error before we give up and shut down the cluster.
cman-kernel: IPv6 FF1x:: multicast addresses don't work. Always send
  out of the locally bound address. bz#166752
cman-kernel: Ignore really badly delayed old duplicates that might get
  sent via a bonded interface. bz#173621
cman-kernel: /proc/cluster/services seq_start needs to initialise the
  pointer, we may not be starting from the beginning every time.
  bz#175372
cman-kernel: Fix memory leak when reading from /proc/cluster/nodes or
  /proc/cluster/services. bz#178367
cman-kernel: Send a userspace notification when we are the last node in
  a cluster. bz#182233
cman-kernel: Add quorum device interface for userspace.
cman-kernel: Add node ID to /proc/cluster/status.
cman: Allow "cman_tool leave force" to cause cman to leave the cluster
  even if it's in transition or joining.
cman: Look over more than 16 interfaces when searching for the
  broadcast address.
cman: init script does 'cman_tool leave remove' on stop.
cman: Add cman_get/set_private to libcman.
cman: Add quorum device API to libcman.
gfs-kernel: Fix performance with sync mount option; pages were not
  being flushed when gfs_writepage is called. bz#173147
gfs-kernel: Flush pages into storage in case of DirectIO falling back
  to BufferIO. DirectIO reads were sometimes getting stale data.
gfs-kernel: Make sendfile work with stuffed inodes; after a write on
  stuffed inode, mark cached page as not uptodate. bz#142849
gfs-kernel: Fix spot where the quota_enforce setting is ignored.
gfs-kernel: Fix case of big allocation slowdown. The allocator could
  end up failing its passive attempts to lock all recent rgrps because
  another node had deallocated from them and was caching the locks.
  The allocator now switches from passive to forceful requests after
  try_threshold failures.
gfs-kernel: Fix rare case of bad NFS file handles leading to stale file
  handle errors. bz#178469
gfs-kernel: Properly handle error return code from verify_jhead().
gfs-kernel: Fix possible umount panic due to the ordering of log
  flushes and log shutdown. bz#164331, bz#178469
gfs-kernel: Fix directory delete out of memory error. bz#182057
gfs-kernel: Return code was not being propagated while setting default
  ACLs, causing an EPERM every time. bz#182066
gulm: Fix bug that would cause lock_gulmd to not call waitpid unless
  SIGCHLD was received from the child. bz#171246
gulm: Fix problems with host lookups. Now try to match the IP if we are
  unable to match the name of a lock server; also fix the expiration of
  locks if gulm somehow gets an FQDN. bz#169171
fence/fenced: Multiple devices in one method were not being translated
  into multiple calls to an agent, but all the device data was lumped
  together for one agent call. bz#172401
fence/fence_apc: Make agent work with 7900 series apc switches.
  bz#172441
fence/fence_ipmilan: Fixes for bz#178314
fence/fence_drac: Support for drac 4/I.
fence/fence_drac: Interface change in drac_mc firmware version 1.2.
fence: Add support for IBM rsa fence agent.
gnbd-kernel: gnbd_monitor wouldn't correctly reset after an uncached
  gnbd had failed and been restored. bz#155304
gnbd-kernel: Kill gnbd_monitor when all uncached gnbds have been
  removed. bz#127042
gnbd: Changes to let multipath run over gnbd.
gfs_fsck: Fix small window where another node can mount during a
  gfs_fsck. bz#169087
gfs_fsck: gfs_fsck crashed on many types of extended attribute
  corruptions. bz#173697
gfs_fsck: Check result code and handle failures in fsck rgrp read code.
  bz#169340
gfs_fsck: Fix errors checking large (multi-TB) filesystems. bz#186125
gfs_edit: New version with more options that uses ncurses.
ccs: Make ccs connection descriptors time out, fixing a problem where
  all descriptors could be used up, even though none are in use.
ccs: Increase number of connection descriptors from 10 to 30.
ccs: Ignore SIGPIPE, don't catch SIGSEGV, allowing for core dumps.
ccs: Endian fixes for clusters of machines with different endianness.
ccs: Fix error printing. bz#178812
ccs: Fix ccs_tool seg fault on upgrade. bz#186121
magma-plugins/sm: Fix reads of /proc/cluster/services. bz#175033
magma-plugins/gulm: Fix clu_lock() return value that resulted in
  "Resource temporarily unavailable" messages at times. bz#171253
rgmanager: Add support for inheritance in the form "type%attribute"
  instead of just attribute so as to avoid confusion.
rgmanager: Fix bz#150346 - Clustat usability problems
rgmanager: Fix bz#170859 - VIPs show up on multiple members.
rgmanager: Fix bz#171034 - Missing: Monitoring for local and cluster
  fs's
rgmanager: Fix bz#171036 - RFE: Log messages in resource agents
rgmanager: Fix bz#165447 - ip.sh fails when using VLAN on bonded
  interface
rgmanager: Fix bz#171153 - clustat withholds information if run on
  multiple members simultaneously
rgmanager: Fix bz#171236 - ia64 alignment warnings
rgmanager: Fix bz#173526 - Samba Resource Agent
rgmanager: Fix bz#173916 - rgmanager log level change requires restart
rgmanager: Fix bz#174819 - clustat crashes if ccsd is not running
rgmanager: Fix bz#175106 - lsof -b blocks when using gethostbyname
  causing slow force-unmount when DNS is broken
rgmanager: Fix bz#175108 - rgmanager storing extraneous info using VF
rgmanager: Fix bz#175114 - rgmanager uses wrong stop-order for
  unspecified resource agents
rgmanager: Implement bz#175215: Inherit fsid for nfs exports
rgmanager: Fix bz#175229 - remove unneeded references to clurmtabd; it
  is no longer a necessary piece for NFS failover
rgmanager: Fix bz#176343 - __builtin_return_address(x) for x>0 is never
  guaranteed to work
rgmanager: Ensure rgmanager doesn't block SIGSEGV when debug is not
  enabled.
rgmanager: Fix bz#172177, bz#172178
rgmanager: Allow scripts to inherit the name attr of a parent in case
  the script wants to know it. bz#172310
rgmanager: Fix #166109 - random segfault in clurgmgrd
rgmanager: Fix most of 177467 - clustat hang

Version 1.01.00 - 5 October 2005
================================

cman-kernel: SM should wait for all recoveries to complete before it
  processes any group joins/leaves. bz#162014
cman-kernel: Fix barriers.
cman-kernel: Fix off-by-one error in find_node_by_nodeid() that can
  cause an oops in some odd circumstances.
dlm-kernel: Don't increment the DLM reference count when connecting to
  an already extant lockspace. bz#157295
dlm-kernel: Fix refcounting that could cause a memory leak.
dlm-kernel: Return locking errors correctly. bz#154990
dlm-kernel: Don't free the lockinfo block if the LKB still exists.
  bz#161146
cman: "cman_tool join" can now set /proc/cluster/conf/cman values from
  CCS.
lock_dlm: The first mounter shouldn't let others mount until
  others_may_mount() has been called. bz#161808
gfs-kernel: If it took too long to sync the dependent inodes back to
  disk, the resource group descriptor could get corrupted. bz#164324
gfs-kernel: It is now possible to toggle acls on and off with
  -o remount. Also, acls are only displayed when they are enabled.
gfs-kernel: No longer check permissions before truncating a file in
  gfs_setattr. bz#169039
gfs-kernel: Fix oops when copying suid root file to gfs.
gfs-kernel: Changes to work on 2.6.13.
gfs_fsck: Some variables weren't getting initialized properly in
  pass1b, causing hangs (or segfaults) when duplicate blocks were
  present. bz#162709
fence: Add support for Dell PowerEdge 1855 to fence_drac. bz#150563
fence: Add support for latest ilo firmware version (1.75). Changes were
  also added to make sure that the power status of the machine is
  properly checked after power change commands have been issued.
  bz#161352
fence: fence_ipmilan default operation should be reboot.
  bz#164627
fence: fence_wti default operation should be reboot. bz#162805
ccs: Increase daemon performance by adding local socket communications.
rgmanager: Fix ip bugs. bz#157327, bz#163651, bz#166526
rgmanager: Fix hang when specifying nonexistent services. bz#159767
rgmanager: Fix service tree handling. bz#162824, bz#162936

Version 1.00.00 - 29 June 2005
==============================

Initial release.