Workspace Admin Guide (TP1.3.3.1)

1. Introduction

This guide contains configuration information for system administrators working with Workspaces. It provides references to information on procedures typically performed by system administrators, including installing, configuring, deploying, and testing the installation.

Important

This information is in addition to the basic Globus Toolkit prerequisite, overview, installation, and security configuration instructions in the GT 4.0 System Administrator's Guide. Please read through that guide before continuing.

Configuring GT4 with the bare necessities for the Workspace Service requires you to install the Java GT4 core with security tools:

make wsjava common globus_proxy_utils globus_simple_ca \
globus_simple_ca_setup globus_user_env postinstall

If you can obtain certificates from some other location, all that is required is Java GT4 core which is available as a binary tarball from the Globus download page.

Important

If you only need to install the workspace client, see the client-only section of this manual (Section 7).

Installing the Workspace Service requires installing two pieces of software, which you can download from the download page:

  • The Workspace Service frontend. This is a GT4 web service that must be installed on a node with GT4 installed on it.

  • The Workspace backend script (or workspace-control). This is a script that must be installed on at least one node running the Xen hypervisor. If you install the script on multiple nodes, the Workspace Service will manage these nodes as a pool of resources.

Note that the Workspace Service frontend and backend can be installed on the same node or on different nodes.

The first sections of this guide will walk you through the installation of both components, and their configuration:

  • Section 2 - Software prerequisites. Describes what software must be installed before you can install the Workspace Service.

  • Section 3 - Installing the Workspace Service. Describes how to install the web service frontend of the Workspace Service.

  • Section 4 - Installing the Workspace Backend script. Describes how to install the workspace-control backend script on the Xen nodes.

  • Section 5 - Configuring. Describes the configuration options you will need to modify to tailor the Workspace Service to function correctly on your site.

After completing these sections, the Workspace Service will be installed and configured. You will be able to test your installation by following the instructions in Section 6. The remaining sections provide more detailed administration information, such as how to perform a client-only installation (Section 7), an overview of how the Workspace Service handles network configuration in the VMs (Section 8), how to integrate with a local site scheduler using the Workspace Pilot (Section 9), and some troubleshooting tips (Section 10). If you encounter any problems during the installation of the Workspace Service, please refer first to the troubleshooting section. If you encounter problems not addressed there, feel free to ask questions on our user mailing list: workspace-user@globus.org

If you are upgrading to TP1.3.3.1 from one of the recent releases, Section 11 has notes about what configurations were added or changed.

2. Software prerequisites

The Workspace Service frontend currently requires whatever GT4 Java core requires, namely Java 1.4+ and Ant. It is currently tested only on Linux.

Xen (2.0.7+ or from the 3.x series but not 3.2.x) is required to be installed as the hypervisor. Below, when this guide speaks of something being installed "on the hypervisor node," it means in Xen's dom0. This is for example where workspace-control runs.

The ISC DHCP server (or a DHCP server with a compatible conf file) and ebtables are required to be running on each hypervisor node. Any recent version of each package should be compatible; the distributed scripts that automate the configurations were tested with ISC DHCP 3.0.3 and the ebtables 2.0.6 userspace tools. For more information on why this software is now necessary and how it will not interfere with a site's pre-existing DHCP server, see the network configuration details section.

Since these two pieces of software are relatively common, they may already be present in your hypervisor nodes via the package management system. Check your distribution tools for packages called dhcp (ISC DHCP server) and ebtables. You can also check for the existence of /sbin/ebtables or /usr/sbin/ebtables (for ebtables) and any of the following files for the DHCP server:

  • /etc/dhcp/dhcpd.conf

  • /etc/dhcp3/dhcpd.conf

  • /etc/init.d/dhcpd

  • /etc/init.d/dhcp3-server

If these software packages are not installed, all major Linux distributions include them and you should be able to install them easily with your package management system, for example with "rpm -ihv dhcp-*.rpm", "apt-get install dhcp", or "emerge dhcp". The same applies to ebtables.

ebtables requires kernel support in dom0; the default Xen kernel includes this support. If your dom0 kernel does not include it for some reason, the options to enable are under Networking :: Networking options :: Network packet filtering :: Bridge Netfilter Configuration :: Ethernet Bridge Tables
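The file checks described above can be scripted. The following is a minimal Python 3 sketch that looks only for the locations listed in this section; the function name find_present is hypothetical, and the exists parameter is there only so the logic can be exercised without touching the filesystem.

```python
import os

# Locations named in this guide; adjust for your distribution.
EBTABLES_PATHS = ["/sbin/ebtables", "/usr/sbin/ebtables"]
DHCP_PATHS = [
    "/etc/dhcp/dhcpd.conf",
    "/etc/dhcp3/dhcpd.conf",
    "/etc/init.d/dhcpd",
    "/etc/init.d/dhcp3-server",
]


def find_present(paths, exists=os.path.exists):
    """Return the subset of paths that exist, in the order given."""
    return [p for p in paths if exists(p)]


if __name__ == "__main__":
    for label, paths in (("ebtables", EBTABLES_PATHS), ("DHCP server", DHCP_PATHS)):
        found = find_present(paths)
        print("%s: %s" % (label, ", ".join(found) if found else "not found"))
```

An empty result for either group suggests the package still needs to be installed, although a package installed in a non-standard location would also produce an empty result.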

workspace-control also requires sudo and Python 2.3+ on all VMM nodes under the control of the Workspace Service.

3. Installing the Workspace Service frontend

Currently, only the source is distributed for the service. To install the Workspace Service frontend, download the "Grid Service" tar file from the download page and, as the globus user (or whatever user has ownership of the GLOBUS_LOCATION directory), run the following:

export GLOBUS_LOCATION=/path/to/globus-location
cd workspace-service
ant deploy

Ant will print out several status messages, and the installation of the Workspace Service frontend will be complete.

4. Installing the Workspace backend script

The Workspace backend script must be installed on all nodes that will be running VMs under the control of the Workspace Service. These nodes must have Xen installed on them. If you are planning on installing the backend script on multiple nodes so that the service can manage the nodes as a resource pool, you will be able to easily replicate the backend script to several nodes once you have successfully configured one node. Note that you can also install the script on the same node as the Workspace Service (as long as that node is also running Xen).

The steps required to install the Workspace backend script are the following:

  1. Install the backend script using the backend installer.

  2. Configure sudo so the backend script will be able to run Xen commands.

  3. Configure the DHCP software, and make sure that the backend script points to the correct paths (DHCP init scripts, etc.)

  4. Configure the Workspace frontend so it will know the location of the backend script

Note that, at the end of this section the Workspace backend will not be completely configured. This section covers the basic configuration steps related to adequately configuring related software (such as sudo and the DHCP server). The next section (Configuration) covers how to configure both the Workspace frontend and backend to meet the needs of your specific site.

4.1. Using the backend installer

Download the "Control Agent" tar file from the download page, and untar it. The workspace-control directory contains multiple files, but we will be interested in only two of them: the install.py script which we will use to install the backend script, and the worksp.conf.example file, a sample Workspace backend configuration file.

First of all, you will need to choose a directory where all the backend files will be installed, and where several subdirectories will be created. These directories are for isolating and mounting the guest images as well as giving the backend script places for persistence, tmp files, and an image cache. The provided configuration file uses /opt/workspace as a default installation directory. If this is your first time installing workspace-control it is probably best if you stick with the default /opt/workspace setup. Otherwise, you will need to edit the configuration file to modify the directory paths.

So, assuming you are using the /opt/workspace installation path and you are in the directory where you untarred the "Control Agent" tar file, do the following:

cd workspace-control
mkdir /opt/workspace
cp worksp.conf.example /opt/workspace/worksp.conf

Next, you need to choose (or create) a user and unix group that will be used to run the backend script (for this guide, let's assume that your user and group are both called globus). Since we do not allow the backend script to be run as root, this user will rely on sudo to run Xen commands and other privileged commands. The installer will check all related permissions for you, create the necessary directories in /opt/workspace, and then adjust their permissions to be correct. In particular, the backend script (workspace-control) and other tools will be placed in directory /opt/workspace/bin, which will be owned by root. For more details on permissions, you can find a detailed note on this topic in the configuration file itself.

Now we are ready to run the installer. The install.py program has three modes of operation you can use to install the backend script:

  -n, --noninteractive  Don't ask the user anything (for automated install with
                        a well-known conf file).

  -o, --onlyverify      Just run the setup tests and print what would have
                        happened in --noninteractive mode.  Good option to try
                        first since it will do nothing to the filesystem.

  -i, --install         Install the program, make needed directories, and set
                        them with default permissions. Will block to ask you
                        questions if necessary.

The --noninteractive mode is a good, fast choice if you want to use the default /opt/workspace hierarchy that comes in the sample configuration file. For example, you could run the installer like this:

python install.py -c /opt/workspace/worksp.conf -a globus -g globus -n

Just before exiting, the installer will print out the sudo policies you need to configure (see the next section).

To see more installation options, you can run install.py with the --help argument.
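If you script the installer across several nodes, a small helper can assemble the command line. This is a minimal Python sketch using only the flags shown above; the function name build_install_command is hypothetical, and it assumes (this guide only demonstrates it for -n) that -o and -i accept the same -c, -a and -g arguments.

```python
def build_install_command(conf_path, user, group, mode="onlyverify"):
    """Build the argument list for install.py from the flags shown in this guide."""
    mode_flags = {
        "noninteractive": "-n",  # no questions asked
        "onlyverify": "-o",      # run setup tests only, change nothing
        "install": "-i",         # install, asking questions if necessary
    }
    if mode not in mode_flags:
        raise ValueError("unknown mode: %r" % (mode,))
    return ["python", "install.py", "-c", conf_path,
            "-a", user, "-g", group, mode_flags[mode]]


if __name__ == "__main__":
    # Try the verification mode first, since it does nothing to the filesystem.
    print(" ".join(build_install_command("/opt/workspace/worksp.conf",
                                         "globus", "globus")))
```

The returned list can be passed to subprocess.call from inside the workspace-control directory.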

4.2. sudo

Using the visudo command, add the sudo policies printed out by the installer to the /etc/sudoers file. These policies should look something like this:

globus ALL=(root) NOPASSWD: /opt/workspace/bin/dhcp-config.sh
globus ALL=(root) NOPASSWD: /opt/workspace/bin/mount-alter.sh
globus ALL=(root) NOPASSWD: /usr/sbin/xm
globus ALL=(root) NOPASSWD: /usr/sbin/xend

These policies reflect the user running the GT container (globus) and the correct full paths to the dhcp-config.sh and mount-alter.sh tools, the xm program, and the xend daemon (needed only in the case where the workspace-control program is permitted to restart the daemon if it has stopped).
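The policies printed by the installer are authoritative. The following Python sketch only illustrates their shape, which helps when checking that the user name and paths match your site; the function name sudo_policies is hypothetical.

```python
def sudo_policies(user, commands):
    """Return one NOPASSWD sudoers line per command, each run as root."""
    return ["%s ALL=(root) NOPASSWD: %s" % (user, cmd) for cmd in commands]


if __name__ == "__main__":
    # Paths as shown in this guide; xm and xend may live elsewhere on your nodes.
    for line in sudo_policies("globus", [
        "/opt/workspace/bin/dhcp-config.sh",
        "/opt/workspace/bin/mount-alter.sh",
        "/usr/sbin/xm",
        "/usr/sbin/xend",
    ]):
        print(line)
```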

4.3. Configuring the DHCP server

DHCP is used here as a delivery mechanism only; these DHCP servers do NOT pick the addresses to use on their own. Their policy files are dynamically altered by workspace-control as needed. Policy additions include the MAC address, which is used to make sure the requester receives the intended DHCP lease.

Configuring the DHCP server consists of copying the example DHCP file "dhcp.conf.example" (included in the workspace-control directory) to "/etc/dhcp/dhcpd.conf" and editing it to include the proper subnet lines (see the contents of the example file). The subnet lines are necessary to get the DHCP server to listen on the node's network interface. So, make sure that you add a subnet line that matches the subnet of the node's network interface. No lease configurations, available ranges, etc. should be added: these are added dynamically to the file after the token at the bottom.
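To work out which subnet line matches a node's interface, you can derive it from the interface address and prefix length. This is a minimal Python 3 sketch that emits an empty subnet declaration in the general ISC dhcpd form; the function name and the sample address are hypothetical, and the comments in the example file take precedence over this output.

```python
import ipaddress


def subnet_declaration(interface_cidr):
    """Return an empty ISC dhcpd subnet declaration covering the interface."""
    network = ipaddress.ip_interface(interface_cidr).network
    return "subnet %s netmask %s {\n}" % (network.network_address, network.netmask)


if __name__ == "__main__":
    # Hypothetical dom0 interface address.
    print(subnet_declaration("192.168.0.17/24"))
```

The declaration is left empty on purpose, since no ranges or leases should be added by hand.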

In most cases it is unnecessary, but if you have a non-standard DHCP configuration you may need to look at the "adjust as necessary" section of the "dhcp-config.sh" script in the protected workspace bin directory. The assumptions made are as follows:

  • DHCP policy file to adjust: "/etc/dhcp/dhcpd.conf"
  • Stop DHCP server: "/etc/init.d/dhcpd stop"
  • Start DHCP server: "/etc/init.d/dhcpd start"
  • The standard unix utility "dirname" is assumed to be installed. This is used to find the workspace-control utilities "dhcp-conf-alter.py" and "ebtables-config.sh", which are assumed to be in the same directory as "dhcp-config.sh" itself. Paths to these can alternatively be hardcoded to fit your preferred configuration.

The "foreign-subnet" script may be needed for DHCP support. It allows VMMs to deliver IP information over DHCP to workspaces even if the VMM itself does not have a presence on the target IP's subnet. This is an advanced configuration: read through the script's leading comments and make sure to clear up any questions before using it. It is particularly useful for hosting workspaces with public IPs where the VMMs themselves do not have public IPs, because it does not require a unique interface alias for each VMM (public IPs are often scarce resources).

4.4. Setting the backend paths in the service frontend

To use the backend script to manage VMs, the frontend service needs to know where it is located. On the node where you installed the frontend service, edit file $GLOBUS_LOCATION/etc/workspace_service/jndi-config.xml

Find the backendPath entry under the WorkspaceService section. You will not need to modify the default value if you used the default /opt/workspace path to install the backend script. If not, modify the value to specify the absolute path to the backend script. If you are using a pool of nodes, this path refers to the program's location on the Xen nodes.

To support customization tasks, the service needs to put temporary files on the VMM node. These files are copied into the workspace before boot. This requires a tmp directory on the VMM node and the service needs to know what that is. This defaults to /opt/workspace/tmp.

Find the backendTempDirectory entry under the WorkspaceService section.

5. Configuring

At this point, the Workspace Service frontend and backend are both installed. This section describes the different configuration options available in both the frontend and backend, which will allow you to customize the Workspace Service (e.g., to use a specific pool of nodes, to allocate specific IP addresses to the VMs, etc.)

5.1. Workspace Service configuration

After installing the Workspace Service, files that alter its configuration are stored in the $GLOBUS_LOCATION/etc/workspace_service/ directory. For the Workspace Service to work correctly you will, at least, need to configure the factory grid-mapfile.

  • If you want the Workspace Service to use a pool of Xen nodes... you will need to write a resource pool file specifying what nodes are part of your resource pool.

  • If you want the Workspace Service to allocate IP addresses to the VMs (and optionally configure them in the VMs)... you will need to create a network association file listing the IPs that can be allocated to the VMs. Note that you can create multiple network associations (e.g., one for private IP addresses and another one for public IP addresses).

All these configuration options are described in the following points:

5.1.1. Factory grid-mapfile

To edit the list of authorized users of the factory service, add DNs to $GLOBUS_LOCATION/etc/workspace_service/workspace-grid-mapfile

Important

The factory service grid-mapfile overrides the container wide grid-mapfile (typically this is /etc/grid-security/grid-mapfile). The service is distributed with zero entries, which disables access to workspace creation entirely.

Add a DN to the list in normal Globus fashion (if the DN has spaces in it, use quotes around it). The gridmap authorization module requires a username mapping, but it is irrelevant to this service, so using e.g. "fakeuser" for each DN entry is OK. We support actual grid-mapfiles instead of simple access control lists for maximum compatibility with predeployed installations.
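As an illustration of the entry format described above, the following Python sketch formats one grid-mapfile line, quoting the DN when it contains spaces and using the placeholder mapping "fakeuser". The function name and the sample DN are hypothetical.

```python
def gridmap_entry(dn, local_user="fakeuser"):
    """Format one grid-mapfile line, quoting the DN if it contains whitespace."""
    if any(ch.isspace() for ch in dn):
        dn = '"%s"' % dn
    return "%s %s" % (dn, local_user)


if __name__ == "__main__":
    # Hypothetical DN; use the subject of the real user certificate.
    print(gridmap_entry("/O=Grid/OU=Example/CN=Jane Doe"))
```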

The "groupauthz" plugin allows for basic group policies: you can add identities to groups and then control how much time each member is allowed to use, etc.

Other (attribute based) authorization methods are available via the plugin system. The plugin documentation also explains the creation process in detail, including rich authorization options.

After a workspace is created, only the creator may manage the WSRF resource. This is configured programmatically for each workspace, so no grid-mapfile for the workspace service is needed, just the factory.

5.1.2. Deploying locally or to a resource pool

There are currently two service implementations, xenlocal and xenSSH. The former is simple to configure, but only allows deploying workspaces on a single node (where both the frontend and backend must be installed, so the frontend interacts locally with the backend). xenSSH, on the other hand, allows the Workspace Service to manage a pool of nodes (the "resource pool"), with the frontend controlling the backend through SSH. In this implementation, the service keeps track of what resources are available at the current time, adding and subtracting the resources used to maintain a view of what is available at the different hypervisor nodes. In this version of the service, it tracks only the RAM available for VMs at each node and which networking associations each node can support (they can differ from node to node).
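The bookkeeping described above can be pictured with a small model. This is a Python 3 sketch of per-node RAM tracking only, not the service's actual code: the class name, the use of megabytes, and the first-fit placement by sorted hostname are all assumptions made for illustration, and networking associations are left out.

```python
class PoolTracker:
    """Tracks RAM (assumed to be in MB) offered to VMs on each hypervisor node."""

    def __init__(self, capacity_mb):
        self.capacity = dict(capacity_mb)          # hostname -> RAM offered to VMs
        self.used = {host: 0 for host in capacity_mb}

    def available(self, host):
        """RAM still free on the given node."""
        return self.capacity[host] - self.used[host]

    def reserve(self, ram_mb):
        """Reserve RAM on the first node with room; return its hostname or None."""
        for host in sorted(self.capacity):
            if self.available(host) >= ram_mb:
                self.used[host] += ram_mb
                return host
        return None

    def release(self, host, ram_mb):
        """Give RAM back when a VM leaves the node."""
        self.used[host] = max(0, self.used[host] - ram_mb)


if __name__ == "__main__":
    pool = PoolTracker({"node1": 1024, "node2": 512})  # hypothetical nodes
    print(pool.reserve(768), pool.available("node1"))
```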

To choose the service implementation, edit the jndi-config.xml file. Find the implemention entry under the WorkspaceService section. The value of this entry determines the service implementation.

If you choose xenSSH, take into account that the user running the GT4 container on the frontend node must have passwordless SSH access (e.g., by using SSH keys) to the nodes that belong to the resource pool, and vice versa.
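One way to confirm passwordless access before starting the container is to run a trivial remote command with password prompts disabled. This is a minimal Python sketch assuming the OpenSSH client; the function names and the host name are hypothetical, and the 5 second timeout is an arbitrary choice.

```python
import subprocess


def ssh_check_command(host, user=None):
    """Build an ssh command that fails instead of prompting for a password."""
    target = "%s@%s" % (user, host) if user else host
    # BatchMode=yes disables password and passphrase prompts.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", target, "true"]


def has_passwordless_ssh(host, user=None):
    """Return True if the remote command ran without any prompt."""
    return subprocess.call(ssh_check_command(host, user)) == 0


if __name__ == "__main__":
    print(" ".join(ssh_check_command("vmm1.example.org", "globus")))
```

Run it as the container user on the frontend node for each pool node, then repeat in the other direction from each pool node to the frontend.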

5.1.3. xenSSH: Resource pool configuration

The $GLOBUS_LOCATION/etc/workspace_service/resourcepools directory can contain files that represent resource pools.

If you want to change this directory, alter the "resourcepoolDirectory" configuration of the "SlotManagementAdapter" section in the jndi-config.xml file.

The pool file format is currently very simple: for each node in the pool, list the hostname and the amount of RAM it can spare for running guest VMs.

Optionally, you can also specify that certain hosts can only support a subset of the available networking associations (see the file comments for syntax).

If you change these configuration files after starting the container, the changes take effect only after the container is restarted.

  • If you add a node, this will be available immediately after the container reboot.
  • If you remove a node that is currently in use, no new deployments will be mapped to this VMM. However, this will not destroy (or migrate) any current VMs running there. If that is necessary it currently needs to be accomplished explicitly.
  • If you change a node that is currently in use, the change will take effect for the next lease.

    If you've removed support for an association on a VMM that the current VM(s) is using, this change will not destroy (or migrate) the VM(s) to adjust to this restriction. If that is necessary it currently needs to be accomplished explicitly.

    If you've reduced the memory allocation below what the current VM(s) on the node is/are currently using, this will not destroy (or migrate) the current VM(s) to adjust to this restriction. If that is necessary it currently needs to be accomplished explicitly. Once the VM(s) are gone, the maximum memory available on that VMM will be the new, lower maximum.

5.1.4. Networking associations

The workspace service can allocate IP addresses from pools of addresses (called associations) for the Allocate and AllocateAndConfigure networking methods. See this section of the interface descriptions for background information.

These pools are configured by listing available addresses in files. The name of the file is the association name, and the directory to find all such files is, by default, $GLOBUS_LOCATION/etc/workspace_service/associations

If you want to change this directory, alter the "associationDirectory" configuration of the "NetworkAdapter" section in the jndi-config.xml file.

The service is packaged with one sample association file, public, which can be found in the default associations directory. It contains the syntax explanation as comments. Currently there is no script to auto-generate such files from higher level logic; each entry must be specified manually or generated via a quick script of your own.

If you change these configuration files after starting the container, the changes take effect only after the container is restarted.
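A quick script of your own could start from something like the following Python 3 sketch, which only enumerates the addresses in an inclusive range. The function name and the sample range are hypothetical, and turning each address into a valid entry is left to you because the entry syntax is defined in the comments of the sample file.

```python
import ipaddress


def address_range(first, last):
    """Return every IPv4 address from first to last, inclusive, as strings."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    if end < start:
        raise ValueError("last address precedes first address")
    return [str(ipaddress.IPv4Address(n)) for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    # Hypothetical range; replace with addresses your site can lease.
    for address in address_range("192.168.0.10", "192.168.0.12"):
        print(address)
```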

  • If you add an address or entirely new pool, it will be available immediately after the container reboot.
  • If you remove an address that is currently in use, it will never be leased again from this point forward. However, this will not destroy (or affect in any way) a current workspace with the leased address. If that is necessary it currently needs to be accomplished explicitly.
  • If you change an address that is currently in use, the change will take effect the next time the address is leased.

5.1.5. Factory policies

Currently, the factory can be configured with several simple policies concerning workspace deployment. Each value is listed in the WorkspaceFactoryService section of $GLOBUS_LOCATION/etc/workspace_service/jndi-config.xml

The defaultRunningTimeMinutes value is used if the DeploymentTime element of the deployment request is not specified.

The maxRunningTimeMinutes value is what the DeploymentTime element of the deployment request may not exceed.

The WSRFResourceMinutesPastRunningTime value configures the default termination time of the WSRF resource representing the workspace. When a workspace is shut down, its representation is not terminated (one reason being that it can be started again). This value is added on to the running time and is currently only configurable at this location.

The maxGroupSize value is what the NodeNumber element of the deployment request may not exceed. If this is zero, negative, or missing, there will be no limit to group request size.
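The way these values interact can be sketched in a few lines of Python. This is an illustration of the rules as stated above, not the service's code: the function names are hypothetical, and rejecting an over-long request with an exception is an assumption about how "may not exceed" is enforced.

```python
def effective_running_minutes(requested, default_minutes, max_minutes):
    """Apply defaultRunningTimeMinutes and maxRunningTimeMinutes to a request."""
    minutes = default_minutes if requested is None else requested
    if minutes > max_minutes:
        raise ValueError("running time exceeds maxRunningTimeMinutes")
    return minutes


def group_size_allowed(node_number, max_group_size=None):
    """A zero, negative, or missing maxGroupSize means no limit."""
    if max_group_size is None or max_group_size <= 0:
        return True
    return node_number <= max_group_size


def resource_termination_minutes(running_minutes, minutes_past_running_time):
    """The WSRF resource outlives the running time by the configured margin."""
    return running_minutes + minutes_past_running_time


if __name__ == "__main__":
    run = effective_running_minutes(None, default_minutes=60, max_minutes=240)
    print(run, resource_termination_minutes(run, 30), group_size_allowed(8, 0))
```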

The optional architecture value can be one of the JSDL ProcessorArchitectureEnumeration values such as "x86". If you are running Xen on other architectures, this should be changed appropriately.

The optional vmm value will currently be Xen.

The optional vmmVersions value can be a comma separated list of one-word strings (deciding on values to use is a deployment issue within VOs).

For more information about the deployment request, see the interfaces documentation.

For more information about richer authorization options, see the plugins documentation. The service installation includes a "groupauthz" module that is not activated by default.

5.1.6. Staging adapters

By default, the workspace service does not have any staging adapters configured. Staging adapters allow clients to send an optional stage-in and/or stage-out request with the workspace creation request, and the workspace service will handle making this happen.

To enable the HTTP adapter, simply uncomment the HTTP configuration section in the jndi-config.xml file. Fill in the configuration for the image node hostname that will be the target of the HTTP transfers. If this is not localhost, the adapter will use SSH and therefore need passwordless key access as in the resource pool model's case.

To enable the RFT adapter, you must install the RFT plugin (available on the workspace downloads page) in addition to uncommenting the RFT adapter configuration section in the Workspace Service's jndi-config.xml file. The RFT service can be hosted in the same container but it is not required: the service URL is always passed along with the staging request.

5.2. workspace-control program configuration

5.2.1. Authorized kernels

This version of the workspace service does not support unauthorized client kernels. As such, the backend configuration file lists the authorized kernels, which must be inside the backend directory tree (by default, in the /opt/workspace/images directory). Copy any kernels you wish to use to that directory, and list them in the guestkernels option under the [images] section of the configuration file. By doing this, clients can choose from these kernels in the metadata, but they must already exist at the hypervisor node and must be in the guestkernels list.
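The two conditions on a guest kernel can be expressed as a short check. This is a Python sketch under stated assumptions: the function name is hypothetical, kernels are assumed to be referred to by file name inside the images directory, and the exists parameter is there only so the logic can be exercised without touching the filesystem.

```python
import os


def kernel_is_usable(name, guestkernels, images_dir="/opt/workspace/images",
                     exists=os.path.exists):
    """A kernel must be listed in guestkernels and present in the images directory."""
    if name not in guestkernels:
        return False
    return exists(os.path.join(images_dir, name))


if __name__ == "__main__":
    # Hypothetical kernel file name.
    print(kernel_is_usable("vmlinuz-guest", ["vmlinuz-guest"]))
```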

5.2.2. Networking

With Xen, for each NIC on the hypervisor node, it is typical to create a corresponding bridge such that guest virtual machines may be bridged to each NIC. This allows workspaces to be on different networks or allows spreading out of the networking load (in the case where the hypervisor's multiple NICs are on the same network).

Support for multiple physical and virtual NICs is present in this version, mapped in the grid context in terms of a networking association string. A client will specify the required association in the metadata, assuming the deployment nodes support that association (otherwise, a creation fault will be thrown). An association is created by configuring an entry for it in the workspace-control configuration file. This should match the service association configurations.

For example, the node could have two physical NICs, one bound to a private LAN and one bound to the Internet. To accommodate virtual networking cards being bound to both NICs, a site admin could create two bridges, xenbr0 and xenbr1. Then an association would be configured for each bridge, e.g., one called "public" and one called "private" (a typical configuration).

In the sample workspace-control configuration file, find the association_ settings in the [networking] section. If xenbr0 is the 'private' interface, add this line:

association_0: private; xenbr0; vif0.0 ; none; 192.168.0.0/24

This specifies that there will be an association called 'private' with bridge 'xenbr0' with MAC address prefixes of 'none' (which means allow the service to decide) given to virtual NICs bound to this bridge and with an authorized IP range of 192.168.0.0 to 192.168.0.255.

The third field lists the interface of the NIC that is listening for DHCP requests. This is the interface that DHCP requests originating from workspaces should be allowed to broadcast to. Directing the request to a specific interface prevents DHCP requests from being broadcast to other workspaces as well as to the real network. You can double-check which vif is on which bridge by running "brctl show". Normally you would run the DHCP server in dom0, and the defaults are as follows:

xenbr0 - vif0.0
xenbr1 - vif0.1
xenbr2 - vif0.2
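To make the five fields of the association line shown earlier concrete, the following Python 3 sketch splits such a line and expands the authorized IP range. It is not the parser used by workspace-control; the function name and dictionary keys are hypothetical, and only the five-field form is handled.

```python
import ipaddress


def parse_association(line):
    """Split a five-field association line into its named parts."""
    key, _, value = line.partition(":")
    fields = [field.strip() for field in value.split(";")]
    if len(fields) != 5:
        raise ValueError("expected five ';'-separated fields, got %d" % len(fields))
    name, bridge, dhcp_vif, mac_prefix, ip_range = fields
    network = ipaddress.ip_network(ip_range)
    return {
        "key": key.strip(),
        "name": name,
        "bridge": bridge,
        "dhcp_vif": dhcp_vif,  # interface listening for DHCP requests
        "mac_prefix": None if mac_prefix == "none" else mac_prefix,
        "first_ip": str(network[0]),
        "last_ip": str(network[-1]),
    }


if __name__ == "__main__":
    print(parse_association(
        "association_0: private; xenbr0; vif0.0 ; none; 192.168.0.0/24"))
```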

You are not required to run the DHCP server in dom0 as the workspaces' DHCP request broadcast can be directed to any interface, but currently to run it in another domain would require altering the dhcp-config.sh callout to work remotely (this would not be hard for you to do, grep for "dhcpconfigpath" in the xen_v2.py module and prepend ssh parameters, for example).

Note: if you only have one bridge for virtual network cards, a simple configuration would be to list one association like so:

association_0: default; none; 1.0.0.0/16

Note that no bridge name is given, so the default bridge as determined by Xen will be used. Note also that the IP range is fake. If you do not want to keep track of workspace IP assignments (to prevent conflicts) or to check whether a requested IP setting is valid for the bridge, ensure the check_ip_ranges, track_MAC_assignments, and track_IP_assignments settings are set to 'false'.

5.2.3. Blankspace creation

This version of workspace-control has a simple, configurable blankspace creation script. It can be edited to suit your needs, but right now it can only create one type of filesystem per deployment. This is ext2 by default. In the future, the filesystem type may be included as part of the client's request.

The script's default location is /opt/workspace/bin/blankcreate.sh

6. Testing

To test the workspace service, you can use the sample metadata and deployment files and a test image.

For a walkthrough and explanations, see the Workspace User Guide.

7. Installing just the workspace client

There are three ways to install a workspace client environment, described in Sections 7.1 through 7.3 below.

Once installed, see the Workspace User Guide.

7.1. Installing the reference client and Globus runtime

The basic Globus Java client environment almost entirely overlaps with the server environment. We provide an option to download a tarball of this environment with the Workspace sample client already installed into it.

Installing the client this way assumes you have a working Java JRE (1.4+) as well as externally obtained certificates.

The proxy generation tools included with this tarball are not the recommended tools for use with Linux. There is a problem with Java's interaction with Linux terminals that results in typed characters possibly being seen by an onlooker.

  1. Download the "Binary, Globus Java core plus client" tarball from the Workspace download page.

  2. Expand the tarball and set your GLOBUS_LOCATION environment variable to the created directory.

  3. Now see the Workspace User Guide.

7.2. Installing the reference client from binary GAR files

The GAR format is a portable, binary archive of everything needed to deploy the client into a pre-existing container.

Installing the client via GAR files assumes you have a working GT4 Java core installation. A Java JRE (1.4+) and Ant (1.6+, or 1.6.1+ if using Java 1.5) are assumed.

Configuring GT4 with the bare necessities for the Workspace reference client requires you to install the Java GT4 core with security tools:

make wsjava common globus_proxy_utils globus_simple_ca \
globus_simple_ca_setup postinstall

If you can obtain certificates and proxy tools from some other location, all that is required is Java GT4 core which is available as a binary tarball from the Globus Toolkit download page.

  1. Download the client binary tarball from the Workspace download page and expand it to any directory.

  2. Set the GLOBUS_LOCATION environment variable as your environment requires (Bourne shell shown):

    export GLOBUS_LOCATION=/path/to/globus

  3. Set the ANT_HOME environment variable as your environment requires (bourne shell shown):

    export ANT_HOME=/path/to/ant/home

  4. Deploy the three GAR files:

    ./deploy-client-gars.sh

  5. Now see the Workspace User Guide.

7.3. Installing the reference client from source

Installing the client from source assumes you have a working GT4 Java core installation. A Java JDK (1.4+) and Ant (1.6+, or 1.6.1+ if using Java 1.5) are assumed.

Configuring GT4 with the bare necessities for the Workspace reference client requires you to install the Java GT4 core with security tools:

make wsjava common globus_proxy_utils globus_simple_ca \
globus_simple_ca_setup postinstall

If you can obtain certificates and proxy tools from some other location, all that is required is Java GT4 core which is available as a binary tarball from the Globus Toolkit download page.

  1. Download the client source tarball from the Workspace download page.

  2. Set the GLOBUS_LOCATION environment variable as your environment requires (bourne shell shown):

    export GLOBUS_LOCATION=/path/to/globus

  3. Set the ANT_HOME environment variable as your environment requires (bourne shell shown):

    export ANT_HOME=/path/to/ant/home

  4. The root build file for the Workspace Service is workspace-client-*-src/build.xml. It contains a special build target that builds and installs only the client.

    Build and install the client:

    cd workspace-client-*-src
    ant deploy-client-only

  5. Now see the Workspace User Guide.

8. Network configuration details

For the Workspace backend to support networking information delivery to VMs, you are required to install a DHCP server and ebtables on each hypervisor node. When networking information in a workspace needs to change at its startup (which is typical upon deployment), workspace-control will make a call via sudo to a program that adds a MAC-address-to-IP mapping to the local DHCP server for each of the workspace's NICs that needs to be configured. It will also adjust ebtables rules for each of those NICs: these rules ensure that the NICs use the proper MAC and IP addresses and direct DHCP requests to the local DHCP server only.

To actually enact networking changes (using the Allocate networking method, for example), the VM must set its own L3 networking information (IP address, default gateway, etc.) from inside the VM. Currently we only support delivery of the information via DHCP. Booting into DHCP client mode is well supported in virtually every operating system in existence. Previously we passed information via kernel parameters, which required a special understanding inside the VM. The result of using DHCP is that workspace images are easier to create and easier to maintain.

A DHCP server is required to run on each hypervisor node. The purpose of this server is to respond to broadcast requests from workspaces that are booting locally. Before starting a VM, if any of its NICs need to be configured via DHCP, workspace-control will call out to "dhcp-config.sh" via sudo, passing it a specific MAC-address-to-IP mapping (as well as other information to be passed to the VM such as hostname, DNS servers, broadcast address, default gateway, etc.).

  • "Won't this interfere with my current DHCP server?" No.
  • "Will this cause unwanted packets on my physical LAN?" No.
  • "Will other workspaces be able to send broadcasts and get the wrong DHCP lease?" No.
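
To make the MAC-address-to-IP mapping concrete, the sketch below prints a host entry in ISC dhcpd.conf syntax. This is an illustration only: it assumes an ISC-style DHCP server, the function name and all addresses are made up, and the entries that dhcp-config.sh actually writes may look different.

```shell
#!/bin/sh
# Illustration only: print a dhcpd.conf-style host entry for one workspace NIC.
# Assumes ISC dhcpd syntax; the real output of dhcp-config.sh may differ.
print_dhcp_host_entry() {
    hostname=$1; mac=$2; ip=$3; gateway=$4; broadcast=$5; dns=$6
    cat <<EOF
host $hostname {
  hardware ethernet $mac;
  fixed-address $ip;
  option host-name "$hostname";
  option routers $gateway;
  option broadcast-address $broadcast;
  option domain-name-servers $dns;
}
EOF
}

# Example values are hypothetical.
print_dhcp_host_entry ws-example A2:AA:BB:00:00:01 192.168.0.10 \
    192.168.0.1 192.168.0.255 192.168.0.2
```

Each field corresponds to a piece of information mentioned above: the MAC address selects the NIC, and the remaining values are what the VM's DHCP client receives.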

In addition to a DHCP server, we also insert custom ebtables rules when the workspace is deployed. These rules accomplish three things:

  1. Only packets with the correct MAC address for this virtual interface are permitted.
  2. Broadcast DHCP requests are only permitted to be bridged to the correct virtual interface (configured in workspace-control's configuration file, see the workspace-control networking configuration section).

  3. Only packets with the correct IP address for this virtual interface are permitted (as the NIC does not have an IP address yet when making a DHCP request, the IP check only happens if it is not a DHCP request).
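
The three checks above can be sketched as ebtables commands. The function below only prints the commands (a dry run) and is an assumption-laden illustration: the chain name, the rule layout, and the function name are hypothetical and are not the rules that workspace-control actually installs.

```shell
#!/bin/sh
# Dry-run sketch: prints ebtables commands for one virtual interface instead
# of running them. Chain name and rule layout are assumptions, not the rules
# installed by workspace-control.
print_vif_rules() {
    vif=$1      # virtual interface of the workspace NIC, e.g. vif1.0
    mac=$2      # MAC address assigned to that NIC
    ip=$3       # IP address assigned to that NIC
    dhcpvif=$4  # interface that leads to the local DHCP server
    chain="WS-$vif"

    # Send all frames arriving from this interface through a dedicated chain.
    echo "ebtables -N $chain"
    echo "ebtables -A FORWARD -i $vif -j $chain"
    # Check 1: drop frames whose source MAC is not the assigned one.
    echo "ebtables -A $chain -s ! $mac -j DROP"
    # Check 2: DHCP requests (UDP, destination port 67) may only be bridged
    # to the DHCP interface; anywhere else they are dropped.
    echo "ebtables -A $chain -p IPv4 --ip-proto 17 --ip-dport 67 -o $dhcpvif -j ACCEPT"
    echo "ebtables -A $chain -p IPv4 --ip-proto 17 --ip-dport 67 -j DROP"
    # Check 3: all other IPv4 traffic must carry the assigned source address.
    echo "ebtables -A $chain -p IPv4 --ip-src ! $ip -j DROP"
}
```

The DHCP rules come before the IP check on purpose: a DHCP request is sent before the NIC has an address, so it must be handled before the source-address rule is reached. For example, "print_vif_rules vif1.0 A2:AA:BB:00:00:01 192.168.0.10 vif0.1" prints six commands for review.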

A draft of the workspace DHCP design document is available here.

9. Integrating with a local scheduler with the Workspace Pilot

  1. The first step in switching to the pilot-based infrastructure is to make sure you have at least one working node configured with workspace-control, following the instructions in this guide as if you were not going to use dynamically allocated VMMs via the pilot.

    If the only nodes available are in the LRM pool, it would be best to drain the jobs from one and take it offline while you confirm the setup. See the testing section for more information.

  2. Next, make sure that the system account the GT container is running in can submit jobs to the LRM. For example, run echo "/bin/true" | qsub

  3. Next, decide how you would like to organize the cluster nodes so that the workspace service's requests for time on the nodes actually result in usable VMM nodes.

    For example, if only a portion of the nodes are configured with Xen and workspace-control, you can set up a special node property (e.g. 'xen') or perhaps a separate queue or server. The service supports submitting jobs with node property requirements and also supports the full Torque/PBS '[queue][@server]' destination syntax if desired.

  4. Uncomment the WorkspaceFactoryService/SlotManagementAdapter section in the service's jndi-config.xml file that corresponds to the PilotSlotManagement comments.

    Comment out the short, default SlotManagementAdapter section; PilotSlotManagement replaces it.

    The configuration comments should be self-explanatory. There are a few to highlight here.

    • Notifications based on HTTP digest access authentication are one mechanism for pilot notifications. Each message from a pilot process to the workspace service takes on the order of 10 ms on our current testbed, which is reasonable.

      The contactPort setting controls which port the embedded HTTP server listens on. It is also used in the contact URL passed to the pilot program; an easy way to get this right is to use an IP address rather than a hostname.

      Note the accountsPath setting. Navigate to that file (under $GLOBUS_LOCATION/etc/workspace_service/pilot by default) and change the shared secret to something that is not dictionary-based and is 15 or more characters long. A script in that directory will produce suggestions.

      This port may be blocked off entirely from WAN access via a firewall if desired; only the pilot programs need to connect to it. If it is not blocked off, HTTP digest access authentication still guards access.

      Alternatively, you can configure SSH alone for these notifications, or configure both and use SSH as a fallback mechanism. When SSH is used as a fallback, the pilot will try to contact the HTTP server and, if that fails, will then attempt to use SSH. Those messages are written to a file and will be read when the workspace service recovers. This is an advanced configuration; setting up the infrastructure without it is recommended for the first pass, to reduce the chance of misconfiguration.

    • The maxMB setting is used to set a hard maximum memory allotment across all workspace requests (no matter what the authorization layers allow). This is a "fail fast" setting, making sure dubious requests are not sent to the LRM.

      To arrive at that number, you must first determine the maximum amount of memory to give domain 0 in non-hosting mode. This should be as much as possible, and you will also configure this value later in the pilot program settings (the pilot will make sure domain 0 gets this memory back when returning the node from hosting mode to normal job mode).

      When the node boots and xend is first run, you should configure things such that domain 0 is already at this memory setting. This way, it will be ready to give jobs as many resources as possible from its initial boot state.

      Domain 0's memory is set in the boot parameters. On the "kernel" line you can add a parameter like this: dom0_mem=2007M

      If it is too high you will make the node unbootable. 2007M is an example from a 2048M node and was arrived at experimentally. We are working on ways to automatically determine the highest value this can be without causing boot issues.

      Take this setting and subtract at least 128M from it, allocating the rest for guest workspaces. Let's label 128M in this example as dom0-min and 2007M as dom0-max. Some memory is necessary for domain 0 to at least do privileged disk and net I/O for guest domains.

      These two memory settings will be configured into the pilot to make sure domain 0 is always in the correct state. Domain 0's memory will never be set below the dom0-min setting and will always be returned to dom0-max when the pilot program vacates the node.

      Instead of letting the workspace request fail on the backend just before instantiation, the maxMB setting is configured in the service so that requests for more memory will be rejected up front.

      So [ dom0-max minus dom0-min equals maxMB ]. And again maxMB is the maximum allowed for guest workspaces.

      ( You could make it smaller. But it would not make sense to make it bigger than [ dom0-max minus dom0-min ] because this will cause the pilot program itself to reject the request. )

    • The pilotPath setting must be set correctly and double-checked. See this bugzilla item.

  5. Next, note your pilotPath setting and put a copy of workspacepilot.py there. Run chmod +x on it and that is all that should be necessary for the installation.

    Python 2.3 or higher (though not Python 3.x) is also required but this was required for workspace-control as well.

    A sudo rule to the xm program is also required but this was configured when you set up workspace-control. If the account the pilot jobs are run under is different than the account that runs workspace-control, copy the xm sudo rule for the account.

  6. Open the workspacepilot.py file in an editor. These things must be configured correctly and require your intervention (i.e., the software cannot guess at them):

    • Search for "secret: pw_here" around line 80. Replace "pw_here" with the shared secret you configured above.
    • Below that, set the "minmem" setting to the value you chose above that we called dom0-min.
    • Set the "dom0_mem" setting to the value you chose above that we called dom0-max.

    The other configurations should be explained enough in the comments and they also usually do not need to be altered.

    You might like to create a directory for the pilot's logfiles instead of the default setting of "/tmp" for the "logfiledir" configuration. You might also wish to separate out the config file from the program. The easiest way to do that is to configure the service to call a shell script instead of workspacepilot.py. This in turn could wrap the call to the pilot program, for example: "/opt/workspacepilot.py -p /etc/workspace-pilot.conf $@"

  7. Now restart the GT container and submit test workspace requests as guided in the testing section.
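
The memory arithmetic from step 4 (dom0-max minus dom0-min equals maxMB) can be worked through with the example values from the text. The helper name compute_max_mb is hypothetical; it is a sketch for checking your numbers, not part of the pilot or the service.

```shell
#!/bin/sh
# Worked example of the maxMB arithmetic from step 4.
# compute_max_mb is a hypothetical helper, not part of the pilot or service.
compute_max_mb() {
    dom0_max=$1   # domain 0 memory in non-hosting mode (dom0_mem), in MB
    dom0_min=$2   # memory kept for domain 0 while hosting (minmem), in MB
    if [ "$dom0_min" -ge "$dom0_max" ]; then
        echo "dom0-min must be smaller than dom0-max" >&2
        return 1
    fi
    echo $((dom0_max - dom0_min))
}

compute_max_mb 2007 128   # prints 1879
```

With the example node from the text (dom0-max 2007, dom0-min 128), the largest sensible maxMB is therefore 1879. The same two inputs are the values entered as "dom0_mem" and "minmem" in workspacepilot.py in step 6.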

10. Troubleshooting

Any questions can be posted to the workspace-user mailing list and will likely be answered promptly by a member of the community. For instructions on how to subscribe and post messages to this list, see this web page.

  • Problem: The VMs do not obtain addresses via DHCP.

    Solution: Make sure the configuration of dom0's interface name(s) is valid. This is the "dhcpvif" part of the association configuration in the worksp.conf file. See the backend networking configuration section for more details and the right setting to use.

  • Problem: Sometimes, from the start of the workspace's deployment, one of the VM's NICs is unreachable (specifically, ARP does not resolve the IP address to a MAC address).

    Solution: Make sure the MAC address prefix is valid. See the backend networking configuration section for more details and the right setting to use.

  • Problem: The container doesn't start anymore and you are getting a long JNDI related exception. You see "InvocationTargetException" and "NameAlreadyBoundException" and probably "Name home is already bound in this Context".

    Solution: This will happen if you make a backup of the "etc/workspace_service" directory inside the etc directory. For example, you ran "cp -a workspace_service workspace_service_backups". The container treats both as service directories and tries to consume both JNDI files. Hence, the configurations are consumed multiple times, which is an error because only one of each can be "bound in this context" at a time. Move the backup copy out of the etc directory and restart the container.
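
To spot a stray backup directory like the one described in the last problem, you can list every directory directly under etc that carries a JNDI file. This is a minimal sketch; the function name list_jndi_dirs is hypothetical, and it assumes the JNDI file is named jndi-config.xml as elsewhere in this guide.

```shell
#!/bin/sh
# Hypothetical helper: list every directory directly under an etc directory
# that contains a jndi-config.xml file. A backup copy of workspace_service
# left inside etc shows up as an extra entry.
list_jndi_dirs() {
    etc_dir=$1
    for f in "$etc_dir"/*/jndi-config.xml; do
        # The glob stays unexpanded when nothing matches, so test each path.
        [ -f "$f" ] && dirname "$f"
    done
    return 0
}
```

Run it as list_jndi_dirs "$GLOBUS_LOCATION/etc". Other GT4 services legitimately appear in the output; look for unexpected copies such as workspace_service_backups.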

11. Configuration migration from older service versions

11.1. Configuration migration from TP1.3.2 to TP1.3.3.1

The services must be entirely undeployed before building and installing TP1.3.3.1.

However, you can back up your 'etc/workspace_service' directory configurations and replace them exactly as they were. You can also keep the database intact in theory.

The workspace-control worksp.conf file can be exactly the same as well, although there is a new configuration. The "[behavior] --> num_cpu_per_vm" configuration allows you to peg the number of vcpus that are assigned to every workspace. See the TP1.3.3.1 sample configuration for the comments around this config.

You do not even need to reinstall workspace-control unless you want the workspace-control updates mentioned in the TP1.3.3.1 changelog.

11.2. Old versions to TP1.3.2

All persistent state must be removed from GLOBUS_LOCATION and the services must be entirely undeployed before building and installing TP1.3.2.

All workspace-control files and state must be removed and workspace-control must be reinstalled. The configuration file values from TP1.2.3, TP1.3, and TP1.3.1 can be used, but for TP1.3.2 it is recommended to start over with the new template.

A change in this service version is that the workspace-control configuration file can be identical across all VMM nodes because MAC address selection was centralized to the service.

First, see the TP1.3.2 changelog for an overview of what has changed in general. Also, see the intermediate changelogs if you are upgrading to TP1.3.2 from an older version other than the directly preceding TP1.3.1.

When in doubt, use the diff tool on backups of your original configuration file and a pristine copy of the new sample. You did back up your original configuration files, right?
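
A minimal sketch of that comparison follows. The function name compare_configs and the file names in the usage note are placeholders; substitute the paths of your own backup and of the new sample file.

```shell
#!/bin/sh
# Sketch: show what differs between a backed-up configuration file and the
# pristine sample shipped with the new release. File names are placeholders.
compare_configs() {
    backup=$1
    sample=$2
    rc=0
    # diff exits with 1 when the files differ; that is expected here.
    diff -u "$backup" "$sample" || rc=$?
    # Only exit codes above 1 (for example, a missing file) count as failure.
    [ "$rc" -le 1 ]
}
```

For example, "compare_configs worksp.conf.backup worksp.conf.sample" prints a unified diff in which lines starting with "-" exist only in your backup and lines starting with "+" exist only in the new sample.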

11.3. TP1.3.1 to TP1.3.2

11.3.1. TP1.3.2 JNDI config file

  • New: WorkspaceFactoryService/NetworkAdapter/macPrefix. Controls the global MAC address prefix for workspaces.
  • New: WorkspaceService/home/localTempDirectory. Stores customization task data before it is moved to the VMM node, where the file(s) are installed into the workspace.
  • New: WorkspaceService/home/backendTempDirectory. Directory to send the customization task data on the backend nodes. Default is /opt/workspace/tmp. This must match what the backend configuration file has for this.
  • New: WorkspaceService/home/scpPath. Path to SCP.
  • New: WorkspaceService/CreationAuthorizationCallout option: "GroupAuthz". The GroupAuthz plugin is a new authorization callout that does not need separate installation but does need explicit activation before use. A longish summary of it is given in the TP1.3.2 changelog.

11.3.2. Workspace-control conf file

The recommendation is to start with the new configuration file template and move old configurations to it. No names or semantics have changed.

An extra sudo rule is required for the mount-alter.sh tool. The workspace-control installer will print the exact line you need to enter; there are also samples in the new configuration template.

  • MAC addresses may be centralized now so change all association lines to list "none" instead of a MAC prefix. This allows the workspace-control to accept whatever the service specifies for the MAC address. This allows the configuration file to be identical on all VMM nodes.
  • New "mounttool" option gives the path to mount-alter.sh.
  • New "mountdir" option specifies a base directory for mounting VM files that need customization.
  • New "tmpdir" option specifies a directory where customization task data can be stored. These files are sent here by the workspace service before any startup commands. This path value must match the service's "backendTempDirectory" configuration (see above).
  • New "pygrub" option for path to the pygrub bootloader in order to support booting hard disk images instead of partition files (optional).

11.4. TP1.3 to TP1.3.1

11.4.1. TP1.3.1 JNDI config file

  • WorkspaceFactoryService/SlotManagementAdapter has a new alternative: PilotSlotManagement. Only one of these sections may be uncommented at a time. See Integrating with a local scheduler with the Workspace Pilot.
  • New: WorkspaceService/home/sshIdentityFile. Allows SSH invocations to use an alternate private key file.

11.4.2. Association config files

Association configurations may no longer include 'none' for the hostnames. Each address requires a unique hostname (see the changelog, this is for DHCP delivery to work correctly) but it does not necessarily need to resolve via DNS.

11.5. TP1.2.3 to TP1.3

11.5.1. TP1.3 JNDI config file

There were structural changes in the jndi-config.xml file this release, so it is highly recommended to start with the TP1.3 example file and move old settings manually into it.

  • WorkspaceFactoryService/home/maxGroupSize

    This configuration is new. See the Factory policies section for explanation.

  • WorkspaceFactoryService/home/WSRFResourceMinutesPastRunningTime

    This configuration had a typo in it: "resouce" is now "resource".

  • WorkspaceFactoryService/home/certificateFileLocation

    Removed. The directory that this pointed to should be removed as well.

  • WorkspaceService/home/backendCertDir, WorkspaceService/home/scpPath

    Removed.

  • WorkspaceFactoryService/home/associationFileLocation

    Moved to WorkspaceFactoryService/NetworkAdapter/associationDirectory

  • WorkspaceFactoryService/home/resourcepoolFileLocation

    Moved to WorkspaceFactoryService/SlotManagementAdapter/resourcepoolDirectory

  • WorkspaceService/home

    Implementation class changed.

  • WorkspaceService/home/threadPoolInitialSize

    Added, controls initial number of threads in task thread pool.

  • WorkspaceService/home/threadPoolMaxSize

    Added, controls maximum number of threads in task thread pool.

  • WorkspaceService/VerboseLoggingAdapter

    Added accounting trace logging flag.

  • WorkspaceService/SchedulerAdapter

    Implementation class changed.

  • Sections added:

    • WorkspaceFactoryService/NetworkAdapter
    • WorkspaceFactoryService/SlotManagementAdapter
    • WorkspaceService/AccountingEventAdapter
    • WorkspaceService/AccountingDB
    • WorkspaceService/BindingAdapter
    • WorkspaceGroupService
    • WorkspaceStatusService
    • WorkspaceMasterContext

11.5.2. Association config files

Association and resource pool configurations are now read in at container startup time in a different way. See the networking and resource pool configuration sections for more details.

The old files will not work in TP1.3; you need to remove the last two tokens from each line, the 'certfile' and 'keyfile' fields.
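
Removing the last two tokens can be scripted. The sketch below assumes that the old files hold whitespace-separated tokens, one entry per line, and that lines starting with "#" are comments; verify both assumptions against your own files first. The function name strip_cert_fields is hypothetical.

```shell
#!/bin/sh
# Sketch: print an association file with the last two tokens of each entry
# (the certfile and keyfile fields) removed. Assumes whitespace-separated
# tokens and "#" comment lines; check your files before relying on it.
strip_cert_fields() {
    awk '
        # Keep comment lines and blank lines as they are.
        /^[[:space:]]*(#|$)/ { print; next }
        # Entries with more than two tokens lose their last two tokens.
        NF > 2 {
            line = $1
            for (i = 2; i <= NF - 2; i++) line = line " " $i
            print line
            next
        }
        # Anything else is passed through unchanged.
        { print }
    ' "$1"
}
```

The function writes to standard output, so the original file is left untouched: redirect the output to a new file, compare it with your backup, and only then put it in place. Note that runs of whitespace between tokens are collapsed to single spaces.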

11.5.3. Workspace-control conf file

The worksp.conf file does not need to be altered. The TP1.2.3 file can be reused, but several configurations are never consulted in this version.

Certificate and keyfile staging functionality was removed from the service. The 'mounttool', 'mountdir', and 'tmpdir' configs are no longer consulted by workspace-control. They have been removed from the sample configuration file.

If you were enabling that functionality, the accompanying sudo rule for the mount tool can be removed as well.

'family' and 'find_maxvmram' were removed from the sample configuration file.

As mentioned above, note that it is possible to use a TP1.2.3 workspace-control configuration file with a TP1.3 (or TP1.3.1) workspace-control; the extra configurations will simply be ignored by the newer workspace-control program.