General purpose clouds are expected to include these base services:
Each of these services have different resource requirements.
When designing compute resource pools, a number of factors can impact your design decisions. Factors such as number of processors, amount of memory, and the quantity of storage required for each hypervisor must be taken into account.
A compute design that allocates multiple pools of resources makes best use of application resources, and is commonly referred to as bin packing.
When selecting a processor, compare features and performance characteristics. Some processors include features specific to virtualized compute hosts, such as hardware-assisted virtualization, and technology related to memory paging (also known as EPT shadowing). These types of features can have a significant impact on the performance of your virtual machine.
good or bad impact???????
You will also need to consider the compute requirements of non-hypervisor nodes (sometimes referred to as resource nodes). This includes controller, object storage, and block storage nodes, and networking services
The number of processor cores and threads impacts the number of worker threads which can be run on a resource node
Workload can be unpredictable in a general purpose cloud, so consider including the ability to add additional compute resource pools on demand
start by allocating hardware designs that are capable of servicing the most common instance requests.
Choose a networking service based on the requirements of your instances. The architecture and design of your cloud will impact whether you choose OpenStack Networking(neutron), or legacy networking (nova-network).
Selected supplemental software solution impacts and affects the overall OpenStack cloud design. This includes software for providing clustering, logging, monitoring and alerting.
Inclusion of clustering software, such as Corosync or Pacemaker, is determined primarily by the availability requirements. The impact of including (or not including) these software packages is primarily determined by the availability of the cloud infrastructure and the complexity of supporting the configuration after it is deployed. The OpenStack High Availability Guide provides more details on the installation and configuration of Corosync and Pacemaker, should these packages need to be included in the design.
Requirements for logging, monitoring, and alerting are determined by operational considerations. Each of these sub-categories includes a number of various options.
OpenStack clouds require appropriate monitoring platforms to ensure errors are caught and managed appropriately. Specific meters that are critically important to monitor include:
- Image disk utilization
- Response time to the Compute API
Leveraging existing monitoring systems is an effective check to ensure OpenStack environments can be monitored.
Assessing the average workloads and increasing the number of instances that can run within the compute environment by adjusting the overcommit ratio is another option. It is important to remember that changing the CPU overcommit ratio can have a detrimental effect and cause a potential increase in a noisy neighbor. The additional risk of increasing the overcommit ratio is more instances failing when a compute host fails.
A general purpose OpenStack cloud design should incorporate the core OpenStack services to provide a wide range of services to end-users. The OpenStack core services recommended in a general purpose cloud are:
A general purpose cloud may also include OpenStack Object Storage (swift). OpenStack Block Storage (cinder). These may be selected to provide storage to applications and instances.
An overcommit ratio is the ratio of available virtual resources to available physical resources. This ratio is configurable for CPU and memory. The default CPU overcommit ratio is 16:1, and the default memory overcommit ratio is 1.5:1. Determining the tuning of the overcommit ratios during the design phase is important as it has a direct impact on the hardware layout of your compute nodes.
When you design an OpenStack network architecture, you must consider layer-2 and layer-3 issues. Layer-2 decisions involve those made at the data-link layer, such as the decision to use Ethernet versus Token Ring. Layer-3 decisions involve those made about the protocol layer and the point when IP comes into the picture. As an example, a completely internal OpenStack network can exist at layer 2 and ignore layer 3. In order for any traffic to go outside of that cloud, to another network, or to the Internet, however, you must use a layer-3 router or switch.
The past few years have seen two competing trends in networking. One trend leans towards building data center network architectures based on layer-2 networking. Another trend treats the cloud environment essentially as a miniature version of the Internet. This approach is radically different from the network architecture approach in the staging environment: the Internet only uses layer-3 routing rather than layer-2 switching.
A network designed on layer-2 protocols has advantages over one designed on layer-3 protocols. In spite of the difficulties of using a bridge to perform the network role of a router, many vendors, customers, and service providers choose to use Ethernet in as many parts of their networks as possible. The benefits of selecting a layer-2 design are:
- Ethernet frames contain all the essentials for networking. These include, but are not limited to, globally unique source addresses, globally unique destination addresses, and error control.
- Ethernet frames can carry any kind of packet. Networking at layer 2 is independent of the layer-3 protocol.
- Adding more layers to the Ethernet frame only slows the networking process down. This is known as ‘nodal processing delay’.
- You can add adjunct networking features, for example class of service (CoS) or multicasting, to Ethernet as readily as IP networks.
- VLANs are an easy mechanism for isolating networks.
Most information starts and ends inside Ethernet frames. Today this applies to data, voice (for example, VoIP), and video (for example, web cameras). The concept is that, if you can perform more of the end-to-end transfer of information from a source to a destination in the form of Ethernet frames, the network benefits more from the advantages of Ethernet. Although it is not a substitute for IP networking, networking at layer 2 can be a powerful adjunct to IP networking.
Layer-2 Ethernet usage has these advantages over layer-3 IP network usage:
- Reduced overhead of the IP hierarchy.
- No need to keep track of address configuration as systems move around. Whereas the simplicity of layer-2 protocols might work well in a data center with hundreds of physical machines, cloud data centers have the additional burden of needing to keep track of all virtual machine addresses and networks. In these data centers, it is not uncommon for one physical node to support 30-40 instances.
|Networking at the frame level says nothing about the presence or absence of IP addresses at the packet level. Almost all ports, links, and devices on a network of LAN switches still have IP addresses, as do all the source and destination hosts. There are many reasons for the continued need for IP addressing. The largest one is the need to manage the network. A device or link without an IP address is usually invisible to most management applications. Utilities including remote access for diagnostics, file transfer of configurations and software, and similar applications cannot run without IP addresses as well as MAC addresses.
Layer-2 architecture limitations
Outside of the traditional data center the limitations of layer-2 network architectures become more obvious.
- Number of VLANs is limited to 4096.
- The number of MACs stored in switch tables is limited.
- You must accommodate the need to maintain a set of layer-4 devices to handle traffic control.
- MLAG, often used for switch redundancy, is a proprietary solution that does not scale beyond two devices and forces vendor lock-in.
- It can be difficult to troubleshoot a network without IP addresses and ICMP.
- Configuring ARP can be complicated on large layer-2 networks.
- All network devices need to be aware of all MACs, even instance MACs, so there is constant churn in MAC tables and network state changes as instances start and stop.
- Migrating MACs (instance migration) to different physical locations are a potential problem if you do not set ARP table timeouts properly.
It is important to know that layer 2 has a very limited set of network management tools. It is very difficult to control traffic, as it does not have mechanisms to manage the network or shape the traffic, and network troubleshooting is very difficult. One reason for this difficulty is network devices have no IP addresses. As a result, there is no reasonable way to check network delay in a layer-2 network.
Layer-3 architecture limitations
The main limitation of layer 3 is that there is no built-in isolation mechanism comparable to the VLANs in layer-2 networks. Furthermore, the hierarchical nature of IP addresses means that an instance is on the same subnet as its physical host. This means that you cannot migrate it outside of the subnet easily. For these reasons, network virtualization needs to use IP encapsulation and software at the end hosts for isolation and the separation of the addressing in the virtual layer from the addressing in the physical layer. Other potential disadvantages of layer 3 include the need to design an IP addressing scheme rather than relying on the switches to keep track of the MAC addresses automatically and to configure the interior gateway routing protocol in the switches.
There are two types of monitoring: watching for problems and watching usage trends. The former ensures that all services are up and running, creating a functional cloud. The latter involves monitoring resource usage over time in order to make informed decisions about potential bottlenecks and upgrades.
Nagios is an open source monitoring service. It’s capable of executing arbitrary commands to check the status of server and network services, remotely executing arbitrary commands directly on servers, and allowing servers to push notifications back in the form of passive monitoring. Nagios has been around since 1999. Although newer monitoring services are available, Nagios is a tried-and-true systems administration staple.
Resource alerting provides notifications when one or more resources are critically low. While the monitoring thresholds should be tuned to your specific OpenStack environment, monitoring resource usage is not specific to OpenStack at all—any generic type of alert will work fine.
Some of the resources that you want to monitor include:
- Disk usage
- Server load
- Memory usage
- Network I/O
- Available vCPUs
Resources such as memory, disk, and CPU are generic resources that all servers (even non-OpenStack servers) have and are important to the overall health of the server. When dealing with OpenStack specifically, these resources are important for a second reason: ensuring that enough are available to launch instances. There are a few ways you can see OpenStack resource usage. The first is through the
# nova usage-list
This command displays a list of how many instances a tenant has running and some light usage statistics about the combined instances. This command is useful for a quick overview of your cloud, but it doesn’t really get into a lot of details.
nova database contains three tables that store usage information.
nova.quota_usages tables store quota information. If a tenant’s quota is different from the default quota settings, its quota is stored in the
nova.quotas table. For example:
mysql> select project_id, resource, hard_limit from quotas;
| project_id | resource | hard_limit |
| 628df59f091142399e0689a2696f5baa | metadata_items | 128 |
| 628df59f091142399e0689a2696f5baa | injected_file_content_bytes | 10240 |
| 628df59f091142399e0689a2696f5baa | injected_files | 5 |
| 628df59f091142399e0689a2696f5baa | gigabytes | 1000 |
| 628df59f091142399e0689a2696f5baa | ram | 51200 |
| 628df59f091142399e0689a2696f5baa | floating_ips | 10 |
| 628df59f091142399e0689a2696f5baa | instances | 10 |
| 628df59f091142399e0689a2696f5baa | volumes | 10 |
| 628df59f091142399e0689a2696f5baa | cores | 20 |
nova.quota_usages table keeps track of how many resources the tenant currently has in use:
mysql> select project_id, resource, in_use from quota_usages where project_id like '628%';
| project_id | resource | in_use |
| 628df59f091142399e0689a2696f5baa | instances | 1 |
| 628df59f091142399e0689a2696f5baa | ram | 512 |
| 628df59f091142399e0689a2696f5baa | cores | 1 |
| 628df59f091142399e0689a2696f5baa | floating_ips | 1 |
| 628df59f091142399e0689a2696f5baa | volumes | 2 |
| 628df59f091142399e0689a2696f5baa | gigabytes | 12 |
| 628df59f091142399e0689a2696f5baa | images | 1 |
By comparing a tenant’s hard limit with their current resource usage, you can see their usage percentage. For example, if this tenant is using 1 floating IP out of 10, then they are using 10 percent of their floating IP quota. Rather than doing the calculation manually, you can use SQL or the scripting language of your choice and create a formatted report:
| some_tenant |
| Resource | Used | Limit | |
| cores | 1 | 20 | 5 % |
| floating_ips | 1 | 10 | 10 % |
| gigabytes | 12 | 1000 | 1 % |
| images | 1 | 4 | 25 % |
| injected_file_content_bytes | 0 | 10240 | 0 % |
| injected_file_path_bytes | 0 | 255 | 0 % |
| injected_files | 0 | 5 | 0 % |
| instances | 1 | 10 | 10 % |
| key_pairs | 0 | 100 | 0 % |
| metadata_items | 0 | 128 | 0 % |
| ram | 512 | 51200 | 1 % |
| reservation_expire | 0 | 86400 | 0 % |
| security_group_rules | 0 | 20 | 0 % |
| security_groups | 0 | 10 | 0 % |
| volumes | 2 | 10 | 20 % |
The preceding information was generated by using a custom script that can be found on GitHub.
Trending can give you great insight into how your cloud is performing day to day. You can learn, for example, if a busy day was simply a rare occurrence or if you should start adding new compute nodes.
Trending takes a slightly different approach than alerting. While alerting is interested in a binary result (whether a check succeeds or fails), trending records the current state of something at a certain point in time. Once enough points in time have been recorded, you can see how the value has changed over time.
All of the alert types mentioned earlier can also be used for trend reporting. Some other trend examples include:
- The number of instances on each compute node
- The types of flavors in use
- The number of volumes in use
- The number of Object Storage requests each hour
- The number of
nova-api requests each hour
- The I/O statistics of your storage services
As an example, recording
nova-api usage can allow you to track the need to scale your cloud controller. By keeping an eye on
nova-api requests, you can determine whether you need to spawn more
nova-api processes or go as far as introducing an entirely new server to run
nova-api. To get an approximate count of the requests, look for standard INFO messages in
# grep INFO /var/log/nova/nova-api.log | wc
You can obtain further statistics by looking for the number of successful requests:
# grep " 200 " /var/log/nova/nova-api.log | wc
By running this command periodically and keeping a record of the result, you can create a trending report over time that shows whether your
nova-api usage is increasing, decreasing, or keeping steady.
A tool such as collectd can be used to store this information. While collectd is out of the scope of this book, a good starting point would be to use collectd to store the result as a COUNTER data type. More information can be found in collectd’s documentation.
Feature requests typically start their life in Etherpad, a collaborative editing tool, which is used to take coordinating notes at a design summit session specific to the feature.
for etherpad, Additionally, you’ll need node.js installed, Ideally the latest stable version, we recommend installing/compiling nodejs from source (avoiding apt).
How To Install Using a PPA
how to interact with openstack