Cisco Systems
CSS 11800 Content Services Switch
by Steve Broadhead
N.B. The testing by NSS was not intended to saturate the switch nor to determine its maximum throughput. Throughput testing has already been carried out in the US and the results are available from Cisco's web site. A recent upgrade to the switching fabric - not included in our test configuration - also means that performance will be enhanced beyond any currently published results. Here we focus on testing the range of features and working methodologies at Layers 2-7 that the CSS 11800 offers, working within realistic - albeit still relatively stressed - traffic levels, and on evaluating the functionality of the CSS 11800 within this environment.

We would like to extend our thanks to Cisco Systems, Hewlett-Packard, Peapod Distribution and Bluecurve (Dynameasure is a trademark of Bluecurve Inc.) for assistance with the testing and supply of equipment and benchmarking products. Finally, a big thank you to Chevin Software for use of its CNA Pro network monitor/analyser software, which enabled us to monitor the exact state of the network at all times and accurately configure the hardware based on knowledge of the traffic streams it analysed.

What is Web Content Switching?

Two years ago we were all asking: what is Layer 3 switching? The answer was: a switch that acts as a router, routing traffic at wire (pure Ethernet) speeds. Simple enough answer. Then came Layer 4, typically defined as the ability to add some extra level of intelligence, such as QoS (Quality of Service) parameters or rule-based load balancing, into the routing-switch mix. The latter is typified by the round-robin algorithm, where each specified server is used in turn in an attempt to share the load equally, or by weighted round-robin, where the same rule is applied but with specified weightings given to each server, usually where some servers are more powerful than others.

But now we have Layer 5-7 switching in the form of Cisco Systems' CSS 11800, defined by the company as a web content switch.
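The round-robin and weighted round-robin schemes described above can be sketched in a few lines of Python. This is an illustration of the general technique only, not Cisco's implementation; the server names and weights are made up for the example, and the weighted version simply repeats each server in the rotation according to its weight.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that hands out each server in turn."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Return an endless iterator that favours servers with higher weights.

    `weights` maps a (hypothetical) server name to a positive integer weight.
    A server with weight 3 appears three times in every rotation.
    """
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)


# Plain round-robin: every server gets the same share of requests.
rr = round_robin(["web1", "web2", "web3"])
first_six = [next(rr) for _ in range(6)]

# Weighted round-robin: the more powerful server takes three requests in four.
wrr = weighted_round_robin({"big": 3, "small": 1})
first_eight = [next(wrr) for _ in range(8)]
```

Note that neither scheme looks at what is being requested; both share out requests blindly, which is the limitation that content-based switching sets out to address.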
So, firstly, what do they mean exactly by web content and, secondly, has the game really moved on another step? In simple terms, what Cisco has done is to add a web services layer to the existing switching repertoire. If Layer 3 was designed for multi-protocol routing and Layer 4 specifically for intelligently handling IP-based LAN traffic, then the web content layer is specifically designed for handling web-based IP traffic - HTTP, in other words. The company argues that standard Layer 3-4 switches were simply not designed for HTTP traffic optimisation, whereas the CSS 11800 was built from the ground up with web content switching as its primary focus.

The key point is that web traffic is different to classic IP-based LAN data traffic. For a start, it is largely asymmetric, with much larger flows back out to the users from the servers than the inward-bound flows; a search request followed by a download of results is a typical example. It is also very different in the way sessions are constantly brought up and torn down, often with little data involved but many, many concurrent connections. Then there are sudden huge session and data spikes to contend with when something hits the net that everyone wants to take a look at. For an ISP this latter scenario is a nightmare, and there are already many well-documented examples of this nightmare coming to fruition and a user finding that their service provider has temporarily lost the ability to provide that service. Not ideal!

Cisco claims that web switching requires the ability to parse each content request and classify flows using URLs, host tags and cookies, so that each request can be isolated and treated according to business policies defined and stored in a central database - a kind of internal, rules-based expert system. This takes us to the next key claim Cisco makes about its switch technology.
Not only is it designed to handle web traffic, but it is also designed to optimise those traffic flows to try to prevent the gridlock situation from arising. This means it must have the ability to constantly find the optimal connection to any server or cache device at the ISP or corporate "data farm", shall we say. In turn, this means adding sufficient intelligence to the device for it to be able to continually analyse traffic flows and direct that traffic accordingly. Simply knowing basic source and destination data is not enough. To really optimise that traffic you need to know a lot about the actual content being requested and generated - switching on the web content, in other words. And here lies Cisco's great claim to fame: the ability to get right inside a URL and switch traffic based on any element - for example, a file extension - within that URL. Cookie content is also analysed and switched upon in the same way.

By knowing what kind of traffic is being requested, the CSS 11800 can go beyond basic load balancing of servers and start actively optimising the entire back-end of the network for the data flows being received. For example, certain types of traffic, such as real audio or video content, require more guarantees of bandwidth availability than standard browsing in order to work sufficiently well. In a Layer 3-4 environment on a LAN, this kind of traffic would typically be handled by setting up a QoS rule for certain source and destination addresses or specific switch ports. This is fine for a relatively static environment, but in a truly dynamic environment like the Internet, a more dynamic solution is required.

Also, in a Layer 3-4 mode of operation, load balancing requires all content to be replicated between all load-balanced servers. This is because Layer 3-4 switches cannot explicitly direct traffic based on the content being requested.
So the ability to look inside every HTTP payload and extract information about the data request from that packet is clearly a potential step forward from previous switching architectures and technologies.

So how does the CSS 11800 actually handle web traffic? First it has to set up a traffic flow, identifying the specific user and content being requested, in order for it to apply the correct policy and route the data request to the best destination point at that given moment in time. Once a flow is established, the switch can invoke wire-speed forwarding of that traffic for that session. Throughout the session, the switch monitors the traffic and can provide a huge amount of statistical and management information as a result, such as the ability to aggregate per-flow statistics and report events and alarms for further action. The key point here is that the data management is continuous - vital given the changing nature of Internet sessions.

In order to create the flows, the CSS 11800 uses protocol spoofing, setting up a virtual IP address for the user's browser to talk to. This means a virtual connection can be maintained at all times between the browser and the switch. In practice, then, a user keys a URL into their browser to request some web content. At the ISP, say, the CSS 11800 uses a virtual IP address to confirm the connection and intercepts the request for that URL. The TCP connection is then spoofed back to the browser client and, thereafter, all subsequent HTTP requests from that client are analysed in turn. The switch then applies its current rule set to select the best destination point - the best server or cache - for each request. At this point the flow is created between the client and the optimal data source, and subsequent packets are switched at wire speed by dedicated ASICs within the switch.

Once it has received and inspected the HTTP request header for URL and cookie information, the CSS 11800 must determine the best server.
This is based on both self-learned and user-defined policies. These include the location and availability of content, current server and application performance levels, the geographic proximity to the requesting browser (especially important in a distributed web content switch environment where multiple CSS 11800s can be interconnected across a WAN), the optimum load balancing algorithm for the content being requested, and any priority services designated for specific types of content and for users themselves. Based on these criteria, the switch either creates a connection with the best server in the local site or redirects the request to a better resource.

The CSS 11800 is a modular, chassis-based web content switch. It sits at the top of Cisco's range of web content switches, with the CSS 11100 - a standalone unit designed for smaller sites or POPs - initially forming the entry point. However, Cisco has recently introduced two new members to its CSS family, the CSS 11050 and the CSS 11150, the latter effectively superseding the CSS 11100. A key point is that the full feature set of the larger CSS 11800 is available on both of the smaller systems, albeit with limited scalability. But even the baby CSS 11050 offers more than twice the performance of the outgoing CSS 11100, while the CSS 11150 is claimed to offer a three-fold increase in performance over its predecessor. Both are based around a new high-performance MIPS microprocessor, which has been introduced to control HTTP flow setup, along with wire-speed flow forwarding engines.

The CSS 11050 has been designed with small web sites and POPs in mind and has eight full-duplex, auto-sensing 10/100 Ethernet interfaces and one Gigabit Ethernet port. The CSS 11150 has been designed for what Cisco describes as moderate-traffic web sites and POPs.
Like the CSS 11050, this is a stand-alone product providing, in this case, either 12 or 16 full-duplex, auto-sensing 10/100 Ethernet ports, or 12 10/100 Ethernet ports plus two Gigabit Ethernet uplink ports or four 100BASE-FX ports.

The CSS 11800 has a wide range of module combination options (10/100 Ethernet, Gigabit Ethernet, management) within its 15-slot chassis. A minimum of two slots (four in a redundant configuration) is taken up with the management module and the switching fabric module, which has a 20Gbps backplane. There are eight free slots available for pure I/O modules with options as follows:
In total, then, the CSS 11800 can support up to 64 full-duplex 100BASE-TX ports, up to 48 100BASE-TX ports with 16 100BASE-FX ports, up to 32 Gigabit Ethernet interfaces, or a combination of interfaces, based on a mix of the modules outlined above.

At the heart of the product is the switch fabric, supported by the I/O modules, each of which has independent processing, so there should be no single performance bottleneck based on the processor design. Within this is the content policy engine, which consists of four MIPS RISC processors running a real-time operating system with over 512MB of memory. This provides the resources for the flow set-up: the ability to read full URLs, to dynamically locate "mobile" user cookies anywhere in the HTTP header, and to apply multiple policies to route content requests to the best site and server in real time. Flow forwarding is then handled by the CSS 11800's distributed flow forwarding engines. Up to 16 distributed ASICs with up to 128MB of memory are used to deliver user content requests at wire speed across the switch.

Within the mix of ASICs and embedded microprocessors that make up the engine room is what Cisco calls the control plane. This provides centralised multi-processor resources with a real-time operating system and substantial addressable memory, with scalable processor and memory resources which are dynamically applied, in real time, to ports that require them for flow set-up and content policy management. In theory this means that the Cisco architecture should be able to support what is a very processor-consuming method of managing traffic across a large number of connections.

The switch supports all TCP- and UDP-based web protocols, wire-speed NAT, and integrated IP routing. It is designed to optimise both content requests and delivery for HTTP, passive FTP, and streaming media protocols.
The switch itself is actually almost a complete web site in its own right, complete with hard disk for web content and an integrated firewall. The latter - at the heart of what Cisco calls its FlowWall Security feature - is a true, ICSA-compliant firewall providing wire-speed, per-flow filtering of content requests. Security policies can be implemented based on any combination of source address, destination address, protocol, type, or content URL. In addition, FlowWall Security provides intelligent flow inspection technology to screen for common DoS (denial of service) attacks, such as SYN floods, ping floods, and "smurfs". Looking to the future, an integrated cache is just one of many obvious possibilities for product enhancements.

An important point to note is that the CSS 11800 is designed to work with cache engines as well as server farms, both from a load balancing and an optimal data source perspective. Cisco claims it delivers up to 400% improvement in web cache efficiency for transparent, proxy, and reverse proxy configurations. Another important point to note is that the CSS 11800 can also be used to load-balance external firewall products.

Key, however, is the switch's ability to provide high-speed HTTP flow setup, dynamic and intelligent server selection based on real server load and content availability, and URL- and cookie-based policy and traffic prioritisation. For example, it is possible to dedicate each web server to a particular content type (including redundant failover, backup or over-spill servers) and allow the switch to identify the appropriate server (or cache) to direct a URL request to, based on the content being demanded. In this way, for example, it is possible to provide suitable bandwidth for live audio or Real Player type video feeds at all times, based on the ability to recognise a specific request for that kind of content.
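To make the idea of dedicating servers to content types more concrete, here is a minimal Python sketch of URL- and cookie-based classification. It is not the CSS 11800's rule syntax or algorithm: the rule tables, server group names and the `tier` cookie are all hypothetical, chosen only to show how a request can be steered by a file extension or a cookie value rather than by address alone.

```python
import posixpath
from http.cookies import SimpleCookie
from urllib.parse import urlsplit

# Hypothetical rule tables; a real switch would hold these as configured policies.
EXTENSION_RULES = {
    ".gif": "image-servers",
    ".txt": "static-servers",
    ".asp": "dynamic-servers",
}
COOKIE_RULES = {("tier", "gold"): "priority-servers"}
DEFAULT_GROUP = "general-servers"


def classify(url, cookie_header=""):
    """Pick a server group for a request from its URL and Cookie header."""
    # Cookie rules take precedence in this sketch (an arbitrary design choice).
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    for (name, value), group in COOKIE_RULES.items():
        if name in cookies and cookies[name].value == value:
            return group

    # Otherwise switch on the file extension found in the URL path,
    # ignoring any query string and treating the extension case-insensitively.
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_RULES.get(extension, DEFAULT_GROUP)


image_group = classify("http://www.example.com/images/logo.GIF?v=2")
default_group = classify("/index.html")
gold_group = classify("/report.asp", "tier=gold; session=abc")
```

Because each request is classified individually, servers in this model only need to hold the content for their own group, rather than a full replica of the site.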
In Use: Setting up the CSS 11800

Cisco has chosen to implement the classic Cisco CLI (Command Line Interface) for configuring the CSS 11800, as well as the other switches in the family. This means creating a terminal session, or attaching via an integrated Ethernet management port once an IP address has been assigned to it. You can also Telnet into the box.

What is most impressive about the CSS 11800 is the huge range of options available for configuring the switch. Even a single application such as server load balancing has many different configuration options, including standard Layer 3-4 methods such as round robin and weighted round robin, as well as the Cisco-specific URL/cookie content-based balancing. With respect to the latter, a key feature is the ArrowPoint Content Aware (ACA) algorithm.

ACA was designed to eliminate the problem of manually fine-tuning a switch configuration time and time again for load balancing, by automating the process based on dynamic server load and performance. ACA establishes a "baseline load" for every server by monitoring every TCP connection, so that it can redistribute requests when the load varies from the baseline for that server. Additionally, because ACA maintains the state for actual HTTP flows, it tracks long versus short flows, content request frequency, content access history and cache coherency. For instance, when content becomes hot, it is critical to spread load across more servers to satisfy increased demand. At the same time, infrequently accessed information - cold content - should be sent to alternate servers to free up the caches of the servers that are handling hot content. Or, if the content requested is better served in a different location, or by a different server cluster in the local data centre, the switch will send an HTTP redirect to the client to transparently route requests to the best server.
As a result, ACA attempts to maximise server efficiency by effectively distributing load among all eligible servers and driving servers to peak load without overloading them.

Another important feature is the support for sticky connections. In any authenticated web-based application, it is necessary to provide a persistent connection between a user's browser and the web or database server to which it is connected. Because HTTP does not carry any state information for these applications, it is important for the browser to be mapped to the same server for each HTTP request until a user's transaction is complete. This ensures that the user is not load-balanced in mid-session to a different server and forced to log in again.

The aim of the testing was to confirm that Cisco's claimed benefits of web content switching could be upheld. We also wanted to test the ability to intelligently direct traffic to either cache or server depending on whether the traffic was passive (cacheable) or dynamic (non-cacheable).

For the tests, the CSS 11800 was also configured with redundant management and switch fabric modules. For caching tests we further added a CacheFlow CF110 cache engine and a Cisco CSS 11100 (now superseded by the CSS 11150), so that we had Cisco switches sitting directly in front of the cache and in front of the server farm. Both CSS switches were configured with the same software/firmware revisions and therefore differed only in physical and outright performance capabilities.

For the testing we used a combination of benchmarks: ZD Webbench 3.0 served both as background traffic and as a constant means of providing comparisons as the configuration changed on the CSS 11800. We ran WB3.0 with up to 480 virtual clients. Alongside this we ran Microsoft's W-Cat web server benchmark suite for testing with different traffic types, running up to 120 virtual clients.
In addition we used Socrates, a public-domain tool for generating content requests and testing server latency, and Bluecurve's Dynameasure for live email traffic. With Dynameasure we were able to create genuine test MS Outlook clients speaking to an Exchange server. Each client had to carry out a series of operations - 34 in all - such as reads, sends, replies, copies, deletes and other typical email actions. This Bluecurve software enabled us to create multiple virtual clients - up to 25 on each PC - using the multi-threading techniques of NT, with each virtual client then sending and receiving live data during the tests.

The CSS 11800 was configured with three Fast Ethernet modules, to which we attached six HP-based Microsoft IIS web servers directly and a mix of up to 20 physical HP Vectra NT clients (simulating in excess of 500 virtual clients), both directly and via a Foundry FastIron Layer 3 switch. All connections were configured as full-duplex Fast Ethernet connections, with 3Com Fast Etherlink NICs used throughout. Finally, we added live video streaming using MS NetShow Theater Server software and clients. In line with the limits of the NetShow software running on a single server with a 100Mbps connection, we were able to run a maximum of six video sessions across the network.

In total we ran the CSS 11800 non-stop for a month in our labs. The only downtime came during very occasional enforced reboots when we were clearing down the configuration to start completely new tests. As part of the testing we attempted to attack the switch using both Chevin Software's TCP/UDP traffic generator (which recently brought one vendor's Layer 3 switch to a complete halt within a few seconds) and Socrates, which can be used to simulate DoS attacks. In both cases the switch was able to either withstand or deflect the traffic with no apparent hit on performance.

Using the W-Cat benchmarks we were able to test the CSS 11800's ability to switch on any content type.
This included different file types (.txt, .gif, etc.) as well as cookie-based content. Regardless of the content type, the CSS 11800 was always able to recognise the request and switch accordingly, based on whatever rule sets we had in place.

Our main requirement was to see how the CSS 11800 could be progressively fine-tuned. Starting with a basic Layer 3 set-up, we added Layer 3 round-robin load balancing, then sticky connection switching based on source IP address, and finally true Layer 5-7 content-based switching, based on .GIF content load balancing with the ACA algorithm loaded. In each case we ran a combination of the benchmark suites listed previously, plus background IP traffic generators and email traffic, and sought to optimise server access times and overall performance with each added configuration step.

What was immediately impressive was the way the switch instantly responded to any configuration changes. Every time a rule was added, an algorithm changed or a server added into the equation, there was a beneficial result in terms of performance and load sharing capability. In the examples graphically illustrated, we see that as the load increases, so the efficiency of the ACA-based switching increases dramatically, shown here in terms of throughput and requests per second.
Bear in mind that this is the throughput generated from just one - WB3.0 - of the many benchmarks being run simultaneously. So it does not represent anything like the total throughput going through the switch, but is a measure of relative server performance against a backdrop of constant network traffic coming from a number of different sources, as in a real-life situation.

Using the Socrates tool to test latency times, we achieved an overall reduction in latency in excess of 500% between a straight Layer 3 load-balancing set-up and content-based switching using Cisco's ACA algorithm, with a rule to redirect .GIF and .txt files to specific servers. This is even in excess of Cisco's own - and initially seemingly excessive - claim of up to 400% improvement, and is truly impressive.
In the cache tests we ran a W-Cat benchmark which combined 25% ASP (Active Server Page) dynamic traffic with 75% static HTML traffic. We combined this with static HTML traffic generated from ZD's Webbench 3.0 CGI content test. The aim was to get as close as possible to optimising the CacheFlow CF110 by seeing how much traffic got routed to it. Anything close to 80-85% would be optimal. The results were again excellent, as the graph across shows, with 87.5% of traffic directed to the CacheFlow.

To sum up, the Cisco CSS 11800 did everything we asked of it and proved, beyond any reasonable doubt, that the concept of web content switching does work in practice. When you consider that this is still a relatively young product, its potential is enormous. We know of no other switch currently available which has equivalent web switching capabilities, or the breadth of load sharing options available within the configuration options. It has the capacity not simply to load balance but to dynamically optimise the interaction between server and cache farms at all times, and firewalls too. It therefore presents a very strong case for itself to any kind of service provider - ASPs especially - as well as any corporate with a heavily intranet-biased network. The new additions, the CSS 11050 and the CSS 11150, give the Cisco product range a more complete feel, so that any service provider with a mix of large and small data centres or POPs can now select the right product for the job in hand.

NSS highly recommend the CSS 11800.