In the first post of this series, we wrote about how most organizations today rely on networks of computers, all of which rely on clocks. If the clocks in these computers don’t agree with each other or reflect the correct time, it’s a bomb ticking away in the heart of IT infrastructure. And, in the second post of this series, I provided an overview of the five dangers and negative consequences of running out-of-sync computers in a network.
In this article, I will go into detail on a specific danger: Operational failure.
These failures cover a broad range of activities that touch almost every aspect of a company. Problems fall into three main areas:
• Automated tasks
• Network consolidated tasks
• Interdependent application tasks
Automated tasks are those, such as data backups, that run overnight. These are typically multi-stage events, each of which occurs (or should occur) at a scheduled time. If one event is triggered out of sequence with the others, the entire process may fail. Furthermore, since these tasks often run during off-hours, a failure is unlikely to be discovered or corrected until the next day.
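To make the sequencing problem concrete, here is a minimal Python sketch. The stage names, host names, schedule, and the `actual_start_order` helper are all hypothetical and chosen only for illustration. It assumes each stage is started by the local clock of the machine it runs on, and shows how one fast clock can reorder an otherwise correct schedule.

```python
from datetime import datetime, timedelta

def actual_start_order(stages, clock_offsets):
    """Return stage names in the order they really fire.

    stages: list of (name, host, scheduled_local_time)
    clock_offsets: host -> timedelta by which that host's clock runs
                   ahead of true time (negative if it runs behind)
    """
    fired = []
    for name, host, scheduled in stages:
        # A host whose clock is ahead reaches the scheduled time early,
        # so the true start is the scheduled time minus the offset.
        true_start = scheduled - clock_offsets.get(host, timedelta(0))
        fired.append((true_start, name))
    return [name for _, name in sorted(fired)]

# Hypothetical overnight backup: all names and times are illustrative.
night = datetime(2024, 1, 1, 1, 0)
stages = [
    ("quiesce_database", "db01", night),
    ("copy_to_backup", "backup01", night + timedelta(minutes=5)),
    ("resume_database", "db01", night + timedelta(minutes=30)),
]

in_sync = actual_start_order(stages, {})
skewed = actual_start_order(stages, {"backup01": timedelta(minutes=10)})
```

With synchronized clocks the stages fire in their scheduled order. With the backup host running ten minutes fast, `copy_to_backup` fires before `quiesce_database`, so the copy would be taken from a database that is still being written to.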
Some network-based tasks save resources by allowing a single machine to perform a common service that might otherwise have to be performed on multiple machines. Others, like time synchronization, are inherently network-centric and are best performed on a common machine. Either way, the process represents a single point of failure. Directory services, for example, use a common time source to schedule the order in which events occur. Should the server with the central time source be out of sync with the clocks of the machines it supervises, requests from users and applications on those machines may not be recognized as valid.
A similar situation applies to distributed computing “middleware”. Middleware is the “glue” that ties together processes running on multiple machines so that they behave as a single coherent application. One example is sales order administration, where billing, point-of-sale, inventory control, and other systems all interoperate. In the case of IBM’s DCE, if the clock on any machine running these processes is more than five minutes out of sync with DCE’s Distributed Time Service, the processes on that machine will fail. That could be the billing system, a point-of-sale register, or virtually any other part of an infrastructure.
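The five-minute rule can be illustrated with a short Python sketch. This is not DCE code. The `within_tolerance` function and the sample clock readings are assumptions made for illustration, and the check simply compares a machine's clock against a reference time.

```python
from datetime import datetime, timedelta, timezone

# Tolerance quoted in the text above; the constant name is hypothetical.
MAX_SKEW = timedelta(minutes=5)

def within_tolerance(local_time, reference_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(local_time - reference_time) <= max_skew

# Illustrative readings: one clock slightly fast, one badly slow.
reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
billing_clock = reference + timedelta(minutes=2)
register_clock = reference - timedelta(minutes=7)

billing_ok = within_tolerance(billing_clock, reference)    # True
register_ok = within_tolerance(register_clock, reference)  # False
```

The check is symmetric: a clock that runs slow is rejected just as a clock that runs fast is, which is why a single drifting machine can drop out of an otherwise healthy application.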
Computer events that require accurate, or accurately synchronized, time:
• Manufacturing process control
• Communications network time-of-day configuration
• Computer maintenance
• Funds transfers or purchases
• Database file time stamps (e.g. NFS, UNIX “make” process)
• Determining fault sequences via SNMP event traps
• Time stamping telephone and radio dispatch call records
• Employee time cards
• Measuring packet transit times
• Tracing intruder steps
• Time-dependent security processes (e.g. Kerberos authentication)
• Packet time-to-live stamps
Applications don’t have to be part of a distributed computing environment, however, to be interdependent. In fact, commercial trading partners probably would not want to be locked into sharing a middleware layer just so they could do business. And they don’t have to. There are many other ways applications can talk to each other, such as by using XML. Whatever the method, the demand for synchronicity remains high, as when a parts manufacturer supplies “just in time” inventory to a carmaker. Each transaction, along with its various components, is time stamped, typically within a tolerance of one second. When a supplier’s bill of materials or price, for example, is not received by a customer’s waiting application within the required time window, the application may simply move on to process another transaction. The same thing can happen if a time stamp mistakenly indicates that information was requested or arrived before it was sent.
Another example is e-mail. If the user’s e-mail program is configured to display messages sorted by time sent, a message with the wrong time could easily be overlooked.
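The time-window check described here might look something like the following Python sketch. The `classify` function, its return strings, and the five-second window are hypothetical. Only the one-second tolerance comes from the text, and real trading applications will apply their own rules.

```python
from datetime import datetime, timedelta

# Per-transaction tolerance mentioned above; the name is hypothetical.
TOLERANCE = timedelta(seconds=1)

def classify(sent_at, received_at, window, tolerance=TOLERANCE):
    """Classify a message by comparing sender and receiver time stamps."""
    elapsed = received_at - sent_at
    if elapsed < -tolerance:
        # Only possible if the two clocks disagree.
        return "rejected: arrived before it was sent"
    if elapsed > window + tolerance:
        return "rejected: outside time window"
    return "accepted"

# Illustrative values: a five-second window is an assumption.
sent = datetime(2024, 1, 1, 9, 0, 0)
window = timedelta(seconds=5)

on_time = classify(sent, sent + timedelta(seconds=2), window)
too_early = classify(sent, sent - timedelta(seconds=3), window)
too_late = classify(sent, sent + timedelta(seconds=10), window)
```

Note that `too_early` and `too_late` can both be produced by a message that actually arrived promptly. If the sender’s clock is a few seconds fast or slow relative to the receiver’s, a valid transaction is skipped purely because of clock error.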
Get more information on Microsemi’s timing and synchronization solutions now.
In the next article in this series, I’ll go into detail on data loss and how operations can fail by losing data.