Transactional Information Systems

Guidelines for Teaching

This book covers advanced material, including intensive use of formal models. So a solid background in computer science in general is assumed, but not necessarily familiarity with database systems. Whatever knowledge from that area is needed will be provided within the book itself. It is, in fact, one of our major points that transactional technology is important for many other areas, such as operating systems, workflow management, electronic commerce, and distributed objects, and should therefore be taught independently of database classes.

The book is primarily intended as a text for advanced undergraduate courses or graduate courses, but we would also encourage industrial researchers as well as system architects and developers who need an in-depth understanding of transactional information systems to work with this book.

After all, engineers should not be afraid of a little bit of mathematics. A possible, approximate breakdown of the material for this teaching time frame is given in Table P. Under such time constraints it is obviously necessary to leave out some of the most advanced topics; the table shows our subjective recommendations for a course with either four or two hours of lecturing per week. Additional teaching materials, most notably slides for lecturers and solutions to selected exercises, are available at the book's Web site.

We will also offer errata for the book as we discover our errors. Needless to say, all biases and any remaining errors in this book are our own. Our editor, Diane Cerra, and her colleague Belinda Breyer struck a perfect balance between keeping us relaxed and creative and occasionally putting some healthy pressure on us, and they were always responsive to our needs.

We wish everybody who is writing a book such a great editorial team. Last but not least, we would like to thank our families for being with us while we were mentally somewhere else.

If I had had more time, I could have written you a shorter letter.

The transaction concept is rapidly gaining importance outside the context in which it was originally developed. In this introductory chapter, we discuss why transactions are a good idea, why transactions form a reasonable abstraction concept for certain classes of real-life data management and related problems, and what can and what cannot be done with the transaction concept.

The transaction concept was originally developed in the context of database management systems as a paradigm for dealing with concurrent accesses to a shared database and for handling failures. Therefore, we start out in Section 1. The original and most canonical application example is funds transfer in banking; very similar applications in terms of functionality and structure have arisen in a number of other service-providing industries, most notably in the travel industry with its flight, car, and hotel bookings.

All these classical application examples are commonly referred to as online transaction processing, or OLTP for short. In addition, we will show that the application area of the transaction concept includes modern business sectors such as electronic commerce and the management of workflows, which are also known as business processes.

In terms of the underlying computer and network infrastructure, we are typically dealing with distributed systems of potentially large scale and with possibly heterogeneous, interoperating components, ranging from large server machines to desktop computers, portable notebooks, PDAs, electronic sensors, and other embedded systems. The key problem that the transaction concept solves in a very elegant way is to cope with the subtle and often difficult issues of keeping data consistent even in the presence of highly concurrent data accesses and despite all sorts of failures.

An additional key property of transactions is that this is achieved in a generic way that is essentially invisible to the application logic and to application development, so that application developers are completely freed from the burden of dealing with such system issues.

This is why transactions are an abstraction concept, and why this concept is a cornerstone of modern information technology. Section 1. In particular, we will identify components that are in charge of managing persistent data under a transaction-oriented access regime, and we will concentrate on these transactional data servers. We will then discuss, in Section 1. By far the most important concrete instantiation of a transactional data server is a database system. However, this is not a book about database systems.

We limit our discussion to topics that are directly and closely related to transactions, and nothing else. We will briefly survey the kind of knowledge we expect our readers to have about database systems in Section 1. This will prepare the setting for the introduction of two computational models for transactional servers in the next chapter. This chapter, like all subsequent chapters, is wrapped up by summarizing, in Section 1.

As a running example, consider a database owned by a bank. The database contains, among others, a table named Account that describes bank accounts in terms of their account ID, associated customer name, identification of the respective bank branch, and balance.
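For concreteness, the following JDBC sketch creates one plausible rendering of this Account table. The column names, the in-memory H2 database URL, and the driver choice are our own assumptions, not prescribed by the text.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class CreateAccountTable {
    public static void main(String[] args) throws SQLException {
        // Hypothetical JDBC URL; any relational database would do.
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:bank");
             Statement stmt = con.createStatement()) {
            // One possible rendering of the Account table described in the text.
            stmt.executeUpdate(
                "CREATE TABLE Account (" +
                "  AccountID    INTEGER PRIMARY KEY," +
                "  CustomerName VARCHAR(100)," +
                "  BranchID     INTEGER," +
                "  Balance      DECIMAL(15,2))");
        }
    }
}
```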

Note the distinction between local variables of the invoked program and the data in the underlying database that is shared by all programs. In order to be able to ignore the potential pitfalls of this concurrency, it is therefore desirable that each transaction be executed in an isolated manner, that is, as if there were no other transactions and hence no concurrency.

We will show that this tension between concurrency for the sake of performance, on the one hand, and potential sequential execution for the sake of simplicity and correctness, on the other, is reconciled by the concurrency control techniques of a transactional server. The following scenario illustrates that concurrency is indeed trickier than it may seem at first glance, and that it may have a disastrous impact on the consistency of the underlying data and thus the quality of the entire information system, even if each individual transaction is perfectly correct and preserves data consistency.

For simplicity, we ignore some syntactic details of the embedded SQL commands and consider only those parts of the two transactions that read and modify the account record. If both transactions read the old balance before either of them writes back its new balance, the update of the first writer is overwritten and thus lost. The recorded data then no longer reflects reality and should be considered incorrect. Obviously, for such an information system to be meaningful and practically viable, this kind of anomaly must be prevented by all means.
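The following Java sketch (ours, not the book's) mimics this anomaly outside of any database: two threads each read the shared balance into a local variable and then write back an updated value, so that one of the two updates is typically lost.

```java
public class LostUpdateDemo {
    // The shared "database": a single account balance.
    private static double balance = 100.0;

    private static void update(double delta) {
        double local = balance;            // read balance into a local variable
        try { Thread.sleep(10); } catch (InterruptedException e) { }
        balance = local + delta;           // write back; may overwrite a concurrent update
    }

    public static void main(String[] args) throws InterruptedException {
        Thread deposit  = new Thread(() -> update(+50.0));
        Thread withdraw = new Thread(() -> update(-30.0));
        deposit.start(); withdraw.start();
        deposit.join();  withdraw.join();
        // The correct result would be 120.0, but one update is usually lost,
        // leaving 150.0 or 70.0 depending on which thread writes last.
        System.out.println("Final balance: " + balance);
    }
}
```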

Thus, concurrent executions must be treated with extreme care. Similar anomalies could arise from failures of processes or entire computers during the execution of a transaction, and need to be addressed as well. A second fundamentally important point is that the various accesses that a transaction has to perform need to occur in conjunction. This property of atomicity will turn out to be a crucial requirement on database transactions. Moreover, this conceptual property should be guaranteed to hold even in a failure-prone environment where individual processes or the entire database server may fail at an arbitrarily inconvenient point in time.

To this end, a transactional server provides recovery techniques to cope with failures. The following scenario illustrates that atomicity is a crucial requirement for being able to cope with failures. Consider a funds transfer program, described in terms of SQL statements embedded into a host program written in C, that debits money from a source account and credits it to a target account. If the server crashes after the debit but before the credit, the target account will not receive the money, so that money is effectively lost in transit. A recovery procedure, to be invoked after the system is restarted, could try to find out which updates were already made by ongoing transaction program executions and which ones were not yet done, and could try to fix the situation in some way.

However, implementing such recovery procedures on a per-application basis is an extremely difficult task that is itself error prone due to its sheer complexity, especially because multiple transactions issued by different programs may have accessed the data at the time of the failure.

So rather than programming recovery in an ad hoc manner for each application separately, a systematic approach is needed. System-provided recovery that ensures the atomicity of transactions greatly simplifies the understanding of the postfailure state of the data and the overall failure handling on the application side.

In the example scenario, rather than being left with the inconsistent state in the middle of the transaction, the system recovery should restore the state as of before the transaction began. The above conceptual properties of a transaction—namely, atomicity, durability, and isolation—together provide the key abstraction that allows application developers to disregard concurrency and failures, yet the transactional server guarantees the consistency of the underlying data and ultimately the correctness of the application.
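For illustration (our own sketch, using JDBC and the Account table assumed earlier), this is what system-provided atomicity looks like from the application's point of view: both updates of a funds transfer are bracketed in one transaction, and a failure between the debit and the credit triggers a rollback rather than leaving money in transit.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class FundsTransfer {
    static void transfer(Connection con, int source, int target, double amount)
            throws SQLException {
        con.setAutoCommit(false);          // start an explicit transaction
        try (PreparedStatement debit = con.prepareStatement(
                 "UPDATE Account SET Balance = Balance - ? WHERE AccountID = ?");
             PreparedStatement credit = con.prepareStatement(
                 "UPDATE Account SET Balance = Balance + ? WHERE AccountID = ?")) {
            debit.setDouble(1, amount);  debit.setInt(2, source);  debit.executeUpdate();
            credit.setDouble(1, amount); credit.setInt(2, target); credit.executeUpdate();
            con.commit();                  // both updates become durable together
        } catch (SQLException e) {
            con.rollback();                // neither update survives a failure
            throw e;
        }
    }
}
```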

In the banking example, this means that no money is ever lost in the jungle of electronic funds transfers and customers can perfectly rely on electronic receipts, balance statements, and so on. As we will show in the next two application scenarios, these cornerstones for building highly dependable information systems can be successfully applied outside the scope of OLTP and classical database applications as well.

As a concrete example of such a modern setting, consider what happens when a client intends to purchase something from an Internet-based bookstore; such applications are known as electronic commerce (e-Commerce). The purchasing activity proceeds in the following steps:

1. The client gradually fills an electronic shopping cart with items that she intends to purchase.

2. When the client is about to check out, she reconsiders the items in her shopping cart and makes a final decision on which items she will purchase.

3. The client provides all the necessary information for placing a definitive and legally binding order. This includes her shipping address and information on her credit card or some other valid form of cybercash. The latter information may be encrypted such that the merchant can only verify its authenticity, but possibly without being able to actually decrypt the provided data.

So why are transactions and their properties relevant for this scenario? It is obviously important to keep certain data consistent, and this data is even distributed across different computers. Note that this should be satisfied in the presence of temporary failures at the client or the server side. Further note that this seemingly simple requirement may transitively involve additional data, say, on the inventory for the selected items, which could reside on yet another computer.

While it could be argued that data consistency is merely an optional luxury feature for the shopping cart contents and does not necessarily justify the use of advanced technology like transactions in the technical sense of this book, a very similar situation arises in the last step of the entire activity. There, it is essential that the merchant's server has durably recorded the order and the shipping of the items.

At the same time, the clearinghouse must have a record on the payment, as its approval may be requested again later, or the clearinghouse may be responsible for the actual money transfer.

Finally, the client must have received the notification that the ordered items are being shipped. When these three effects on three different computers are known to be atomic, confidence in the correct processing of such e-Commerce activities is greatly increased. Conversely, when atomicity is not guaranteed, all sorts of complicated cases arise, such as the merchant shipping the items but the clearinghouse losing all records of its cybercash approval and ultimately not being able to reclaim the money.

Similarly, when the customer is never informed about the shipping and the resulting money transfer, she may order the items again from a different merchant, ending up with two copies of the same book. Even worse, the customer may receive the shipped items and keep them, but pretend that she never ordered and never received them. Similar, yet more involved arguments can be brought up about isolation properties, but the case for transactions should have been made sufficiently clear at this point.

Of course, we could deal with inconsistent data among the three computers of our scenario in many other ways as well. But the decisive point is that by implementing the last step of the activity as a transaction, all the arguments about atomicity in the presence of failures can be factored out, and the entire application is greatly simplified.
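As a hedged sketch of this idea, the last step could be bracketed in a single distributed transaction, for instance via Java's JTA UserTransaction interface. The three service interfaces below are hypothetical stand-ins for the merchant's, the clearinghouse's, and the client-notification effects; they are our own invention for illustration.

```java
import javax.transaction.UserTransaction;

public class CheckoutStep {
    // Hypothetical stand-ins for the three servers involved in the last step.
    interface Merchant      { void recordOrderAndShip(String orderId) throws Exception; }
    interface Clearinghouse { void recordPayment(String orderId) throws Exception; }
    interface Notifier      { void notifyShipping(String orderId) throws Exception; }

    private final Merchant merchant;
    private final Clearinghouse clearinghouse;
    private final Notifier notifier;

    CheckoutStep(Merchant m, Clearinghouse c, Notifier n) {
        this.merchant = m; this.clearinghouse = c; this.notifier = n;
    }

    void placeOrder(UserTransaction utx, String orderId) throws Exception {
        utx.begin();                              // start a distributed transaction
        try {
            merchant.recordOrderAndShip(orderId); // effect on the merchant's server
            clearinghouse.recordPayment(orderId); // effect on the clearinghouse
            notifier.notifyShipping(orderId);     // effect visible to the client
            utx.commit();                         // all three effects commit atomically
        } catch (Exception e) {
            utx.rollback();                       // or none of them takes effect
            throw e;
        }
    }
}
```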

There are, however, a number of important differences as well, and these nicely highlight the potential generalization of the transaction concept beyond the classical setting of centralized database applications: The entire application is distributed across multiple computers, and the software may be heterogeneous in that different database systems are used at the various servers.

Of course, the hardware is likely to be heterogeneous, too, but this is mostly masked by the software and thus less relevant. The servers are not necessarily based on database systems; they may as well be some other form of information repository or document management servers in general.

The effects of a transaction may even include messages between computers, for example, the notification of the customer. It will take us to some of the advanced material, however, to cover all these issues.

A workflow is a set of activities (or steps) that belong together in order to achieve a certain business goal. Typical examples would be the processing of a credit request or insurance claim in a bank or insurance company, respectively, or the work of a program committee for a scientific conference (submissions, reviews, notifications, etc.).

To orchestrate such processes, it is crucial to specify at least a template for the control flow and the data flow between activities, although it may still be necessary to improvise at run time.

Activities can be completely automated or based on interaction with a human user and intellectual decision making. This implies that workflows can be long lived, up to several days or weeks, or even months and years.

A typical characteristic of workflows is that the activities are distributed across different responsible persons and different, independent information systems, possibly across different enterprises.

Thus, workflow management is essentially an umbrella for the activities and invoked applications that constitute a particular workflow. To this end, a workflow management system provides both a specification environment for registering activities and for specifying, in a high-level declarative way, the control and data flow within a process, and a run-time environment that automatically triggers activities according to the specified flow. Workflow management systems with such capabilities are commercially available and are gaining significant industrial relevance.

As a concrete example of a workflow, consider the activities that are necessary in the planning of a business trip, say, a trip to a conference. This involves the following activities:

1. Select a conference, based on its subject, technical program, time, and place. If no suitable conference is found, the process is terminated.

2. Check out the cost of the trip to this conference, typically by delegation to a travel agency.

3. Check out the registration fee for the conference, which often depends on your memberships, tutorials that you may wish to attend, and so on.

4. Compare the total cost of attending the selected conference to the allowed budget, and decide to attend the conference only if the cost is within the budget.

With the increasing costs of conferences and ever tighter travel budgets (at the time this book was written), it is desirable to allow several trials with different conferences, but the number of trials should be limited, in order to guarantee termination of the entire process. The activities and the control flow between them are graphically depicted in Figure 1. This illustration is based on a specification formalism known as statecharts, which is one particular kind of formal specification method that might be used by a workflow management system.

Each oval denotes a state in which the workflow can exist during its execution; upon entering a state, the corresponding activity is performed, and the activity may then invoke further application programs. When the workflow is started, a specified initial state (a state without predecessors) is entered, and the workflow terminates when a final state (a state without successors) is reached. In the example, the initial state is the SelectConference state, and the final states are Go and No.

The transitions between states are governed by event-condition-action rules that are attached to the transition arcs as labels. Such a rule specifies that when the event E occurs and the condition C holds, the current state is left and the state where the transition arc points to is entered; during this transition the specified action A is performed.

In the example, we only make use of conditions and actions. Both refer to a small set of variables instantiated for each workflow instance that are relevant for the control flow.
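To make this concrete, here is a minimal sketch (ours) of how condition/action transitions over per-instance variables might be encoded. The state names follow the example, while the variable names, the trial limit, and the budget figures are our own assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class TripWorkflow {
    // Per-instance variables that conditions and actions refer to.
    static class Vars { double cost; double budget; int trials; }

    // A transition arc with condition C and action A (events are omitted,
    // as in the example, which only uses conditions and actions).
    record Transition(String from, String to, Predicate<Vars> cond, Consumer<Vars> action) {}

    public static void main(String[] args) {
        List<Transition> arcs = new ArrayList<>();
        arcs.add(new Transition("CheckCost", "Go",
                v -> v.cost <= v.budget, v -> {}));
        arcs.add(new Transition("CheckCost", "SelectConference",
                v -> v.cost > v.budget && v.trials < 3, v -> v.trials++));
        arcs.add(new Transition("CheckCost", "No",
                v -> v.cost > v.budget && v.trials >= 3, v -> {}));

        Vars v = new Vars();
        v.cost = 1500; v.budget = 1000; v.trials = 2;
        String state = "CheckCost";
        // Fire the first transition whose condition holds in the current state.
        for (Transition t : arcs) {
            if (t.from().equals(state) && t.cond().test(v)) {
                t.action().accept(v);
                state = t.to();
                break;
            }
        }
        System.out.println("Next state: " + state);  // prints SelectConference
    }
}
```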

This kind of control flow specification allows conditional execution as well as loops based on high-level predicates. The entire specification can be hierarchical, thus supporting both top-down refinement and bottom-up composition of existing building blocks, by allowing states to be nested. So a state can in turn contain another statechart that is executed when the state is entered.

In addition, the specification formalism allows parallel execution, which is graphically indicated by breaking a state down into two or more orthogonal statecharts, separated by a dashed line, that are executed in parallel. In the example, the activities that correspond to the two states CheckConfFee and CheckTravelCost are executed in parallel.

These two states are further refined into several steps, where CheckTravelCost again leads to two parallel substates. Although the example scenario is still largely oversimplified, the above discussion already indicates some of the semantically rich process design issues that accompany workflow management.

Here we are interested in the connection between workflows and transactions, and how a workflow application could possibly benefit from transaction-supporting services. The answer is threefold and involves different stages of transactional scope: The activities themselves can, of course, spawn requests to information systems that lead to transactional executions in these systems.

This is almost surely the case with the CheckTravelCost activity. In fact, it seems to make sense that this activity not only figures out the prices, but also makes reservations in the underlying information systems. Obviously, booking a flight to a certain city and a hotel room in that city makes sense only if both reservations are successful.

If either of the two is unavailable, the whole trip no longer makes sense. So these two steps need to be tied together in a single transaction. Note that this transaction is a distributed one that involves two autonomous information systems.
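The protocol behind such distributed atomicity is treated in depth later in the book; as a rough preview, the following toy coordinator sketches the two-phase commit idea under our own simplified interface: every participant first votes on whether it can commit, and only a unanimous yes leads to a global commit.

```java
import java.util.List;

public class TwoPhaseCommitSketch {
    // A participating server, e.g., the airline's or the hotel's reservation system.
    interface Participant {
        boolean prepare();  // phase 1: can you commit your reservation?
        void commit();      // phase 2a: make the reservation permanent
        void abort();       // phase 2b: undo all tentative effects
    }

    static boolean run(List<Participant> participants) {
        // Phase 1: collect votes from all participants.
        for (Participant p : participants) {
            if (!p.prepare()) {            // one "no" vote aborts everybody,
                participants.forEach(Participant::abort);
                return false;              // e.g., no room available in that city
            }
        }
        // Phase 2: all voted yes, so all commit.
        participants.forEach(Participant::commit);
        return true;
    }
}
```

This is a deliberate simplification (for instance, it aborts all participants rather than only the prepared ones, and it ignores coordinator failures); the full protocol is developed later in the book.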

The requests against the various information systems would return status codes that should be stored in variables of the workflow and would be relevant for the future control flow.

For example, not being able to make one of the two necessary reservations in the selected city should trigger going back to the initial SelectConference state for another trial. To keep the example specification simple, this is not shown in Figure 1. In other words, the state of the workflow application should be under transactional control as well.

This is an entirely new aspect that did not arise in the banking and e-Commerce examples. But as we will show, transactional technology does provide solutions for incorporating application state into atomic processing units as well. We could discuss whether the entire travel planning workflow should be a single transaction that incorporates all effects on the underlying information systems as well as the state of the workflow application itself.

After all, the entire workflow should have an all-or-nothing, atomic effect. Ideas along these lines have indeed been discussed in the research community for quite a few years; however, no breakthrough is in sight. The difficulty lies in the long-lived nature of workflows and the fact that workflows, like simple transactions, run concurrently.

Regardless of the technical details of how isolation can be implemented at all (to be covered in great depth in this book), maintaining such isolation over a period of hours, days, or weeks raises questions about performance problems with regard to the progress of concurrent workflows. For this reason, the straightforward approach of turning an entire workflow into a single transaction is absolutely infeasible. The discussion of the third item above does not imply, however, that the one-transaction-per-activity approach is the only kind of transactional support for workflows.

Consider the situation when all necessary reservations have been made successfully, but later it is found that the total cost including the conference fees is unacceptable and it is decided not to attend any conference at all. Now you hold reservations that may later result in charges to your credit card, unless you intervene.

So you must make sure that these reservations are canceled. One approach could be to extend the workflow specification by additional cancellation activities and the necessary control flow; however, specifying all such cases explicitly would clutter the workflow considerably. A better solution would be to generalize the particular case at hand into a more abstract notion of compensation activities.

The appropriate triggering of compensation activities could then be delegated to the workflow management system. The transactional technology that we develop in this book does provide the principal means for coping with compensation issues in the outlined, generic way, as opposed to developing specific solutions on a per-application basis over and over again.
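One generic way to organize this, sketched here in Java under our own naming, is for the engine to record a compensation activity for every successfully completed step and, if the process is abandoned, to run the recorded compensations in reverse order of completion.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CompensationLog {
    // Compensation activities recorded for each successfully completed step.
    private final Deque<Runnable> compensations = new ArrayDeque<>();

    void completed(String activity, Runnable compensation) {
        System.out.println("completed: " + activity);
        compensations.push(compensation);   // remember how to undo this step
    }

    // Called by the workflow engine when the process is abandoned.
    void compensateAll() {
        while (!compensations.isEmpty()) {
            compensations.pop().run();      // undo in reverse order of completion
        }
    }

    public static void main(String[] args) {
        CompensationLog log = new CompensationLog();
        log.completed("book flight", () -> System.out.println("cancel flight"));
        log.completed("book hotel",  () -> System.out.println("cancel hotel"));
        // Budget check failed: the trip is off, so cancel both reservations.
        log.compensateAll();
    }
}
```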

At this point in the book, the major insight from this discussion is to realize that the scope of transactions is not a priori fixed and limited to stored data, but can be carefully extended to incorporate various aspects of information system applications as well.

In particular, we have seen that we need to separate clients—that is, the computers or terminals from which a human user generates computer work—from the servers where data and possibly executable programs reside in various forms. However, this distinction alone is insufficient for characterizing full-fledged modern information systems. We now introduce a more systematic view of these architectural issues in that we set up a reference architecture, or framework, to capture the most typical cases that are used in practice.

It captures what is most frequently referred to by practitioners as a three-tier architecture. Clients send business-oriented or goal-oriented requests to the application server. In modern applications, requests are typically issued from a GUI (Graphical User Interface); likewise, the reply that the application server will send back is often presented in a graphical way, using forms, buttons, charts, or even virtual reality-style animations.

All this presentation processing, for both input and output, is done by the client. Therefore, HTML (Hypertext Markup Language), one of the original cornerstones of the World Wide Web, is a particularly attractive basis for presentation, because it merely requires that a Web browser be installed on the client side and thus applies to a large set of clients.

The application server has a repository of application programs in executable form, and invokes the proper program that is capable of handling a client request. Both the application programs and the entire application server can be organized in a variety of ways.

The programs themselves may be anything from an old-fashioned terminal-oriented COBOL program to a Java applet or some other program that generates, for example, a dynamic page. Since the number of such programs that need to be provided for a full-fledged information system application can be very large, a good way of structuring this program repository is to organize the programs according to the object-oriented paradigm.

This means that abstract business objects, such as accounts, customers, or shopping catalogs, are provided as encapsulated objects, each with an abstract interface that consists of a number of methods that can be invoked on such an object. The invokable methods are exactly the application programs that refer to and are thus centered around a business object.
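In Java terms, such a business object might be rendered as the following interface sketch (the method names are our own illustration): clients see only the abstract interface, while the implementation may issue SQL requests against a data server.

```java
import java.math.BigDecimal;

// A business object "Account" as an abstract data type: clients invoke
// methods on the abstract interface and never touch the stored data directly.
public interface Account {
    BigDecimal getBalance();
    void deposit(BigDecimal amount);
    void withdraw(BigDecimal amount) throws InsufficientFundsException;

    // Checked exception signaling a violated business rule.
    class InsufficientFundsException extends Exception {
        public InsufficientFundsException(String msg) { super(msg); }
    }
}
```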

In programming language terms, a business object would be referred to as an abstract data type (ADT). The implementation of these ADTs, as provided by the corresponding application programs, may itself invoke methods on other business objects and issue requests to other servers, particularly data servers. The application server manages the entirety of application programs or business objects in that it spawns execution threads on behalf of client requests, monitors executions, handles certain generic forms of exceptions, and so on.

It thus constitutes the surrounding run-time system of the invoked programs. The functionality of this run-time system also usually includes management of the communication connections between the clients and the application server itself.

Often some form of session is established between a client and the server, possibly on top of a sessionless protocol such as HTTP (Hypertext Transport Protocol) of the World Wide Web, and the application server is again in charge of creating, monitoring, and terminating these sessions. So, in essence, the application server can be viewed as a request broker that establishes and maintains the proper connections between the client requests and the invoked business object methods (or application programs, in more classical terminology).

Traditionally, so-called TP monitors (Transaction Processing monitors) have been the commercial incarnation of such request brokers, with specific support for OLTP applications. It is not unlikely that in a short time only one blend of request broker will exist and be worth remembering. The proliferation of the XML (Extensible Markup Language) data exchange standard, in particular, will be a major force toward unified and also simpler protocols.

For exactly this reason we refer to a request brokering application server only as an abstract notion throughout this book. This notion would also include a complete workflow management system that has its own integrated request broker or is coupled with an ORB (Object Request Broker) or TP monitor. As mentioned above, the implementation of a business object method often involves issuing requests to other business objects or other servers.

The application tier, however, is usually not the place to keep the persistent data that such requests operate on; rather, such persistent data is better kept on a data server that is specifically geared for the tasks of reliable, long-term maintenance of data.

Database systems are the most prominent type of systems that fall into this category, but for certain types of data, other systems may be even more appropriate. For example, for semistructured text or multimedia documents and for electronic mail, specific products have a very successful usage record. Thus, document servers and mail servers are other important types of data servers. All these data servers may also provide some notion of encapsulated objects, as opposed to exposing the raw data at their interface.

However, this is an option that does not need to be exercised. For example, the data of a database server may be accessible to the application programs of an application server via the standardized query language SQL (Structured Query Language) in a direct way, or via stored procedures, user-defined ADT functions, or other forms of encapsulated interfaces in modern object relational or fully object-oriented database products. Most of the elements in the above discussion are illustrated in Figure 1.

In addition, the figure implicitly includes a number of practically important specializations that are obtained by collapsing two tiers into a single component. There are two major options for such simpler architectures.

Combining the client and the application server tiers: This implies that the application programs reside on the clients, leading to an architecture that is often called a client-server system with fat clients.

Clients then communicate directly with a data server, for example, via SQL, often embedded in a standardized client-server high-level communication protocol such as ODBC (Open Database Connectivity); a small code sketch of this case follows after the two options. A possible problem with this approach, and one of the original motivations for the introduction of an explicit middle tier, is that many data server products have traditionally lacked the capabilities for maintaining a very large number of sessions and execution threads, say, 10,000 concurrent sessions, which would not be that unusual for popular information providers on the Web.

However, not every application really needs such high degrees of concurrency, and many commercial servers have made tremendous progress in this direction anyway.

Combining the application server and the data server tiers: This implies that the application programs reside in the data server, leading to an architecture that is known as a client-server system with thin clients.

For example, a database system that has rich object encapsulation capabilities could provide business object methods to which the client requests can be directly mapped. The potential drawback of this architectural approach is that the data server may become less scalable and more susceptible to being overloaded. Indeed, if this architecture were the starting point, then outsourcing the application processing load to a separate application server would be another motivation for a three-tier architecture.
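Returning to the fat-client option above, the following JDBC sketch illustrates the direct client-to-data-server case (JDBC being Java's counterpart of ODBC; the server URL, credentials, and account ID are placeholders of our own choosing): the client opens the connection itself and issues SQL directly, with no middle tier involved.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FatClient {
    public static void main(String[] args) throws SQLException {
        // The client talks to the data server directly; there is no middle tier.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:postgresql://dataserver/bank", "user", "password");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT Balance FROM Account WHERE AccountID = ?")) {
            ps.setInt(1, 42);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println("Balance: " + rs.getBigDecimal(1));
                }
            }
        }
    }
}
```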

The bottom line is that specialized two-tier architectures are and will continue to be practically relevant. To avoid becoming dependent on specific architectures, our approach in this book will be based on computational models to be introduced in Chapter 2 rather than two- or three-tier system architectures. This will allow us to abstract from the particularities of the specific architecture and derive results that are as general as possible. However, for concrete illustration purposes, it may occasionally be helpful to transfer general results back to a specific system architecture.

For this purpose, the two-tier architectures are typically easier to match with the abstract computational models, and we will therefore prefer the two-tier cases over the more general three-tier case whenever we need such an illustration. Note also that an enterprise rarely operates a single such architecture in isolation; instead, multiple instances of two- and three-tier architectures often exist simultaneously within an enterprise for the different business units and worldwide branches.

Adding the various liaisons with external business partners and the general trend to virtual enterprises increases the multitude of information services upon which modern business is based.

So we really have to cope with a highly distributed architecture consisting of a wide variety of application and data servers. Obviously, these servers can be highly heterogeneous in that they use different products, different interfaces, and even radically different design philosophies.

So the full spectrum in this highly diverse information landscape is best characterized as a federated system architecture, with multiple servers working together on a per-business-case basis. This most general kind of architecture, which is expected to become ubiquitous in our information society, is illustrated in Figure 1. Federated system architectures require a communication infrastructure that can cope well with the heterogeneity and different operation modes of the underlying application and data servers.

This infrastructure is often referred to as middleware, as its place is between computer operating systems and the application programming level, but it may as well be viewed as an extension of modern operating systems. The key purpose is to provide high-level communication mechanisms between clients and servers and also between servers, offering programming comfort such as remote method invocation.

That is, methods on remote objects can be invoked as if these were objects in the same local address space; all necessary communication, including the lookup of computer network addresses, is automatically plugged in by the middleware layer. To make this work in a highly heterogeneous software landscape, standardized interfaces are needed. In particular, such middleware standards include request brokering facilities, a key feature of an application server.
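Java RMI is one concrete incarnation of this remote invocation idea, sketched below with an interface, host name, and registry entry of our own invention: the client obtains a reference to a remote business object and invokes a method on it exactly as if the object were local, while the registry and the RMI layer handle address lookup and communication.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public class RemoteInvocationSketch {
    // The remote business object interface; implementations live on the server.
    public interface Quotation extends Remote {
        double priceOf(String item) throws RemoteException;
    }

    public static void main(String[] args) throws Exception {
        // The middleware (here: the RMI registry) resolves the object's location;
        // the call below then looks exactly like a local method invocation.
        Registry registry = LocateRegistry.getRegistry("appserver.example.com");
        Quotation q = (Quotation) registry.lookup("quotation");
        System.out.println("Price: " + q.priceOf("some-book"));
    }
}
```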

Consequently, it is not possible to tell precisely where the middleware stops and the application server itself begins. But this fuzziness is only an issue in terms of implementation. The overall system architecture can still be viewed as if the application server and the middleware between servers and clients were clearly separated.

In fact, it is the intention of this section to condense the often confusing manifold of commercial system components into a sufficiently simple architectural model.