A key issue in
distributed systems is providing a consistent view of shared
data throughout the application and doing it in an efficient
manner. The usual method has been to have processes query a
central database at the point in time when they need to access a
data item. Since the using process does not know if the data has
changed, the safe course is to read the latest value at every
use. This method is intolerably inefficient if the cost per
query is high, or the number of queries is high.
The subscription concept offers a
superior approach. The user of a data item opens a subscription
to the data item with a subscription server. As the data item is
updated, the user process receives asynchronous notification
from the subscription server. There is no polling; the user
process accurately tracks the item of interest.
The Datahub is a high-performance
in-memory database, a subscription server, a configurable Tcl/Tk
interpreter, and a Distributed Message Hub (DMH) server. The
synergistic combination of these capabilities is the cornerstone
of a new architecture for distributed applications.
As a database server, the Datahub
provides the familiar programming model of relational database
tables using a subset of Structured Query Language (SQL). A
graphical user interface is available to display and manipulate
table data either remotely as a client, or as part of the
Datahub process when it is not running in the background.
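For example, a client might create and populate a table and
then query it back using ordinary SQL statements such as the
ones below. This is a minimal sketch: the exact SQL subset
that is supported, and the "SQL" Tcl command used here to
submit statements, are assumptions for illustration rather
than the literal Datahub interface.

    # Illustrative only - "SQL" stands in for whatever command or
    # client call your installation uses to submit SQL statements.
    SQL "create table sensor_data (sensor_id varchar(12), value float)"
    SQL "insert into sensor_data values ('VIP23', 71.4)"
    SQL "select sensor_id, value from sensor_data where value > 70.0"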
As a subscription server, the
Datahub provides asynchronous notification to client processes
whenever SQL table data that meets their selection criteria is
inserted, updated, or deleted. These notifications can be
standard SQL messages, which are useful for data replication
to other Datahubs or to commercially available persistent
databases such as Oracle. Other subscription options enable
the developer to execute user-defined Tcl procedures within
the Datahub process, or to obtain notification messages in a
format designed for use with Tcl list manipulation commands.
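As an illustration of the list-oriented format, a notification
might arrive as a Tcl list of the changed column values, which
the receiving client can take apart with standard list
commands. The layout shown below is an assumption for
illustration, not the documented message format.

    # Assume a notification carrying the changed row arrives as a
    # Tcl list of column values (layout assumed for illustration).
    set notice {VIP23 71.4}
    lassign $notice sensor_id value
    puts "sensor $sensor_id is now $value"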
When used as a DMH message
server, a Datahub becomes the hub of an efficient, event-driven,
distributed application. The Tcl interpreter provides a
high-level, dynamically customizable programming environment
with comprehensive features. Client processes can easily
exchange peer messages, share common data in relational
tables, and subscribe and respond in real time to changes in
the application data. Shared application logic can be executed
within the Datahub process, in the same address space as the
application data tables and the SQL command processing logic.
The requirement for interprocess communication, and the
overhead it entails, is drastically reduced compared to other
approaches.
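As a sketch of shared logic running inside the Datahub
process, the hypothetical procedure below reacts to a sensor
reading by updating another table directly; because it
executes in the same address space as the tables, neither the
test nor the update requires an interprocess message. The
procedure and table names, and the "SQL" command, are
assumptions for illustration.

    # Hypothetical in-process logic: runs inside the Datahub, so
    # the insert below involves no interprocess communication.
    proc raise_alarm_if_hot {sensor_id value} {
        if {$value > 90.0} {
            SQL "insert into alarms values ('$sensor_id', $value)"
        }
    }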
Because the Datahub is both a
database and a subscription server, clients can open
"synchronized" subscriptions to existing table data when they
are initialized. A shortcoming of other notification mechanisms
that are based only on connecting to "broadcast" or
"distribution list" message streams is that the client has no
historical data to synchronize with. If dynamic client
connections are supported, it is an additional burden on the
application developer to design a synchronization mechanism.
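The idea can be sketched as follows, with the option name and
syntax assumed for illustration: a client that connects after
the application has been running asks for the current rows to
be replayed through the notification path before live changes
begin arriving.

    # Hypothetical synchronized subscription: existing rows in
    # sensor_data are delivered first, then subsequent inserts,
    # updates, and deletes are delivered as they occur.
    SQL "open subscription sensor_sub to sensor_data sync \
         proc raise_alarm_if_hot"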
Better Than Publish and Subscribe
The products we have seen that
offer similar publish and subscribe functionality have certain
shortcomings that potential users should consider.
- The application developers
have to explicitly choose each data change or event that is
published. This is typically not an easy choice, because each
published item adds to the network load whether or not any
client subscribes to it, and much discussion has to take place
among the developers to make the appropriate tradeoffs.
- The content and format of
published messages have to be agreed upon by all parties,
since these are not typically controlled by the subscriber.
There is usually anguish over how much data to put in the
notification message and what data is left at the server for
clients to query. Again, the developers sit around weighing
the tradeoffs.
- The subscribing clients have
to write custom parsing code for each message type since
there is typically not a Tcl or SQL parser in use.
- The traditional approach to
specifying different message types is to declare a C union of
all of the binary message structures. This technique is
sensitive to cross-platform issues such as byte ordering and
data alignment. It is also typical that all of the executables
on all platforms must be recompiled whenever a new message
structure is defined or an existing structure is changed,
since the shared header file is used extensively throughout
the source code.
- Some products are based on
UDP broadcasting and will not work across subnets without
explicit configuration of the network routing and the
participating hosts.
In contrast, when the Datahub is
used, clients can subscribe to any data that is stored in the
SQL tables. There is no overhead of broadcasting changes to
the network just in case there is an interested client.
Similarly, there is no issue with data propagation across
subnets, since reliable point-to-point TCP/IP protocols are
used.
With the Datahub, the subscriber
has complete control over the content and format of data change
notifications. Software development proceeds without elaborate
negotiation between the data producers and the data consumers.
For example, a client may open a subscription to a data table
where only certain column values are part of the notification,
and notifications occur only for new or changed data rows where
a selection condition, such as sensor_id='VIP23', is true. If
configured, the notification can execute a user-written Tcl
procedure which can perform actions such as sending
custom-formatted messages through the network, or invoking an
XML-RPC method on a remote server.
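A hypothetical subscription along these lines might look like
the sketch below, selecting only the columns of interest and
restricting notifications with a where clause; the
open-subscription syntax is assumed for illustration and will
differ in detail from the actual Datahub grammar.

    # Hypothetical filtered subscription: notify only for rows of
    # sensor_data where sensor_id='VIP23', passing just the two
    # named columns to a user-written Tcl procedure.
    proc on_vip23_change {sensor_id value} {
        puts "VIP23 reading changed to $value"
    }
    SQL "open subscription vip23_sub to sensor_data \
         (sensor_id, value) where sensor_id='VIP23' \
         proc on_vip23_change"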