IT Scan
May 27, 2004

Managing archival data

NEW DELHI -- In today's business environment, information
and data have become the most important corporate assets.
This has given a huge impetus to the storage solutions
market. Vendors in the storage market space have come
up with many innovative products and solutions that
help organizations store and manage corporate data.
However, with the exponential growth of data, it is becoming
difficult to store and manage archival data. It has
become all the more problematic because both structured
and unstructured data have become an integral part of
today's business.

Structured data is traditionally defined as data that
is organized by the well-defined structure provided
by databases. Databases are growing so fast that
they are impeding application performance, stretching backup
windows and artificially inflating the total cost of
operations. However, the growth of unstructured data has
far surpassed the growth of structured data. This is largely
due to the inherent nature of unstructured data. Unstructured
data typically comprises documents, spreadsheets,
graphics, still and motion images and various other
formats. Going further, messages and e-mail can be classified
as semi-structured data, as they can be used to form
a framework for further classifying unstructured data.
According to industry estimates, over 50 percent of
the data residing in data centers falls into these categories.

One simple example that is troubling almost
everyone, including CEOs and CFOs, is the phenomenal growth
of e-mail. Users routinely get messages such as "Mailbox
size limit exceeded." Adding to the problem are
new regulations, which are forcing corporations to retain
their e-mails for a specified period and to produce
them on demand. This is a difficult task given that
e-mails are routinely spread throughout the IT infrastructure
and subject to regular purging to limit the size of
e-mail stores.

In the face of such a scenario, organizations have to resort
to techniques such as data lifecycle management. This
means effectively managing all the data that is
considered to be a corporate asset by matching availability
and retrieval time with the data's value, which varies
throughout the data lifecycle. By adopting data lifecycle
management techniques, organizations can raise
the efficiency and responsiveness of the total storage
environment and utilize available capacity optimally.

While it is fundamental that IT departments continue to ensure
capacity requirements are met for critical applications,
there is a further demand for managing digital assets
more effectively by moving them to a different class of
media based on their current value. The idea is to take
advantage of waning requirements for retrieval time
and availability by moving less valuable data that is
less likely to be accessed to less expensive storage. Doing
so necessitates greater intelligence for managing storage
devices and automatically moving data within the overall
storage environment from the time it is created until
its expiry.
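
As an illustration only, the tiering idea described above can be sketched in Python. The tier names, the 30-day and 365-day idle thresholds, and the names `Asset`, `target_tier` and `apply_policy` are all assumptions made for this sketch. None of them come from a particular product, and a real policy would be driven by the business value of the data rather than by idle time alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Asset:
    """A hypothetical data asset tracked by the storage environment."""
    name: str
    created: datetime
    last_accessed: datetime
    expires: datetime
    tier: str = "primary"


def target_tier(asset, now,
                warm_after=timedelta(days=30),
                cold_after=timedelta(days=365)):
    """Pick a tier from how long the asset has been idle.

    Returns None once the asset has reached its expiry date.
    The thresholds are illustrative assumptions.
    """
    if now >= asset.expires:
        return None
    idle = now - asset.last_accessed
    if idle >= cold_after:
        return "tertiary"    # cheapest, slowest media
    if idle >= warm_after:
        return "secondary"   # mid-priced media
    return "primary"         # fastest, most expensive media


def apply_policy(assets, now):
    """Assign each asset its target tier and drop expired assets."""
    kept = []
    for asset in assets:
        tier = target_tier(asset, now)
        if tier is None:
            continue  # expired: no longer retained
        asset.tier = tier
        kept.append(asset)
    return kept
```

Run periodically, a policy like this moves data down the tiers as its retrieval requirements wane, without anyone relocating files by hand.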

Further, more and more of the information generated by business
activities lies outside the bounds of structured storage
and retrieval mechanisms. This gives rise to the need to build
the ability to catalogue, search and retrieve this
unstructured information into
the storage environment itself. At the same time, solutions
must encompass varying classes of storage devices and
media arranged in tiers in order to balance the cost
of storing any particular data asset with its current
value from the time of creation to end-of-life.

Therefore, the archival platform solution should ideally combine
intelligent storage with an open and collaborative
approach to storage software. This combination can be
most effectively used with ISO's reference model for
open archival information systems (OAIS). It is a proven
foundation for archive systems, having served as the
underpinning of some of the largest data archives in
existence. Following the OAIS guidelines,
the storage environment should be able to deliver the
various functions in the OAIS model, which are:

Preservation planning: This involves understanding the business-specific
issues related to data and how the data's value varies
over its useful lifetime. An appropriate mix of consulting
and technology is required to draw up the archival
policies, which form the basic framework for managing
and retrieving archived data based on its value. This
is the first step in implementing the OAIS model.

Produce: This function involves handling all
the data assets produced by any kind of industry or
activity.

Ingest: Once the data has been produced, the ingest function prepares
it for storage and management
within the archive store. The actions in an ingest function
include creating a digital signature for uniquely identifying
the object, indexing it, and moving the metadata describing
it into the metadata store. Metadata is information
about the data that is used in populating, maintaining,
and accessing both the descriptive information that
identifies the archive's holdings and the administrative
data used to manage the archive.

Data management: Once the metadata is developed, data
management involves indexing the metadata so that it
is searchable and can be retrieved whenever required.
A link from the metadata store is used to determine
where the data asset is maintained in the storage archive
infrastructure.

Archival storage: This function stores, maintains and retrieves
data, manages the storage hierarchy including movement
based on changes in data value, and provides disaster
recovery capabilities. This function is further enhanced
if the archival storage solution allows seamless data
movement in a heterogeneous storage environment. Interoperability
is a key aspect in the archival storage function.

Administration: Administration functions include configuration management
of system hardware and software, system engineering
functions to monitor and improve archive operations,
updating of archival and hierarchical storage management
(HSM) policies, and customer
support. While routine administration functions are
handled using various management tools, drawing on services
support optimizes the overall operation of the archive
system.

Access control: This function helps consumers find information,
limits access as required and delivers query responses
to consumers. It should also be combined with tamper-proof
functionality, which can be achieved by locking
disk volumes as "read only." Further, it should
help in keeping the data retention settings intact within
the tiers of a storage archive.

Consume: Just as in the "produce" function earlier
in the model, consumption has to be tailored to the
intended use of the data assets. Often this involves
integrating the archival system with the application ordinarily
used to access the data. While it is important to provide
a general interface for archived data retrieval for
auditors and administrators, real value is added by
enabling the application and application user to continue
working as always. Their standard application interface
and access approach should not change whether data is
in primary, secondary or tertiary storage.
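
To make the ingest, data management and access control functions listed above more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Archive` and `MetadataRecord` names, the in-memory dictionaries that stand in for the metadata store and the storage tiers, and the use of a SHA-256 digest in place of the "digital signature" that uniquely identifies an object. It is not taken from the OAIS standard or from any product.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetadataRecord:
    digest: str          # SHA-256 of the content; serves as the unique ID
    title: str
    keywords: frozenset
    location: str        # link to where the asset sits in archival storage
    retain_until: date   # retention setting enforced on deletion


class Archive:
    """Toy archive: dictionaries stand in for real stores (assumption)."""

    def __init__(self):
        self.metadata = {}  # digest -> MetadataRecord (metadata store)
        self.index = {}     # keyword -> set of digests (search index)
        self.storage = {}   # location -> bytes (archival storage)

    def ingest(self, content, title, keywords, retain_until, tier="primary"):
        """Identify the object, store it, and index its metadata."""
        digest = hashlib.sha256(content).hexdigest()
        location = f"{tier}/{digest}"
        self.storage[location] = content
        record = MetadataRecord(
            digest, title, frozenset(k.lower() for k in keywords),
            location, retain_until)
        self.metadata[digest] = record
        for keyword in record.keywords:
            self.index.setdefault(keyword, set()).add(digest)
        return record

    def search(self, keyword):
        """Data management: find records through the metadata index."""
        digests = self.index.get(keyword.lower(), set())
        return [self.metadata[d] for d in sorted(digests)]

    def retrieve(self, digest):
        """Follow the metadata link and verify the content is untampered."""
        record = self.metadata[digest]
        content = self.storage[record.location]
        if hashlib.sha256(content).hexdigest() != digest:
            raise ValueError("content does not match its digest")
        return content

    def delete(self, digest, today):
        """Refuse deletion until the retention period has elapsed."""
        record = self.metadata[digest]
        if today < record.retain_until:
            raise PermissionError("retention period has not elapsed")
        del self.storage[record.location]
        del self.metadata[digest]
        for keyword in record.keywords:
            self.index[keyword].discard(digest)
```

A consumer would call `search` to locate a record and `retrieve` to fetch the content through the stored link. Because the identifier is derived from the content itself, any change to the stored bytes is detected on retrieval.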

In conclusion, the archival storage architecture should be based on
an open, ISO-compliant architecture that implements
data lifecycle management as a complement to mainstream
storage and business continuity practices. This openness
allows enterprises to participate in an interoperable
environment where the right data is always available
at the right time, and there is no need for the special-purpose
storage management software and devices used by other
solutions.