IT Scan

May 27, 2004
Managing archival data

NEW DELHI -- In today's business environment, information and data have become the most important corporate assets. This has given a huge impetus to the storage solutions market. Vendors in the storage market have come up with many innovative products and solutions that help organizations store and manage corporate data. However, with data growing exponentially, it is becoming difficult to store and manage archival data. The problem is compounded because both structured and unstructured data have become integral to today's business.

Structured data is traditionally defined as data organized by the well-defined structure that databases provide. Databases are growing so fast that they are impeding application performance, stretching backup windows and inflating the total cost of operations. Unstructured data, however, is growing far faster than structured data, largely because of its inherent nature. Unstructured data typically comprises documents, spreadsheets, graphics, still and motion images and various other formats. Going further, messages and e-mail can be classified as semi-structured data, as they can be used to form a framework for further classifying unstructured data. According to industry estimates, over 50 percent of the data residing in data centers falls into these categories.

One simple example that is troubling almost everyone, including CEOs and CFOs, is the phenomenal growth of e-mail. Users routinely get messages such as "Mailbox size limit exceeded." Adding to the problem are new regulations, which are forcing corporations to retain their e-mails for a specified period and to produce them on demand. This is a difficult task given that e-mails are routinely spread throughout an IT infrastructure and are subject to regular purging to limit the size of e-mail stores.

In the face of such a scenario, organisations have to resort to techniques such as data life cycle management. This means managing all the data that is considered a corporate asset by matching availability and retrieval time to the data's value, which varies throughout the data life cycle. By adopting data life cycle management techniques, organisations can improve the efficiency and responsiveness of the total storage environment and utilize available capacity optimally.

While it is fundamental that IT departments continue to ensure capacity requirements are met for critical applications, there is a further demand for managing digital assets more effectively by moving them to a different class of media based on their current value. The idea is to take advantage of waning requirements for retrieval time and availability by moving less valuable, less frequently accessed data to less expensive storage. Doing so requires greater intelligence for managing storage devices and for automatically moving data within the overall storage environment from the time it is created until its expiry.
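The idea of moving data to cheaper storage as its value wanes can be sketched in a few lines of Python. This is only an illustration: the tier names, the 90-day and 365-day thresholds, and the use of idle time as a stand-in for "current value" are assumptions made for the example, not part of any product or standard.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical rules: (maximum days since last access, tier name).
# Both the thresholds and the tier names are illustrative assumptions.
TIER_RULES = [
    (90, "primary"),
    (365, "secondary"),
]
DEFAULT_TIER = "tertiary"


@dataclass
class DataAsset:
    name: str
    last_accessed: date
    expires: date


def choose_tier(asset: DataAsset, today: date) -> str:
    """Return the tier for an asset, or 'expired' once it reaches end-of-life."""
    if today >= asset.expires:
        return "expired"
    idle_days = (today - asset.last_accessed).days
    # Walk the rules from the fastest tier down; the first match wins.
    for max_idle_days, tier in TIER_RULES:
        if idle_days <= max_idle_days:
            return tier
    return DEFAULT_TIER
```

A real policy engine would weigh more than idle time (regulatory class, business owner, size), but the shape is the same: evaluate each asset against a policy and move it when the answer changes.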

Further, more of the information generated by business activities now falls outside the bounds of structured storage and retrieval mechanisms. This gives rise to the need to build the ability to catalogue, search and retrieve this unstructured information into the storage environment itself. At the same time, solutions must encompass varying classes of storage devices and media arranged in tiers in order to balance the cost of storing any particular data asset with its current value from the time of creation to end-of-life.

Therefore, the archival platform solution should be an ideal combination of intelligent storage and an open and collaborative approach to storage software. This combination can be used most effectively with ISO's reference model for open archival information systems (OAIS). The OAIS model is a proven foundation for archive systems, having served as the underpinning of some of the largest data archives in existence. Following the OAIS guidelines, the storage environment should be able to deliver the following functions of the OAIS model:

Preservation planning: This involves understanding the business-specific issues related to data and how the value of that data varies over its useful lifetime. An appropriate mix of consulting and technology is required to draw up the archival policies, which form the basic framework for managing and retrieving archived data based on its value. This is the first step in implementing the OAIS model.

Produce: This function involves the aspect of handling all the data assets produced by any manner of industry or activity.

Ingest: Once data has been produced, the ingest function prepares it for storage and management within the archive store. The actions in an ingest function include creating a digital signature that uniquely identifies the object, indexing it, and moving the metadata describing it onto the metadata store. Metadata is information about the data that is used in populating, maintaining, and accessing both the descriptive information that identifies the archive's holdings and the administrative data used to manage the archive.
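As a rough illustration of the ingest step, the Python sketch below fingerprints an object and writes a metadata record for it. It assumes a SHA-256 digest as the "digital signature" and a plain dictionary as the metadata store; the function name `ingest` and the record fields are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def ingest(name: str, content: bytes, metadata_store: dict) -> str:
    """Fingerprint an object and record descriptive metadata about it."""
    # Assumption: a SHA-256 digest of the content serves as the unique
    # identifier. Identical content always yields the same identifier.
    object_id = hashlib.sha256(content).hexdigest()
    metadata_store[object_id] = {
        "name": name,
        "size_bytes": len(content),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    return object_id
```

Because the identifier is derived from the content itself, any later change to the stored object can be detected by recomputing the digest and comparing it with the recorded one.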

Data management: Once the metadata is developed, data management involves indexing the metadata so that it is searchable and can be retrieved whenever required. A link from the metadata store is used to determine where the data asset is maintained in the storage archive infrastructure.
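The indexing and lookup described here might look like the following Python sketch. The metadata layout (a `keywords` list and a `location` string per record) and the function names are assumptions made for illustration only.

```python
from collections import defaultdict


def build_index(metadata_store: dict) -> dict:
    """Map each lowercase keyword to the ids of the objects tagged with it."""
    index = defaultdict(set)
    for object_id, record in metadata_store.items():
        for word in record["keywords"]:
            index[word.lower()].add(object_id)
    return index


def locate(keyword: str, index: dict, metadata_store: dict) -> list:
    """Return the sorted storage locations of every asset matching the keyword."""
    matches = index.get(keyword.lower(), ())
    # The metadata record is the link from a search hit to the asset's location.
    return sorted(metadata_store[object_id]["location"] for object_id in matches)
```

The search touches only the metadata store; the archive itself is consulted only after a hit identifies where the asset currently resides.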

Archival storage: This function stores, maintains and retrieves data, manages the storage hierarchy including movement based on changes in data value, and provides disaster recovery capabilities. This function is further enhanced if the archival storage solution allows seamless data movement in a heterogeneous storage environment. Interoperability is a key aspect in the archival storage function.

Administration: Administration functions include configuration management of system hardware and software, system engineering functions to monitor and improve archive operations, updating of archival and hierarchical storage management (HSM) policies, and customer support. While routine administration functions are handled using various management tools, using services support optimizes the overall operations of the archive system.

Access control: This function helps consumers find information, limits access as required and delivers query responses to consumers. It should also be combined with tamper-proof functionality, which can be achieved by locking disk volumes as "read only." Further, it should help keep the data retention settings intact across the tiers of a storage archive.
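The combination of read-only protection and retention settings can be modelled in software roughly as follows. This is a simplified in-memory sketch, not how any particular storage system locks its volumes; the class and exception names are hypothetical.

```python
from datetime import date


class RetentionError(Exception):
    """Raised when a write or delete would violate the archive's rules."""


class ReadOnlyArchive:
    """Write-once store: objects cannot be overwritten or deleted early."""

    def __init__(self):
        self._objects = {}  # object_id -> (content, retain_until)

    def put(self, object_id: str, content: bytes, retain_until: date) -> None:
        if object_id in self._objects:
            raise RetentionError(f"{object_id} already archived; overwrite refused")
        self._objects[object_id] = (content, retain_until)

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][0]

    def delete(self, object_id: str, today: date) -> None:
        _, retain_until = self._objects[object_id]
        if today < retain_until:
            raise RetentionError(f"{object_id} must be retained until {retain_until}")
        del self._objects[object_id]
```

Reads are always allowed, writes succeed only once per object, and deletion is refused until the retention date has passed, which mirrors the tamper-proof and retention requirements described above.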

Consume: Just as in the "produce" function earlier in the model, consumption has to be tailored to the intended use of the data assets. Often this involves integrating the archival system with the application ordinarily used to access the data. While it is important to provide a general interface for archived data retrieval for auditors and administrators, real value is added by enabling the application and application user to continue working as always. Their standard application interface and access approach should not change whether data is in primary, secondary or tertiary storage.

In conclusion, the archival storage architecture should be based on an open, ISO-compliant architecture that implements data life cycle management as a complement to mainstream storage and business continuity practices. This openness allows enterprises to participate in an interoperable environment where the right data is always available at the right time, and there is no need for the special-purpose storage management software and devices used by other solutions.

Sudhakar Rao, Technical Director, Data Lifecycle Management, Hitachi Data Systems.
Disclaimer: No content may be used from this site without the written permission of the authors, Convergence Plus, Comnet Publications Pvt. Ltd. and Exhibitions India Pvt. Ltd. The views expressed on this site are solely those of the authors and do not reflect those of Convergence Plus, Comnet Publishers Pvt. Ltd. and Exhibitions India Pvt. Ltd.