Creating Seekable Compressed Data
Applies To: Available in the Full Version only.

Introduction

  • The ZipArchive Library can create seekable compressed data. Such data is organized in blocks, some of which are synchronization blocks. Decompression can then start at any of the synchronization blocks, not only at the beginning of the data.
  • Only the deflate compression method supports creating seekable data. The deflate method is used by default by the CZipArchive class (see CZipArchive::SetCompressionMethod()).
  • Seekable compressed data cannot be encrypted, because encryption is applied after the data is compressed.
  • Seekable compressed data can be created in segmented archives.
  • The disadvantage of seekable compressed data is a lower compression ratio, because each synchronization block starts compressing with an empty dictionary (otherwise decompression could not start there). You can, however, adjust how often synchronization blocks are created to find a balance between the compression ratio and the seeking granularity (see Determining Statistics of the Compressed Data).
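
The trade-off in the last point can be illustrated with a simple model. The sketch below assumes, purely for illustration, that a synchronization block starts every `interval` bytes of uncompressed data; the real spacing is controlled by m_iSyncRatio and is not modeled exactly. `SyncPoints` and `BytesToDiscard` are hypothetical helpers, not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative model only: a synchronization block is assumed to start every
// `interval` bytes of uncompressed data (not how m_iSyncRatio is defined).
std::vector<std::uint64_t> SyncPoints(std::uint64_t fileSize, std::uint64_t interval)
{
    assert(interval > 0);
    std::vector<std::uint64_t> points;
    for (std::uint64_t offset = 0; offset < fileSize; offset += interval)
        points.push_back(offset); // decompression may start at each of these offsets
    return points;
}

// Bytes that must be decompressed and thrown away to reach `target`
// when decompression starts at the nearest preceding synchronization block.
std::uint64_t BytesToDiscard(std::uint64_t target, std::uint64_t interval)
{
    assert(interval > 0);
    return target % interval;
}
```

In this model, halving `interval` doubles the number of blocks that restart with an empty dictionary (hurting the compression ratio) while halving the worst-case amount of data discarded before the target offset is reached.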

Enabling Seeking Feature in the ZipArchive Library

To use the seeking feature, make sure that _ZIP_SEEK is defined in the _features.h file. It is not defined by default. If you change this definition, rebuild both the ZipArchive Library and your application.
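
A minimal configuration sketch: the definition below is what needs to be active in _features.h (the exact surrounding content of that file may differ between library versions). The guard underneath is an optional, illustrative addition for your own sources, assuming the macro is visible there through the ZipArchive headers; it makes a build without the seeking feature fail early with a clear message.

```cpp
// _features.h (ZipArchive Library): make sure this definition is present
// and not commented out, then rebuild the library and your application.
#define _ZIP_SEEK

// Optional guard in your own source file, placed after including the
// ZipArchive headers (illustrative, not part of the library).
#ifndef _ZIP_SEEK
#error "ZipArchive was built without _ZIP_SEEK; define it in _features.h and rebuild."
#endif
```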

Necessary Code Setup

To use the seeking feature, include the DeflateCompressor.h header and use the ZipArchiveLib namespace, as shown in the sample code below.
Sample Code
// This header needs to be included.
#include "DeflateCompressor.h"
// Use the following namespace or prefix the classes with its name.
using namespace ZipArchiveLib;
All of the following samples assume that the above declarations have already been made.

Creating Seekable Data

To create seekable compressed data, set the appropriate option of the ZipArchiveLib::CDeflateCompressor compressor before compressing a file. Creation of synchronization blocks is controlled by the ZipArchiveLib::CDeflateCompressor::COptions::m_iSyncRatio member variable, which determines how often synchronization blocks are created; setting it to 0 disables them. See Compressing Data for more information about setting compressor options.

Immediately after a file is compressed, you can retrieve the array of offsets pairs. Each pair describes the location of a synchronization block in the compressed data and the corresponding offset in the uncompressed data. Use the ZipArchiveLib::CDeflateCompressor::GetOffsetsArray() method to retrieve the array. Save the returned array to a buffer with the CZipCompressor::COffsetsArray::Save() method, because the next compression operation invalidates it (and the next decompression operation may invalidate it as well). You will need this array later to seek in the compressed data.
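
Why the array must be saved right away can be shown with a toy model. In the sketch below, `OffsetsPair` and `ToyCompressor` are hypothetical stand-ins (their names and members are not the library's): the compressor owns a single offsets array and clears it when the next file starts, so the caller keeps a copy, just as the samples keep a CZipAutoBuffer per file.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CZipCompressor::COffsetsPair.
struct OffsetsPair
{
    std::uint64_t compressed;   // where the synchronization block starts in compressed data
    std::uint64_t uncompressed; // the matching position in uncompressed data
};

// Hypothetical compressor that reuses one offsets array for every file.
class ToyCompressor
{
public:
    void BeginFile() { m_offsets.clear(); } // the previous file's offsets are lost here

    void RecordSyncBlock(std::uint64_t compressed, std::uint64_t uncompressed)
    {
        // Synchronization blocks are recorded in increasing order.
        assert(m_offsets.empty() || uncompressed > m_offsets.back().uncompressed);
        m_offsets.push_back({compressed, uncompressed});
    }

    const std::vector<OffsetsPair>& GetOffsets() const { return m_offsets; }

private:
    std::vector<OffsetsPair> m_offsets;
};
```

Copying the result of `GetOffsets()` into a separate vector before calling `BeginFile()` again plays the role of COffsetsArray::Save() in this model.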

Sample Code
CZipArchive zip;
zip.Open(_T("C:\\Temp\\archive.zip"), CZipArchive::zipCreate);
// Request frequent creating of the synchronization blocks
// by setting options for the deflate compressor.
CDeflateCompressor::COptions options;
options.m_iSyncRatio = 1;
zip.SetCompressionOptions(&options);
// Compress a large file.
zip.AddNewFile(_T("C:\\Temp\\file1.dat"), CZipCompressor::levelBest);
// Get the current compressor to access the location of synchronization blocks.
// We can make sure that the current compressor is really the deflate compressor.
const CZipCompressor::COptions* pOptions = zip.GetCurrentCompressor()->GetOptions();
ASSERT(pOptions && pOptions->GetType() == CZipCompressor::typeDeflate);
// However, if the compression method was not changed, it is not necessary.
// Access the offsets array just after compression finished.
const CDeflateCompressor* pCompressor = (CDeflateCompressor*)zip.GetCurrentCompressor();
CZipCompressor::COffsetsArray* pArray = pCompressor->GetOffsetsArray();
ASSERT(pArray);
// Save the offsets array for later use.
CZipAutoBuffer buffer1;
pArray->Save(buffer1);
// Add one more file - the settings set previously still apply...
zip.AddNewFile(_T("C:\\Temp\\file2.dat"), CZipCompressor::levelBest);
// ...and save the relevant offsets array in a different buffer.
CZipAutoBuffer buffer2;
((CDeflateCompressor*)zip.GetCurrentCompressor())->GetOffsetsArray()->Save(buffer2);
// We can request that further compression will not use synchronization blocks...
options.m_iSyncRatio = 0;
zip.SetCompressionOptions(&options);
// and perform some other operations
// ...
zip.Close();

Determining Statistics of the Compressed Data

To find the balance between the compression ratio and the frequency of synchronization blocks, use the CZipCompressor::COffsetsArray::GetStatistics() method to gather information about block sizes. You can then adjust the ZipArchiveLib::CDeflateCompressor::COptions::m_iSyncRatio value and observe how the block sizes and the compression ratio (see CZipFileHeader::GetCompressionRatio()) change.
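
GetStatistics() does this work for you; the sketch below only shows what "block size" means in terms of offsets pairs, so the numbers are easier to interpret. It assumes a block spans from one synchronization point to the next, with the last block ending at the file's uncompressed size. `OffsetsPair`, `BlockStats` and `UncompressedBlockStats` are hypothetical names, and the result does not mirror the library's statistics format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct OffsetsPair { std::uint64_t compressed; std::uint64_t uncompressed; };

struct BlockStats
{
    std::uint64_t minSize = 0;
    std::uint64_t maxSize = 0;
    double averageSize = 0.0;
};

// Statistics of uncompressed block sizes; pairs must be sorted by offset.
BlockStats UncompressedBlockStats(const std::vector<OffsetsPair>& pairs, std::uint64_t totalSize)
{
    BlockStats stats;
    if (pairs.empty())
        return stats;
    std::vector<std::uint64_t> sizes;
    for (std::size_t i = 0; i < pairs.size(); ++i)
    {
        // A block ends where the next one starts, or at the end of the file.
        std::uint64_t end = (i + 1 < pairs.size()) ? pairs[i + 1].uncompressed : totalSize;
        assert(end >= pairs[i].uncompressed);
        sizes.push_back(end - pairs[i].uncompressed);
    }
    stats.minSize = *std::min_element(sizes.begin(), sizes.end());
    stats.maxSize = *std::max_element(sizes.begin(), sizes.end());
    std::uint64_t sum = 0;
    for (std::uint64_t s : sizes)
        sum += s;
    stats.averageSize = static_cast<double>(sum) / static_cast<double>(sizes.size());
    return stats;
}
```

Smaller blocks mean finer seeking; comparing these sizes against the compression ratio for several m_iSyncRatio values shows where the ratio starts to suffer.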

Seeking in Compressed Data

To seek in compressed data, you need the offsets array (CZipCompressor::COffsetsArray) created during compression. You can load a previously saved array with the CZipCompressor::COffsetsArray::Load() method.

Retrieve the desired offsets pair (CZipCompressor::COffsetsPair) from the array and pass it as an argument to one of the CZipArchive::ExtractFile() methods.

Seeking causes the CRC value to be ignored while decompressing data, because the CRC stored in the archive covers the whole file, not just the decompressed part. It has the same effect as calling the CZipArchive::SetIgnoredConsistencyChecks() method with the CZipArchive::checkLocalCRC argument for the current file.
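
The reason the CRC check has to be skipped can be seen with the CRC-32 used by the ZIP format: the checksum of a tail of the data differs from the checksum of the whole data stored in the archive. The bitwise implementation below is a standard CRC-32 written for clarity rather than speed; `Crc32` is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standard reflected CRC-32 (polynomial 0xEDB88320), as used by ZIP.
std::uint32_t Crc32(const std::string& data)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data)
    {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u))); // conditional XOR
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For example, the checksum of "123456789" is the well-known check value 0xCBF43926, while the checksum of "456789" (the data seen when decompression starts at offset 3) is a different value, so comparing it with the stored CRC would always report an error.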

Sample Code
// ... continues the previous sample
zip.Open(_T("C:\\Temp\\archive.zip"));
// Load the offsets array from the buffer where it was saved before.
// This buffer (buffer1) applies to the first file in our archive.
CZipCompressor::COffsetsArray offsets1;
offsets1.Load(buffer1);
// Find the offsets pair of interest.
// Suppose decompression should start no later than 10 MB into the uncompressed data.
CZipCompressor::COffsetsPair* pPair = offsets1.FindMax(10 * 1024 * 1024);
ASSERT(pPair);
// Extract the file starting from the found offset.
zip.ExtractFile(0, _T("C:\\Temp"), true, NULL, ZipPlatform::fomRegular, pPair);
zip.Close();
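
The comments in the samples suggest that FindMax() returns the last pair starting no later than the given uncompressed offset, and FindMin() the first pair starting no earlier than it. The sketch below reproduces that reading with binary searches over pairs sorted by uncompressed offset; it is an interpretation, not the library's code, so check the exact boundary behavior in the API reference. `OffsetsPair`, `FindMaxSketch` and `FindMinSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

struct OffsetsPair { std::uint64_t compressed; std::uint64_t uncompressed; };

// Last pair whose uncompressed offset is <= limit, or nullptr if there is none.
const OffsetsPair* FindMaxSketch(const std::vector<OffsetsPair>& pairs, std::uint64_t limit)
{
    auto it = std::upper_bound(pairs.begin(), pairs.end(), limit,
        [](std::uint64_t value, const OffsetsPair& p) { return value < p.uncompressed; });
    return it == pairs.begin() ? nullptr : &*std::prev(it);
}

// First pair whose uncompressed offset is >= limit, or nullptr if there is none.
const OffsetsPair* FindMinSketch(const std::vector<OffsetsPair>& pairs, std::uint64_t limit)
{
    auto it = std::lower_bound(pairs.begin(), pairs.end(), limit,
        [](const OffsetsPair& p, std::uint64_t value) { return p.uncompressed < value; });
    return it == pairs.end() ? nullptr : &*it;
}
```

A null result corresponds to the ASSERT(pPair) checks in the samples: there may be no synchronization block in the requested range.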

Multiple Seeking in Data

You can perform multiple seek and extract operations on a file that is opened for decompression. This is possible only with the advanced decompression method (see Extracting Data and Testing Archives for more information). To seek, call the CZipArchive::SeekInFile() method, and then decompress data from that point with calls to the CZipArchive::ReadFile() method.
Sample Code
// ... continues the previous sample
zip.Open(_T("C:\\Temp\\archive.zip"));
// Load the offsets array from the buffer where it was saved before.
// This buffer (buffer2) applies to the second file in our archive.
CZipCompressor::COffsetsArray offsets2;
offsets2.Load(buffer2);
// Let's say we want to extract two fragments of our file
zip.OpenFile(1);
// The first fragment should start
// no earlier than 100 kB into the uncompressed data...
CZipCompressor::COffsetsPair* pPair1 = offsets2.FindMin(100 * 1024);
ASSERT(pPair1);
// ... and the second one is the last fragment in the file.
CZipCompressor::COffsetsPair* pPair2 = offsets2.GetAt(offsets2.GetSize() - 1);
ASSERT(pPair2);
CZipAutoBuffer buffer;
buffer.Allocate(64 * 1024);
// Extract the first fragment
zip.SeekInFile(pPair1);
DWORD read = zip.ReadFile(buffer, buffer.GetSize());
// do something with the data, e.g. write to a file,
// the code is omitted for the clarity of the example
// ...
// Extract the second fragment
zip.SeekInFile(pPair2);
read = zip.ReadFile(buffer, buffer.GetSize());
// ... do something with the data,
// the code is again omitted for the clarity of the example
// close the file now
zip.CloseFile();
// we can perform other operations on the archive
// ...
zip.Close();
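
Seeking lands at a synchronization block, which usually starts somewhat before the byte you actually need. The sketch below reads and discards data until an exact uncompressed offset is reached. `ReadFn` is a hypothetical callable modeled on the ReadFile() calls above (it returns the number of bytes read, 0 at the end of data); with ZipArchive it could wrap `zip.ReadFile()`, and `blockStart` would be the uncompressed offset stored in the pair passed to SeekInFile(). `SkipToOffset` is likewise a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical reader: fills up to `size` bytes into `buf`, returns the count read.
using ReadFn = std::function<std::uint32_t(char* buf, std::uint32_t size)>;

// After seeking to a block that starts at `blockStart`, discards data until
// `target` is reached. Returns false if the data ends before the target.
bool SkipToOffset(const ReadFn& read, std::uint64_t blockStart, std::uint64_t target)
{
    assert(target >= blockStart);
    std::uint64_t remaining = target - blockStart;
    std::vector<char> scratch(64 * 1024);
    while (remaining > 0)
    {
        std::uint32_t chunk = static_cast<std::uint32_t>(
            std::min<std::uint64_t>(remaining, scratch.size()));
        std::uint32_t got = read(scratch.data(), chunk);
        if (got == 0)
            return false; // end of data before the target offset
        remaining -= got;
    }
    return true;
}
```

After `SkipToOffset()` returns true, the next read starts exactly at `target`.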

Preserving Offsets Array

The offsets array (CZipCompressor::COffsetsArray) created while compressing data is necessary for decompressing it, because it contains the locations of the synchronization blocks, and decompression can start only at those blocks. You can preserve this array in multiple ways (for example, as a separate file in the archive or inside another file). One way is to store the array for a particular file in the central extra data of this file. For more information about using extra data, please refer to Providing Custom Data: Extra Fields. It is recommended that you use the ZIP_EXTRA_ZARCH_SEEK identifier for this extra data. To save an offsets array to a buffer or load it from a buffer, use the CZipCompressor::COffsetsArray::Save() and CZipCompressor::COffsetsArray::Load() methods, respectively.

When saving, the offsets array tries to use 4 bytes for each offset. However, if any of the offsets does not fit into 4 bytes, 8 bytes are automatically used for every offset. When loading, the library automatically detects how many bytes per offset were used when saving. To use 8-byte offsets, the ZipArchive Library must be compiled with Zip64 support (see Zip64 Format: Crossing the Limits of File Sizes and Number of Files and Segments for more information about Zip64 support).
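
The width selection described above can be sketched as follows. The byte layout here (a 1-byte width marker followed by little-endian offsets) is invented for illustration and is not the format produced by Save(); only the rule "one offset above 4 bytes widens all of them, and the loader detects the width" is taken from the text. `SaveOffsets` and `LoadOffsets` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: [width (4 or 8)] followed by each offset, little-endian.
std::vector<std::uint8_t> SaveOffsets(const std::vector<std::uint64_t>& offsets)
{
    std::uint8_t width = 4;
    for (std::uint64_t v : offsets)
        if (v > 0xFFFFFFFFull) { width = 8; break; } // one large offset widens all of them
    std::vector<std::uint8_t> out{width};
    for (std::uint64_t v : offsets)
        for (int i = 0; i < width; ++i)
            out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    return out;
}

std::vector<std::uint64_t> LoadOffsets(const std::vector<std::uint8_t>& in)
{
    assert(!in.empty());
    const std::size_t width = in[0]; // detected from the data, not passed by the caller
    assert(width == 4 || width == 8);
    assert((in.size() - 1) % width == 0);
    std::vector<std::uint64_t> offsets;
    for (std::size_t pos = 1; pos < in.size(); pos += width)
    {
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < width; ++i)
            v |= static_cast<std::uint64_t>(in[pos + i]) << (8 * i);
        offsets.push_back(v);
    }
    return offsets;
}
```

With three offsets that all fit into 32 bits, the buffer takes 1 + 3 * 4 = 13 bytes; a single offset above 0xFFFFFFFF doubles the per-offset cost for the whole array.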

Sample Code
CZipArchive zip;
zip.Open(_T("C:\\Temp\\archive.zip"), CZipArchive::zipCreate);
// Request creating of the synchronization blocks.
CDeflateCompressor::COptions options;
options.m_iSyncRatio = 10;
zip.SetCompressionOptions(&options);
// Compress a file.
zip.AddNewFile(_T("C:\\Temp\\file1.dat"));
// Create extra data for the compressed file. As this is the only file in our archive,
// it will have the 0 index. It is recommended to use the ZIP_EXTRA_ZARCH_SEEK ID.
CZipExtraData* extra = zip[0]->m_aCentralExtraData.CreateNew(ZIP_EXTRA_ZARCH_SEEK);
// Save the seek information in extra data.
((CDeflateCompressor*)zip.GetCurrentCompressor())
->GetOffsetsArray()->Save(extra->m_data);
// Finish working with the archive.
zip.Close();
// ...
// Reopen the archive.
zip.Open(_T("C:\\Temp\\archive.zip"));
// Recover the seek information from extra data of the desired file.
ASSERT(zip.GetCount() == 1);
CZipExtraData* extraData = zip[0]->m_aCentralExtraData.Lookup(ZIP_EXTRA_ZARCH_SEEK);
ASSERT(extraData);
CZipCompressor::COffsetsArray offsets;
offsets.Load(extraData->m_data);
// We can now use the offsets array in extraction.
// Please refer to the previous examples.
// ...
zip.Close();
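
One practical limit is worth keeping in mind when storing the array in extra data: in the ZIP format, each extra field record consists of a 2-byte header ID and a 2-byte data size (both little-endian) followed by the data, and the total extra field length in a header is also a 16-bit value. A very large offsets array (many synchronization blocks) may therefore not fit. The sketch below builds such a record and rejects oversized data; `MakeExtraField` is a hypothetical helper, the header ID is a parameter (in real code it would be ZIP_EXTRA_ZARCH_SEEK), and how the library itself reacts to oversized extra data is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Builds one extra field record: header ID, data size, data (C++17).
// Checks only this record; other extra fields of the same file share the 64 KB limit.
std::optional<std::vector<std::uint8_t>> MakeExtraField(std::uint16_t headerId,
                                                        const std::vector<std::uint8_t>& data)
{
    const std::size_t kHeaderSize = 4;
    if (data.size() > 0xFFFF - kHeaderSize)
        return std::nullopt; // would overflow the 16-bit extra field length
    std::vector<std::uint8_t> record;
    record.reserve(kHeaderSize + data.size());
    record.push_back(static_cast<std::uint8_t>(headerId & 0xFF));
    record.push_back(static_cast<std::uint8_t>(headerId >> 8));
    const auto size = static_cast<std::uint16_t>(data.size());
    record.push_back(static_cast<std::uint8_t>(size & 0xFF));
    record.push_back(static_cast<std::uint8_t>(size >> 8));
    record.insert(record.end(), data.begin(), data.end());
    assert(record.size() == kHeaderSize + data.size());
    return record;
}
```

If the check fails, storing the array as a separate file in the archive, as mentioned above, avoids the limit.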


Article ID: 0711101739