Archive formats such as zip

The archive classes handle archive formats such as zip, tar, rar and cab.

Currently only the zip classes are included. There are the following classes:

ArchiveInput: Providing input streams.
ArchiveOutput: Providing output streams.
ArchiveEntry: Holds meta-data for an entry (e.g. filename, timestamp, etc.)

The classes are designed to handle archives on both seekable streams, such as disk files, and non-seekable streams, such as pipes and sockets (see Archives on non-seekable streams).

Creating an archive

Call ArchiveOutput.PutNextEntry() to create each new entry in the archive, then write the entry's data. Another call to PutNextEntry() closes the current entry and begins the next. For example:

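    // "out" is a C# keyword, so the stream variable is named "output"
    wx.ArchiveOutput output = new wx.ArchiveOutput("test.zip");
    StreamWriter txt = new StreamWriter(output.Out);
    output.PutNextEntry("entry1.txt");
    txt.WriteLine("Some text for entry1");
    // StreamWriter buffers its output, so flush before starting the next entry
    txt.Flush();
    output.PutNextEntry(Path.Combine("subdir", "entry2.txt"));
    txt.WriteLine("Some text for subdir/entry2.txt");
    txt.Flush();
    output.Close();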

The name of each entry can be a full path, which makes it possible to store entries in subdirectories.

Extracting an archive

wx.ArchiveInput.GetNextEntry() returns an entry object containing the meta-data for the next entry in the archive. Reading from the input stream wx.ArchiveInput.In then returns the entry's data.

When there are no more entries, wx.ArchiveInput.GetNextEntry() returns null and wx.ArchiveInput.In also becomes null. For example:

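    // "in" is a C# keyword, so the stream variable is named "input"
    wx.ArchiveEntry entry;
    wx.ArchiveInput input = new wx.ArchiveInput("test.zip");
    for (entry = input.GetNextEntry();
         entry != null;
         entry = input.GetNextEntry())
    {
        // print each entry's name and timestamp
        Console.Out.WriteLine(string.Format("{0}\t{1}", entry.Name, entry.DateTime));
    }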

Modifying an archive

To modify an existing archive, write a new copy of the archive to a new file, making any necessary changes along the way and transferring any unchanged entries using wx.ArchiveOutput.CopyEntry(). For archive types which compress entry data, CopyEntry() is likely to be much more efficient than transferring the data using Stream.Read() and Stream.Write(), since it copies the data without decompressing and recompressing it.

In general modifications are not possible without rewriting the archive, though it may be possible in some limited cases. Even then, rewriting the archive is usually a better choice since a failure can be handled without losing the whole archive.

For example, to delete all entries whose names match the pattern "*.txt":

    wx.ArchiveInput input = new wx.ArchiveInput("test.zip");
    wx.ArchiveOutput output = new wx.ArchiveOutput("test-txt.zip");
    output.CopyArchiveMetaData(input);
    for (wx.ArchiveEntry entry = input.GetNextEntry();
         entry != null;
         entry = input.GetNextEntry())
    {
        // copy every entry except those whose name ends in ".txt"
        if (!entry.Name.EndsWith(".txt"))
            if (!output.CopyEntry(entry, input))
                break;
    }
    // close the input stream before closing the output stream
    // so that the file can be replaced
    input.Close();

    // you can check for success as follows
    bool success = output.Close();
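
The reason for rewriting to a new file, namely that a failure leaves the original archive intact, can be sketched with the standard .NET System.IO.Compression classes standing in for the wx archive classes. This is only an illustration under assumptions: ZipRewriteSketch, RemoveEntries and the ".tmp" suffix are hypothetical names, and unlike CopyEntry() this copy decompresses and recompresses every entry.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipRewriteSketch
{
    // Writes a filtered copy of the archive at `path` to a temporary file and
    // replaces the original only after the copy has been written completely.
    public static void RemoveEntries(string path, Func<string, bool> shouldDrop)
    {
        string tempPath = path + ".tmp"; // hypothetical naming scheme
        try
        {
            using (var source = new ZipArchive(File.OpenRead(path), ZipArchiveMode.Read))
            using (var target = new ZipArchive(File.Create(tempPath), ZipArchiveMode.Create))
            {
                foreach (ZipArchiveEntry entry in source.Entries)
                {
                    if (shouldDrop(entry.FullName))
                        continue;

                    ZipArchiveEntry copy = target.CreateEntry(entry.FullName);
                    copy.LastWriteTime = entry.LastWriteTime;
                    using (Stream from = entry.Open())
                    using (Stream to = copy.Open())
                    {
                        // Unlike CopyEntry(), this decompresses and recompresses.
                        from.CopyTo(to);
                    }
                }
            }
        }
        catch
        {
            // The original archive has not been touched; discard the partial copy.
            File.Delete(tempPath);
            throw;
        }

        // Both archives are closed now, so the original can be replaced.
        File.Replace(tempPath, path, null);
    }
}
```

A call such as ZipRewriteSketch.RemoveEntries("test.zip", name => name.EndsWith(".txt")) would then drop the text entries in place. If anything fails while the copy is being written, the exception propagates and "test.zip" is left unchanged.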

Looking up an archive entry by name

To open just one entry in an archive, the most efficient way is to simply search for it linearly by calling wx.ArchiveInput.GetNextEntry() until the required entry is found. This works both for archives on seekable and non-seekable streams.

The format of filenames in the archive is likely to be different from the local filename format. For example, zips and tars use Unix-style names, with forward slashes as the path separator, and absolute paths are not allowed. So if the file "C:\MYDIR\MYFILE.TXT" is stored on Windows, then when the entry is read back wx.ArchiveEntry.Name will return "MYDIR\MYFILE.TXT". The conversion into the internal format and back has lost some information (here, the drive and the leading separator).

So to avoid ambiguity when searching for an entry matching a local name, it is better to convert the local name to the archive's internal format and search for that:

    // open the zip
    wx.ArchiveInput input = new wx.ArchiveInput("test.zip");

    // convert the local name we are looking for into the internal format
    string name = input.GetInternalName(localname);

    // call GetNextEntry() until the required internal name is found;
    // entry is declared outside the loop so it is still in scope afterwards
    wx.ArchiveEntry entry = input.GetNextEntry();
    while (entry != null && entry.InternalName != name)
    {
        entry = input.GetNextEntry();
    }

    if (entry != null)
    {
        // read the entry's data...
    }
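
To make the loss of information in the name conversion concrete, here is a minimal sketch of what a conversion from a Windows-style local name to a zip-style internal name might look like. It is not the implementation behind GetInternalName(); InternalNameSketch and ToInternalName are hypothetical names, and the rules are only those described above (forward slashes, no absolute paths).

```csharp
using System;

public static class InternalNameSketch
{
    // Converts a Windows-style local path into a zip-style internal name:
    // uses '/' as the separator, drops a drive prefix, strips leading separators.
    public static string ToInternalName(string localName)
    {
        string name = localName.Replace('\\', '/');

        // Drop a drive prefix such as "C:".
        if (name.Length >= 2 && char.IsLetter(name[0]) && name[1] == ':')
            name = name.Substring(2);

        // Absolute paths are not allowed, so remove leading separators.
        return name.TrimStart('/');
    }
}
```

With this sketch, "C:\MYDIR\MYFILE.TXT", "D:\MYDIR\MYFILE.TXT" and "MYDIR\MYFILE.TXT" all map to "MYDIR/MYFILE.TXT". Several local names share one internal name, which is why comparing internal names is the unambiguous choice.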

To access several entries randomly, it is most efficient to transfer the entire catalogue of entries to a container such as a Hashtable. Entries looked up by name can then be opened using the wx.ArchiveInput.OpenEntry() method.

    Hashtable catalogue = new Hashtable();
    // open the zip
    wx.ArchiveInput input = new wx.ArchiveInput("test.zip");
    // load the zip catalogue, keyed by internal name
    for (wx.ArchiveEntry entry = input.GetNextEntry();
         entry != null;
         entry = input.GetNextEntry())
    {
        catalogue.Add(entry.InternalName, entry);
    }

    // open an entry by name
    string name = input.GetInternalName(localname);
    if (catalogue.ContainsKey(name))
    {
        // Hashtable stores values as object, so cast back to wx.ArchiveEntry
        input.OpenEntry((wx.ArchiveEntry)catalogue[name]);
        StreamReader reader = new StreamReader(input.In);
        // ... now read entry's data
    }

To open more than one entry simultaneously you need more than one underlying stream on the same archive:

    // opening another entry without closing the first requires another
    // input stream for the same file
    wx.ArchiveInput input2 = new wx.ArchiveInput("test.zip");
    if (catalogue.ContainsKey(input2.GetInternalName(localname)))
        input2.OpenEntry((wx.ArchiveEntry)catalogue[input2.GetInternalName(localname)]);

Archives on non-seekable streams

In general, handling archives on non-seekable streams is done in the same way as for seekable streams, with a few caveats. The main limitation is that accessing entries randomly using OpenEntry() is not possible; the entries can only be accessed sequentially, in the order they are stored within the archive. For each archive type, there will also be other limitations which depend on the order in which the entries' meta-data is stored within the archive. These are not too difficult to deal with, and are outlined below.

PutNextEntry and the entry size

When writing archives, some archive formats store the entry size before the entry's data (tar has this limitation, zip doesn't). In this case the entry's size must be passed to wx.ArchiveOutput.PutNextEntry() or an error occurs. This is only an issue on non-seekable streams, since otherwise the archive output stream can seek back and fix up the header once the size of the entry is known. For generic programming, one way to handle this is to supply the size whenever it is known, and rely on the error message from the output stream when the operation is not supported.
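
The difference between the two kinds of stream can be sketched with a toy format that stores an 8-byte size header before each record's data. This is not the tar or zip layout and not the wx.NET implementation; SizeHeaderSketch and WriteRecord are hypothetical names chosen for illustration. On a seekable stream the writer can emit a placeholder and patch it afterwards, while on a non-seekable stream the size has to be supplied up front.

```csharp
using System;
using System.IO;

public static class SizeHeaderSketch
{
    // Writes one record as an 8-byte size header followed by the data.
    // knownSize may be null only if the output can seek back to fix up the header.
    public static void WriteRecord(Stream output, Stream data, long? knownSize)
    {
        if (knownSize == null && !output.CanSeek)
            throw new NotSupportedException(
                "The size must be supplied when writing to a non-seekable stream.");

        long headerPosition = output.CanSeek ? output.Position : -1;
        WriteSize(output, knownSize ?? 0); // placeholder if the size is unknown

        // Copy the data, counting the bytes actually written.
        long written = 0;
        byte[] buffer = new byte[4096];
        int read;
        while ((read = data.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            written += read;
        }

        if (knownSize == null)
        {
            // Seek back, patch the real size into the header, return to the end.
            long end = output.Position;
            output.Position = headerPosition;
            WriteSize(output, written);
            output.Position = end;
        }
        else if (written != knownSize.Value)
        {
            throw new InvalidDataException("The data does not match the declared size.");
        }
    }

    private static void WriteSize(Stream output, long size)
    {
        byte[] header = BitConverter.GetBytes(size);
        output.Write(header, 0, header.Length);
    }
}
```

Generic code following the advice above would pass knownSize whenever it has it, and treat the NotSupportedException as the signal that the combination of format and stream cannot work without a size.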

GetNextEntry and the weak reference mechanism

Some archive formats do not store all of an entry's meta-data before the entry's data (zip is an example). In this case, when reading from a non-seekable stream, GetNextEntry() can only return a partially populated wx.ArchiveEntry object: not all the fields are set. The input stream then keeps a weak reference to the entry object and updates it when more meta-data becomes available. A weak reference is one that does not keep the wx.ArchiveEntry object alive; the input stream only attempts to update the object if it is still around. The documentation for each archive entry type gives the details of what meta-data becomes available and when. For generic programming, when the worst case must be assumed, you can rely on all the fields of wx.ArchiveEntry being fully populated when wx.ArchiveInput.GetNextEntry() returns, with the following exceptions:

wx.ArchiveEntry.GetSize(): Guaranteed to be available after the entry has been read to Eof(), or CloseEntry() has been called.
wx.ArchiveEntry.IsReadOnly(): Guaranteed to be available after the end of the archive has been reached, i.e. after GetNextEntry() returns null and Eof() is true.

This mechanism allows wx.ArchiveOutput.CopyEntry() to always fully preserve entries' meta-data. No matter what order the meta-data occurs in within the archive, the input stream will always have read it before the output stream must write it.
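
The weak reference idea can be sketched with the standard .NET WeakReference<T> class. This only illustrates the general technique, not how wx.ArchiveInput is implemented; EntryInfo, SequentialReaderSketch, BeginEntry and EndEntry are hypothetical names.

```csharp
using System;

public class EntryInfo
{
    public string Name;
    public long? Size; // null until the reader has seen the size
}

public class SequentialReaderSketch
{
    // Weak reference: does not keep the entry alive on the reader's behalf.
    private WeakReference<EntryInfo> lastEntry;

    // Returns a partially populated entry; the size is not known yet.
    public EntryInfo BeginEntry(string name)
    {
        var entry = new EntryInfo { Name = name };
        lastEntry = new WeakReference<EntryInfo>(entry);
        return entry;
    }

    // Called once the size has been read from the stream.
    public void EndEntry(long size)
    {
        EntryInfo entry;
        if (lastEntry != null && lastEntry.TryGetTarget(out entry))
        {
            // The caller still holds the entry, so fill in the late meta-data.
            entry.Size = size;
        }
        // Otherwise the entry has been collected and there is nothing to update.
    }
}
```

A caller that keeps the object returned by BeginEntry() sees Size change from null to the real value after EndEntry() runs. A caller that drops the object costs the reader nothing, since the reader never holds a strong reference to it.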

ZIP archives as resource container

You may use ZIP archives as a resource system using class ZipResource.
