File_Extractor 1.0.0
--------------------
Author  : Shay Green <gblargg@gmail.com>
Website : http://code.google.com/p/file-extractor/
License : GNU LGPL 2.1 or later for all except unrar
Language: C interface, C++ implementation


Contents
--------
* Overview
* Limitations
* Extracting file data
* Archive file type handling
* Using in multiple threads
* Error handling
* Solving problems
* Thanks


Overview
--------
File_Exactor (fex) allows you to write one version of file-opening code
that handles normal files and archives of files. It presents each as a
series of files that you can scan and optionally extract; a single file
is made to act like an archive of just one file, so your code doesn't
need to do anything special to handle it.

Basic steps for scanning and extracting from an archive:

	* Open an archive or normal file using fex_open().
	* Scanning/extraction loop:
		- Exit loop if fex_done() returns true.
		- Get current file's name with fex_name().
		- If more file information is needed, call fex_stat() first.
		- If extracting, use fex_data() or fex_read().
		- Go to next file in archive with fex_next().
	* Close archive and free memory with fex_close().

You can stop scanning an archive at any point, for example once you've
found the file you're looking for. If you need to go back to the first
file, call fex_rewind() at any time. Be sure to check error codes
returned by most functions.


Limitations
-----------
All archives:
* A file's checksum is verified only after ALL its data is extracted.
* Encryption, segmentation, files larger than 2GB, and other extra
features are not supported.

GZ archives:
* Only gzip archives of a single file are supported. If it has multiple
files, the reported size will be wrong. Multi-file gzip archives are
very rare.

ZIP archives:
* Supports files compressed using either deflation or store
(uncompressed). Other compression schemes like BZip2 and Deflate64 are
not supported.
* Archive must have a valid directory structure at the end.

RAR archives:
* Support for really old 1.x archives might not work. If you have some
of these old archives, send them to me so I can test them.

7-zip:
* Solid archives can currently use lots of memory when open.


Extracting file data
--------------------
A file's data can be extracted with one or more calls to fex_read(), as
you would read from a normal file. Use fex_tell() to find out how much
has already been read. Use this if you need the data read into your own
structure in memory.

File data can also be extracted to memory by the library with
fex_data(). The pointer returned is valid only until you go to another
file or close the archive, so this is only useful if you need to examine
or process the data immediately and not keep it around for later.
Archive extractors naturally keep a copy of the extracted data in memory
already for solid archive types (currently 7-zip and RAR), so this
function is optimized to avoid making a second copy of it in those
cases.

Use fex_size() to find the size of the extracted data. Remember that
fex_stat() or fex_data() must be called BEFORE calling fex_size().


Archive file type handling
--------------------------
By default, fex uses the filename extension and header to determine
archive type. If the filename extension is unrecognized or it lacks an
extension, fex examines the first few bytes of the file. If still
unrecognized, fex opens it as binary. Fex also checks for common archive
types that it doesn't support, so that it can reject as unsupported them
rather than unhelpfully opening them as binary.

Your file format might itself be an archive, for example your files end
in ".rsn" yet are normal RAR archives, or they end in ".vgz" and are
gzipped. This is why fex checks the headers of files with unknown
filename extensions, rather than treating them as binary or rejecting
them.

Type identification can be customized by using the various
identification functions and fex_open_type(). For example, you could
avoid the header check:

	fex_t* fex;
	fex_type_t type = fex_identify_extension( path );
	if ( type == NULL )
		error( "Unsupported archive type" );
	
	error( fex_open_type( &fex, path, type ) );
	
Note that you'll only get a NULL type for known archive type that fex
doesn't handle; you won't get it for your own files, for example
fex_identify_extension("myfile.foo") won't return NULL (unless for some
reason you've disabled binary file support).

Use fex_type_list() to get a list of the types fex supports, for example
to tell the user what archive types your program supports:

	const fex_type_t* t;
	for ( t = fex_type_list(); *t; t++ )
		printf( "%s\n", fex_type_name( *t ) );

To get the fex_type_t for a particular archive type, use
fex_identify_extension():

	fex_type_t zip_type = fex_identify_extension( ".zip" );
	if ( zip_type == NULL )
		error( "ZIP isn't supported" );

Be sure to check the result as shown, rather than assuming the library
supports a particular archive type. Use an extension of "" to get the
type for binary files:

	fex_type_t bin_type = fex_identify_extension( "" );
	if ( bin_type == NULL )
		error( "Binary files aren't supported?!?" );


Using in multiple threads
-------------------------
Fex supports multi-threaded programs. If only one thread at a time is
using the library, nothing special needs to be done. If more than one
thread is using the library, the following must be done:

* Call fex_init() from the main thread and ensure it completes before
any other threads use any fex functions. This initializes shared data
tables used by the extractors.

* For each archive opened, only access it from one thread at a time.
Different archives can be accessed from different threads without any
synchronization, since fex uses no global variables. If the same archive
must be accessed from multiple threads, all calls to any fex functions
must be in critical section(s).


Unicode file paths on Windows
-----------------------------
If using Windows and your program supports Unicode file paths, enable
BLARGG_UTF8_PATHS in blargg_config.h, and convert your wide-character
paths to UTF-8 before passing them to fex.h functions:

	/* Wide-character path that could have come from system */
	wchar_t wide_path [] = L"demo.zip";
	
	/* Convert from wide path and check for error */
	char* path = fex_wide_to_path( wide_path );
	if ( path == NULL )
		error( "Out of memory" );
	
	/* Use converted path for fex call */
	error( fex_open( &fex, path ) );
	
	/* Free memory used by path */
	fex_free_path( path );

The converted path can be used with any of the fex functions that take
paths, for example fex_identify_extension() or fex_has_extension().


Error handling
--------------
Most functions that can fail return fex_err_t, a pointer type. On
failure they return a pointer to an error object, and on success they
return NULL. Use fex_err_code() to get a conventional error code, or
fex_err_str() to get a string suitable for reporting to the user.

There are two basic approches that your code can use to handle library
errors. It can return errors, or report them and exit the function via
some other means.

Your code can return errors as the library does, using fex_err_t:

	#define RETURN_ERR( expr ) \
		do {\
			fex_err_t err = (expr);\
			if ( err != NULL )\
				return err;\
		} while ( 0 )
	
	fex_err_t my_func()
	{
		RETURN_ERR( fex_foo() );
		RETURN_ERR( fex_bar() );
		return NULL;
	}

If you have your own error codes, you can convert fex's errors to them:

	// error codes that differ from library's
	enum {
		my_ok             = 0,
		my_generic_error  = 123,
		my_out_of_memory  = 456,
		my_file_not_found = 789
		// ...
	};
	
	int convert_error( fex_err_t err )
	{
		switch ( fex_err_code( err ) )
		{
		case fex_ok:               return my_ok;
		case fex_err_generic:      return my_generic_error;
		case fex_err_memory:       return my_out_of_memory;
		case fex_err_file_missing: return my_file_not_found;
		// ...
		default:                   return my_generic_error;
		}
	}
	
	#define RETURN_ERR( expr ) \
		do {\
			fex_err_t err = (expr);\
			if ( err != NULL )\
				return convert_error( err );\
		} while ( 0 )
	
	int my_func()
	{
		RETURN_ERR( fex_foo() );
		RETURN_ERR( fex_bar() );
		return my_ok;
	}

The other approach is to pass all errors to an error handler function
that never returns if passed a non-success error value:

	// never returns if err != NULL
	void handle_error( fex_err_t err );
	
	void my_func()
	{
		handle_error( fex_foo() );
		handle_error( fex_bar() );
	}

handle_error() could print the error and exit the program:

	void handle_error( fex_err_t err )
	{
		if ( err != NULL )
		{
			const char* str = fex_err_str( err );
			printf( "Error: %s\n", str );
			exit( EXIT_FAILURE );
		}
	}

handle_error() could also throw a C++ exception (or equivalently in C,
longmp() back to a setjmp() done inside caller()):

	void handle_error( fex_err_t err )
	{
		switch ( fex_err_code( err ) )
		{
		case fex_ok:          return;
		case fex_err_memory:  throw std::bad_alloc();
		// ...
		case fex_err_generic:
		default:
			throw std::runtime_error( fex_err_str( err ) );
		}
	}
	
	void caller()
	{
		try
		{
			my_func();
		}
		catch ( const std::exception& e )
		{
			printf( "Error: %s\n", e.what() );
		}
	}


Solving problems
----------------
If you're having problems, try the following:

* Enable debugging support in your environment. This enables assertions
and other run-time checks. In particular, be sure NDEBUG isn't defined.

* Turn the compiler's optimizer is off. Sometimes an optimizer generates
bad code.

* If multiple threads are being used, ensure that only one at a time is
accessing a given set of objects from the library. This library is not
in general thread-safe, though independent objects can be used in
separate threads.

* If all else fails, see if the demo works.


Thanks
------
Thanks to Richard Bannister, Kode54, byuu, Cless, and DJRobX for testing
and giving feedback for the library. Thanks to the authors of zlib,
unrar, and 7-zip.

-- 
Shay Green <gblargg@gmail.com>