File_Extractor 1.0.0
--------------------
Author  : Shay Green <gblargg@gmail.com>
Website : http://code.google.com/p/file-extractor/
License : GNU LGPL 2.1 or later for all except unrar
Language: C interface, C++ implementation


Contents
--------
* Overview
* Limitations
* Extracting file data
* Archive file type handling
* Using in multiple threads
* Unicode file paths on Windows
* Error handling
* Solving problems
* Thanks


Overview
--------
File_Extractor (fex) allows you to write one version of file-opening code
that handles both normal files and archives of files. It presents each as
a series of files that you can scan and optionally extract; a single file
is made to act like an archive of just one file, so your code doesn't
need to do anything special to handle it.

Basic steps for scanning and extracting from an archive:

* Open an archive or normal file using fex_open().
* Scanning/extraction loop:
    - Exit loop if fex_done() returns true.
    - Get current file's name with fex_name().
    - If more file information is needed, call fex_stat() first.
    - If extracting, use fex_data() or fex_read().
    - Go to next file in archive with fex_next().
* Close archive and free memory with fex_close().

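The steps above can be sketched in C as follows. To keep the sketch
compilable without the library, it uses a small in-memory stand-in
(demo_arc and the demo_*() functions, all hypothetical) in place of fex_t
and the fex calls: demo_done(), demo_name(), demo_data(), demo_next() and
demo_rewind() play the roles of fex_done(), fex_name(), fex_data(),
fex_next() and fex_rewind(). The stand-ins also leave out the error
returns that the real calls need checked.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for an open archive. It exists only so
   this sketch compiles without the library; it is not part of fex. */
typedef struct {
    const char* name;
    const char* data;
} demo_file;

typedef struct {
    const demo_file* files;
    int count;
    int pos;    /* index of the current file */
} demo_arc;

static int         demo_done( const demo_arc* a ) { return a->pos >= a->count; }
static const char* demo_name( const demo_arc* a ) { return a->files [a->pos].name; }
static const char* demo_data( const demo_arc* a ) { return a->files [a->pos].data; }
static void        demo_next( demo_arc* a )       { a->pos++; }
static void        demo_rewind( demo_arc* a )     { a->pos = 0; }

/* Scans the archive from its first file for one called `wanted`.
   Returns that file's data, or NULL if no file has that name. Scanning
   stops as soon as the file is found. */
static const char* find_file( demo_arc* a, const char* wanted )
{
    for ( demo_rewind( a ); !demo_done( a ); demo_next( a ) )
    {
        if ( strcmp( demo_name( a ), wanted ) == 0 )
            return demo_data( a );
    }
    return NULL;
}
```

With the real library, the same loop would sit between fex_open() and
fex_close(), and the pointer obtained from fex_data() would have to be
used before moving to another file or closing the archive.
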
You can stop scanning an archive at any point, for example once you've
found the file you're looking for. If you need to go back to the first
file, call fex_rewind() at any time. Be sure to check error codes
returned by most functions.


Limitations
-----------
All archives:
* A file's checksum is verified only after ALL its data is extracted.
* Encryption, segmentation, files larger than 2GB, and other extra
features are not supported.

GZ archives:
* Only gzip archives of a single file are supported. If an archive has
multiple files, the reported size will be wrong. Multi-file gzip archives
are very rare.

ZIP archives:
* Supports files compressed using either deflation or store
(uncompressed). Other compression schemes like BZip2 and Deflate64 are
not supported.
* Archive must have a valid directory structure at the end.

RAR archives:
* Really old 1.x archives might not work. If you have some of these old
archives, send them to me so I can test them.

7-zip:
* Solid archives can currently use lots of memory when open.


Extracting file data
--------------------
A file's data can be extracted with one or more calls to fex_read(), as
you would read from a normal file. Use fex_tell() to find out how much
has already been read. Use this if you need the data read into your own
structure in memory.

File data can also be extracted to memory by the library with
fex_data(). The pointer returned is valid only until you go to another
file or close the archive, so this is only useful if you need to examine
or process the data immediately and not keep it around for later.
Archive extractors naturally keep a copy of the extracted data in memory
already for solid archive types (currently 7-zip and RAR), so this
function is optimized to avoid making a second copy of it in those
cases.

Use fex_size() to find the size of the extracted data. Remember that
fex_stat() or fex_data() must be called BEFORE calling fex_size().


Archive file type handling
--------------------------
By default, fex uses the filename extension and header to determine
archive type. If the filename extension is unrecognized or it lacks an
extension, fex examines the first few bytes of the file. If still
unrecognized, fex opens it as binary. Fex also checks for common archive
types that it doesn't support, so that it can reject them as unsupported
rather than unhelpfully opening them as binary.

Your file format might itself be an archive, for example your files end
in ".rsn" yet are normal RAR archives, or they end in ".vgz" and are
gzipped. This is why fex checks the headers of files with unknown
filename extensions, rather than treating them as binary or rejecting
them.

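As a rough illustration of what a header check involves, the sketch
below matches the first few bytes of a file against the well-known
signatures of the gzip, ZIP, RAR and 7-zip formats. sniff_header() and
its labels are hypothetical and say nothing about how fex is
implemented; in your own code, use fex's identification functions.

```c
#include <stddef.h>
#include <string.h>

/* Returns a short label for the archive format suggested by the first
   n bytes of a file, or "binary" when no signature matches.
   Hypothetical helper for illustration; not part of fex. */
static const char* sniff_header( const unsigned char* p, size_t n )
{
    static const struct {
        const char* label;
        const char* magic;
        size_t      len;
    } sigs [] = {
        { "gzip",  "\x1F\x8B",           2 },
        { "zip",   "PK\x03\x04",         4 },  /* local file header */
        { "rar",   "Rar!\x1A\x07",       6 },
        { "7-zip", "7z\xBC\xAF\x27\x1C", 6 },
    };
    size_t i;

    for ( i = 0; i < sizeof sigs / sizeof sigs [0]; i++ )
    {
        /* Too-short input can never match a longer signature */
        if ( n >= sigs [i].len &&
                memcmp( p, sigs [i].magic, sigs [i].len ) == 0 )
            return sigs [i].label;
    }
    return "binary";
}
```

A ".vgz" file that begins with the bytes 1F 8B would be labeled "gzip"
by this sketch regardless of its extension, which is the behavior the
paragraph above describes.
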
Type identification can be customized by using the various
identification functions and fex_open_type(). For example, you could
avoid the header check:

    fex_t* fex;
    fex_type_t type = fex_identify_extension( path );
    if ( type == NULL )
        error( "Unsupported archive type" );

    error( fex_open_type( &fex, path, type ) );

Note that you'll only get a NULL type for a known archive type that fex
doesn't handle; you won't get it for your own files. For example,
fex_identify_extension("myfile.foo") won't return NULL (unless for some
reason you've disabled binary file support).

Use fex_type_list() to get a list of the types fex supports, for example
to tell the user what archive types your program supports:

    const fex_type_t* t;
    for ( t = fex_type_list(); *t; t++ )
        printf( "%s\n", fex_type_name( *t ) );

To get the fex_type_t for a particular archive type, use
fex_identify_extension():

    fex_type_t zip_type = fex_identify_extension( ".zip" );
    if ( zip_type == NULL )
        error( "ZIP isn't supported" );

Be sure to check the result as shown, rather than assuming the library
supports a particular archive type. Use an extension of "" to get the
type for binary files:

    fex_type_t bin_type = fex_identify_extension( "" );
    if ( bin_type == NULL )
        error( "Binary files aren't supported?!?" );


Using in multiple threads
-------------------------
Fex supports multi-threaded programs. If only one thread at a time is
using the library, nothing special needs to be done. If more than one
thread is using the library, the following must be done:

* Call fex_init() from the main thread and ensure it completes before
any other threads use any fex functions. This initializes shared data
tables used by the extractors.

* For each archive opened, only access it from one thread at a time.
Different archives can be accessed from different threads without any
synchronization, since fex uses no global variables. If the same archive
must be accessed from multiple threads, all calls to any fex functions
must be in critical section(s).

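If one archive really must be shared between threads, the
critical-section rule above can be sketched with a POSIX mutex.
shared_archive and its functions are hypothetical helpers, not part of
fex, and the handle is kept as an opaque void* where your code would
hold a fex_t*.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper that pairs one shared archive handle with the
   lock that guards it. Not part of fex. */
typedef struct {
    pthread_mutex_t lock;
    void*           handle;   /* would be a fex_t* in real code */
} shared_archive;

static void shared_archive_init( shared_archive* s, void* handle )
{
    pthread_mutex_init( &s->lock, NULL );
    s->handle = handle;
}

/* Enters the critical section and returns the handle. The caller may
   use the handle only until shared_archive_release() is called. */
static void* shared_archive_acquire( shared_archive* s )
{
    pthread_mutex_lock( &s->lock );
    return s->handle;
}

/* Leaves the critical section */
static void shared_archive_release( shared_archive* s )
{
    pthread_mutex_unlock( &s->lock );
}

static void shared_archive_destroy( shared_archive* s )
{
    pthread_mutex_destroy( &s->lock );
}
```

Each thread would call shared_archive_acquire() before touching the
archive and shared_archive_release() right after, keeping a whole
sequence of related calls (for example fex_name() followed by
fex_data()) inside one acquire/release pair.
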

Unicode file paths on Windows
-----------------------------
If using Windows and your program supports Unicode file paths, enable
BLARGG_UTF8_PATHS in blargg_config.h, and convert your wide-character
paths to UTF-8 before passing them to fex.h functions:

    /* Wide-character path that could have come from system */
    wchar_t wide_path [] = L"demo.zip";

    /* Convert from wide path and check for error */
    char* path = fex_wide_to_path( wide_path );
    if ( path == NULL )
        error( "Out of memory" );

    /* Use converted path for fex call */
    error( fex_open( &fex, path ) );

    /* Free memory used by path */
    fex_free_path( path );

The converted path can be used with any of the fex functions that take
paths, for example fex_identify_extension() or fex_has_extension().


Error handling
--------------
Most functions that can fail return fex_err_t, a pointer type. On
failure they return a pointer to an error object, and on success they
return NULL. Use fex_err_code() to get a conventional error code, or
fex_err_str() to get a string suitable for reporting to the user.

There are two basic approaches that your code can use to handle library
errors. It can return errors, or report them and exit the function via
some other means.

Your code can return errors as the library does, using fex_err_t:

    #define RETURN_ERR( expr ) \
        do {\
            fex_err_t err = (expr);\
            if ( err != NULL )\
                return err;\
        } while ( 0 )

    fex_err_t my_func()
    {
        RETURN_ERR( fex_foo() );
        RETURN_ERR( fex_bar() );
        return NULL;
    }

If you have your own error codes, you can convert fex's errors to them:

    // error codes that differ from library's
    enum {
        my_ok             = 0,
        my_generic_error  = 123,
        my_out_of_memory  = 456,
        my_file_not_found = 789
        // ...
    };

    int convert_error( fex_err_t err )
    {
        switch ( fex_err_code( err ) )
        {
        case fex_ok:               return my_ok;
        case fex_err_generic:      return my_generic_error;
        case fex_err_memory:       return my_out_of_memory;
        case fex_err_file_missing: return my_file_not_found;
        // ...
        default:                   return my_generic_error;
        }
    }

    #define RETURN_ERR( expr ) \
        do {\
            fex_err_t err = (expr);\
            if ( err != NULL )\
                return convert_error( err );\
        } while ( 0 )

    int my_func()
    {
        RETURN_ERR( fex_foo() );
        RETURN_ERR( fex_bar() );
        return my_ok;
    }

The other approach is to pass all errors to an error handler function
that never returns if passed a non-success error value:

    // never returns if err != NULL
    void handle_error( fex_err_t err );

    void my_func()
    {
        handle_error( fex_foo() );
        handle_error( fex_bar() );
    }

handle_error() could print the error and exit the program:

    #include <stdio.h>
    #include <stdlib.h>

    void handle_error( fex_err_t err )
    {
        if ( err != NULL )
        {
            const char* str = fex_err_str( err );
            printf( "Error: %s\n", str );
            exit( EXIT_FAILURE );
        }
    }

handle_error() could also throw a C++ exception (or equivalently in C,
longjmp() back to a setjmp() done inside caller()):

    #include <stdio.h>
    #include <new>
    #include <stdexcept>

    void handle_error( fex_err_t err )
    {
        switch ( fex_err_code( err ) )
        {
        case fex_ok:         return;
        case fex_err_memory: throw std::bad_alloc();
        // ...
        case fex_err_generic:
        default:
            throw std::runtime_error( fex_err_str( err ) );
        }
    }

    void caller()
    {
        try
        {
            my_func();
        }
        catch ( const std::exception& e )
        {
            printf( "Error: %s\n", e.what() );
        }
    }

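For C code, the longjmp() variant mentioned above might look like the
following sketch. demo_err_t, step_ok() and step_fail() are hypothetical
stand-ins for fex_err_t and library calls, used so that the example
compiles on its own.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for fex_err_t: NULL means success */
typedef const char* demo_err_t;

static jmp_buf    error_jmp;    /* where handle_error() jumps back to */
static demo_err_t last_error;   /* error that triggered the jump */

/* Never returns if err != NULL */
static void handle_error( demo_err_t err )
{
    if ( err != NULL )
    {
        last_error = err;
        longjmp( error_jmp, 1 );
    }
}

/* Hypothetical operations standing in for library calls */
static demo_err_t step_ok( void )   { return NULL; }
static demo_err_t step_fail( void ) { return "step failed"; }

static int steps_after_failure;   /* stays 0 if the jump skipped the rest */

static void my_func( void )
{
    handle_error( step_ok() );
    handle_error( step_fail() );   /* jumps out here */
    steps_after_failure++;         /* never reached */
}

/* Returns NULL on success, or the error that aborted my_func() */
static demo_err_t caller( void )
{
    if ( setjmp( error_jmp ) != 0 )
        return last_error;   /* arrived here via longjmp() */

    my_func();
    return NULL;
}
```

Since longjmp() skips the rest of my_func(), any archive that was open
at that point still has to be closed by code reachable from the
setjmp() site.
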

Solving problems
----------------
If you're having problems, try the following:

* Enable debugging support in your environment. This enables assertions
and other run-time checks. In particular, be sure NDEBUG isn't defined.

* Turn the compiler's optimizer off. Sometimes an optimizer generates
bad code.

* If multiple threads are being used, ensure that only one at a time is
accessing a given set of objects from the library. This library is not
in general thread-safe, though independent objects can be used in
separate threads.

* If all else fails, see if the demo works.


Thanks
------
Thanks to Richard Bannister, Kode54, byuu, Cless, and DJRobX for testing
and giving feedback for the library. Thanks to the authors of zlib,
unrar, and 7-zip.

--
Shay Green <gblargg@gmail.com>