Unicode Support: Using Non-English Characters in Filenames, Comments and Passwords
        
        
            
		
			Introduction
		
			- You should use the Unicode features of the ZipArchive Library when the filenames,
				comments or passwords in the archives you use contain non-ASCII characters.
- Without the Unicode support, the strings in archives are stored under Windows using
				the following code pages:
				
					- filenames - current system OEM code page (CP_OEMCP),
- comments, passwords - current system ANSI code page (CP_ACP).
 Under other platforms, all strings are stored using the current system's code page.
- To use the Unicode functionality under Windows, you should compile the library and
				your application for Unicode. Under systems that use Unicode UTF-8 as the default
				code page (like Linux and OS X), there are no special considerations needed.
				On other systems, the Unicode support is not available.
- When calling the CZipFileHeader::SetFileName() method,
				the current Unicode mode will be applied to the file being renamed. This will also
				affect the Unicode mode used for the file's comment.
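The code page matters because the same character maps to different bytes under different encodings. The sketch below is not part of the ZipArchive Library: `encodeUtf8` is a hypothetical helper that encodes one code point as UTF-8, so that the result can be compared with the single-byte values that two common Windows code pages use for the same character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a ZipArchive API): encodes one Unicode
// code point as a UTF-8 byte sequence.
std::string encodeUtf8(char32_t cp)
{
    // Reject values outside the Unicode range and surrogate code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// U+00E9 (e with acute accent) as stored under three encodings.
const std::string kAcuteE_Cp1252 = "\xE9";             // ANSI code page 1252
const std::string kAcuteE_Cp437  = "\x82";             // OEM code page 437
const std::string kAcuteE_Utf8   = encodeUtf8(0x00E9); // two bytes: C3 A9
```

A tool that decodes the name with the wrong code page will show a different character, which is why the archive has to record, or both sides have to agree on, the encoding used.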
			Using Unicode in Filenames and File Comments (Full Version Only)
		
			Introduction
		
			- This feature is compatible with WinZip Unicode support and allows creating cross-platform
				Unicode archives that are extractable by utilities provided with the system under
				Linux and OS X.
- To use this functionality, make sure _ZIP_UNICODE is
				defined in the _features.h file. If you modify this definition, rebuild
				the ZipArchive Library and your application.
			Usage
		Call the CZipArchive::SetUnicodeMode() method and pass CZipArchive::umExtra or CZipArchive::umString
		as the parameter. You can also use a combination of these two parameters.
		
			- CZipArchive::umExtra will store Unicode information
				in extra headers. The extra headers are used for a filename or comment
				only when the string contains non-ASCII characters. This value is used by default
				under Windows.
- CZipArchive::umString will store the filename and comment
				directly in Unicode and will set a special flag in the file header inside the
				archive. Some utilities under Windows may display invalid strings in this case.
				This value is used by default under Linux/OS X.
- To determine what Unicode mode is used by a file, use the CZipFileHeader::GetState() method.
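The rule described for CZipArchive::umExtra (extra headers are used only when a string contains non-ASCII characters) comes down to a simple byte test. The following is an illustrative sketch under that reading, not the library's implementation; `containsNonAscii` is a hypothetical name.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not a ZipArchive API): returns true when the
// string holds at least one byte outside the 7-bit ASCII range.
bool containsNonAscii(const std::string& s)
{
    return std::any_of(s.begin(), s.end(), [](char c) {
        // Cast first: plain char may be signed, so bytes >= 0x80
        // would otherwise compare as negative values.
        return static_cast<unsigned char>(c) > 0x7F;
    });
}
```

For a name such as "readme.txt" the test is false, so no extra header would be needed under this rule; for a UTF-8 encoded Greek name it is true.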
			Preserving the Compatibility
		The ZipArchive Library correctly decompresses archives created under different systems
		without additional settings.
		
			- If you need an archive created under Windows to be extracted correctly by Linux
				utilities, set the archive compatibility to ZipCompatibility::zcUnix
				with the CZipArchive::SetSystemCompatibility() method. To make
				the archive also readable by Windows utilities, additionally set one of the Unicode
				modes. Not all Windows utilities support the Unicode modes.
- If you need an archive created under Windows to be extracted correctly by Mac OS
				X utilities, set the Unicode mode to CZipArchive::umString
				or use the same approach as for the Linux platform.
- If you need an archive created under Linux/OS X to be extracted correctly by
				WinZip under Windows, there is no need to change anything, because the
				CZipArchive::umString mode is set by default. However, you may need to set
				CZipArchive::umExtra for other Windows utilities that
				do not support the CZipArchive::umString mode.
			Setting Unicode Password and Archive Comment (Windows Only)
		
			- You can set a code page to be used while setting a password with the
				CZipArchive::SetPassword() method.
- You can set a code page to be used while setting an archive global comment with
				the CZipArchive::SetGlobalComment() method.
- If your password or a comment contains non-ASCII characters and you intend to compress
				files under Windows and extract them under Linux/OS X or vice versa, set the
				appropriate code page to CP_UTF8.
			Setting Locale in STL Applications
		If your locale is different from English and you wish to use non-English characters
		in archives, you need to set your locale globally; the setlocale() function
		is not sufficient in this case.
		
			- To set the global locale to be the same as your system locale, use:
 std::locale::global(std::locale(""));
- To set the global locale to a particular value, use e.g.:
 std::locale::global(std::locale("German"));
- When you use Unicode, do not use the _T() macro in the
				above calls.
- Remember to put #include <locale> in your code.
		Remember to restore the global locale to the previous value (returned by std::locale::global)
		after processing, because the global locale may affect other parts of your application.
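One way to make sure the previous global locale is always restored is a small RAII guard. The class below is a sketch and not part of the ZipArchive Library; `GlobalLocaleGuard` is a hypothetical name. It relies only on the standard behavior that `std::locale::global` returns the locale that was previously installed.

```cpp
#include <locale>

// Hypothetical RAII helper: installs a global locale and restores the
// previous one when the guard goes out of scope.
class GlobalLocaleGuard
{
public:
    explicit GlobalLocaleGuard(const std::locale& loc)
        : m_previous(std::locale::global(loc)) // global() returns the old locale
    {
    }

    ~GlobalLocaleGuard()
    {
        std::locale::global(m_previous);
    }

    // The guard owns a process-wide side effect, so copying is disabled.
    GlobalLocaleGuard(const GlobalLocaleGuard&) = delete;
    GlobalLocaleGuard& operator=(const GlobalLocaleGuard&) = delete;

private:
    std::locale m_previous;
};

// Usage sketch:
// {
//     GlobalLocaleGuard guard(std::locale("")); // system locale
//     // ... open, modify and close the archive here ...
// } // the previous global locale is restored here
```

Note that constructing `std::locale("")` throws `std::runtime_error` when the environment names a locale that is not available, so the call may need a try block in portable code.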
		
			Additional Considerations (Windows Only)
		
			Unicode Normalization
		When you decompress archives that store filenames using a Unicode normalization form
		other than form C (used by Windows), you should define _ZIP_UNICODE_NORMALIZE
		in the _features.h file, because some software under Windows
		may be unable to open files with filenames in a different form. This will convert
		any other normalization form to form C. This is the case, for example, when extracting
		archives created under OS X (it uses form D).
		
			- Under Windows Vista and later, you need to use the Windows SDK appropriate for
				your system and make sure that you compile for that platform (WINVER
				should be defined to be at least 0x600).
- Under Windows XP and Windows Server 2003, you need to download Microsoft Internationalized
				Domain Name (IDN) Mitigation APIs to use this functionality.
- Under Windows 95/98/Me this functionality is unsupported.
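To see why the normalization form matters, consider the letter e with an acute accent. In form C it is a single code point (U+00E9); in form D it is the letter e followed by a combining accent (U+0301). The sketch below is illustrative only and uses hypothetical names. It shows that the two UTF-8 encodings differ byte for byte, even though they display identically.

```cpp
#include <string>

// Form C: the precomposed code point U+00E9, encoded in UTF-8.
const std::string kFormC = "\xC3\xA9";

// Form D: 'e' (U+0065) followed by the combining acute accent
// U+0301, encoded in UTF-8.
const std::string kFormD = "e\xCC\x81";

// A plain byte-wise comparison, as used by software that does not
// normalize filenames before comparing or opening them.
bool namesMatchBytewise(const std::string& a, const std::string& b)
{
    return a == b;
}
```

Software that compares names this way treats the two spellings as different files, which is the situation the conversion to form C is meant to avoid.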
			Safe Windows API
		The Unicode version of the library uses the Windows API WideCharToMultiByte
		and MultiByteToWideChar functions to perform conversions from the ANSI
		code page to the OEM code page and vice versa. It takes four function calls to perform
		one conversion. The alternative is to use the CharToOemBuffA and
		OemToCharBuffA functions, which take only one function call per conversion.
		However, these functions are considered unsafe and are banned by Microsoft.
		If you prefer the faster solution with the unsafe functions, comment out the
		_ZIP_SAFE_WINDOWS_API definition in the ZipPlatform_win.cpp file.
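The four calls mentioned above can be pictured as two size queries and two conversions, going through UTF-16 as the intermediate form. The function below is a sketch of that pattern and is not taken from the library's source; `ansiToOem` is a hypothetical name, and the non-Windows branch exists only so that the example compiles elsewhere.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Sketch only: converts a string from the ANSI code page to the OEM
// code page through UTF-16. Not the library's actual code.
std::string ansiToOem(const std::string& src)
{
#ifdef _WIN32
    if (src.empty())
        return std::string();
    const int srcLen = static_cast<int>(src.size());

    // Call 1: ask for the required UTF-16 length.
    const int wideLen = MultiByteToWideChar(CP_ACP, 0, src.data(), srcLen, NULL, 0);
    if (wideLen <= 0)
        return std::string();
    std::wstring wide(wideLen, L'\0');
    // Call 2: convert ANSI to UTF-16.
    MultiByteToWideChar(CP_ACP, 0, src.data(), srcLen, &wide[0], wideLen);

    // Call 3: ask for the required OEM length.
    const int oemLen = WideCharToMultiByte(CP_OEMCP, 0, wide.data(), wideLen,
                                           NULL, 0, NULL, NULL);
    if (oemLen <= 0)
        return std::string();
    std::string oem(oemLen, '\0');
    // Call 4: convert UTF-16 to OEM.
    WideCharToMultiByte(CP_OEMCP, 0, wide.data(), wideLen,
                        &oem[0], oemLen, NULL, NULL);
    return oem;
#else
    // Outside Windows there is no ANSI/OEM distinction; return the input.
    return src;
#endif
}
```

The size queries are what make this approach safe: each output buffer is allocated with the exact length the API reports, at the cost of the extra calls.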
		
			Custom Unicode Handling (Windows Only)
					
			- This functionality is specific to the ZipArchive Library and external software will
				not be able to benefit from it.
- To use this functionality, make sure _ZIP_UNICODE_CUSTOM
				is defined in the _features.h file. If you modify this definition, rebuild
				the ZipArchive Library and your application. You also need to set
				the Unicode mode with the CZipArchive::SetUnicodeMode()
				method to the CZipArchive::umCustom value.
- The ZipArchive Library will save the code pages used during compression and automatically
				use them during extraction. The code pages are saved in zip extra fields. See below for more information.
- Setting string store settings with one of the API
				calls does not affect existing files and comments.
- If you open an existing archive with intent to add new files to it and you want
				the new files to use the same string store settings as the existing files, then:
				
				Otherwise the library will use the default settings for the current system (ZipPlatform::GetSystemID()). 
- If you want to open an archive created with a previous version of the ZipArchive
				Library, or with any program that uses filename or comment code pages different
				from the standard ones, set the code pages before opening the archive. The
				library will use them while decoding filenames and comments. The settings will be
				ignored if the archive contains extra fields with code pages created by the ZipArchive
				Library. In this case, code pages from extra fields will be used instead. Note
				that these settings will also be used during compression in the same archive (unless
				changed).
- When you close an archive, the string store settings are reset to their default values
				for the current system, just like with the CZipStringStoreSettings::Reset()
				method call. This way, if you open the next archive using the same CZipArchive object,
				its string store settings are not affected by the previous archive's settings.
			Storing Unicode Filenames in a Zip Archive
		You may control the way the ZipArchive Library stores filenames in archives by adjusting
		the first parameter of the CZipArchive::SetStringStoreSettings(UINT, bool) method.
		
			- If you plan that the archive will be extracted under Linux/OS X, set this parameter
				to the identifier of the code page used by the system under which you want to
				extract the archive. You may try setting it to CP_ACP, then the current
				system ANSI code page will be used - it will work correctly if the target platform
				uses the same code page as your system.
- If you use e.g. Japanese or Korean characters, you may set this parameter to 
				CP_UTF8. Unicode UTF-8 will be used.
- You can set the code page directly using its identifier. Be sure it is installed
				on your system and on the system you plan to extract the archive on.
- To restore the OEM encoding under Windows, set this parameter back to CP_OEMCP.
Sample Code
CZipMemFile emptyFile;
CZipArchive zip;
LPCTSTR zipFileName = _T("C:\\Temp\\test.zip");
zip.Open(zipFileName, CZipArchive::zipCreate);
// store the first filename (Greek letters) using UTF-8
zip.SetStringStoreSettings(CP_UTF8);
zip.AddNewFile(emptyFile, _T("\u0391\u03A9"));
// store the second filename (Central European letters) using code page 1250
zip.SetStringStoreSettings(1250);
zip.AddNewFile(emptyFile, _T("\u010D\u011B"));
// restore the OEM encoding for the third filename
zip.SetStringStoreSettings(CP_OEMCP);
zip.AddNewFile(emptyFile, _T("English characters only"));
zip.Close();
// reopen the archive and extract the file at index 1
zip.Open(zipFileName);
zip.ExtractFile(1, _T("C:\\Temp"));
zip.Close();
			Preserving Compatibility with the Standard Zip Format
		It is assumed that under Windows filenames are stored using the current system OEM
		code page (CP_OEMCP). Hence external software will not be able to properly
		decode filenames if they are stored using a different code page. For this reason,
		the ZipArchive Library allows storing filenames encoded with a custom code page
		in extra fields. The filenames in the standard location (the central directory and
		local headers) are encoded using the OEM code page. This way, external software will
		see typically encoded filenames and the ZipArchive Library will know the original
		filenames during extraction.
		
		You should note that this method takes additional space needed for storing a filename
		in an extra field.
		To store filenames in extra fields, set the second parameter of the
		CZipArchive::SetStringStoreSettings(UINT, bool) method to true.
		
Sample Code
CZipMemFile emptyFile;
CZipArchive zip;
LPCTSTR zipFileName = _T("C:\\Temp\\test.zip");
zip.Open(zipFileName, CZipArchive::zipCreate);
// use code page 1250 and store the filename in an extra field (second parameter)
zip.SetStringStoreSettings(1250, true);
zip.AddNewFile(emptyFile, _T("\u0104\u0118"));
zip.Close();
		
		You can specify a different code page for file comments, e.g. by modifying the object
		returned by the CZipArchive::GetStringStoreSettings() method call.
		
		The comment code page setting does not affect the global comment. Use
		CZipArchive::SetGlobalComment() to use a different code page in this
		case.
		
Sample Code
CZipMemFile emptyFile;
CZipArchive zip;
LPCTSTR zipFileName = _T("C:\\Temp\\test.zip");
zip.Open(zipFileName, CZipArchive::zipCreate);
zip.AddNewFile(emptyFile, _T("empty file"));
zip.GetStringStoreSettings().m_uCommentCodePage = CP_UTF8;
LPCTSTR comment = _T("\u0104\u0118");
zip[0]->SetComment(comment);
zip.SetGlobalComment(comment);
zip.Close();    
zip.Open(zipFileName);
CZipFileHeader* info = zip.GetFileInfo(0);
CZipString result = info->GetComment();
zip.SetStringStoreSettings(info->GetStringStoreSettings());    
result = zip.GetGlobalComment();    
		
		The ZipArchive Library stores code page information and, if requested, the encoded
		filename in extra fields in the central directory. The global format of the ZipArchive
		extra field is as follows:
		
		
			
				
				
					| Field | Size (bytes) | Value |
					| --- | --- | --- |
					| Header ID | 2 | 0x5A4C |
					| Data Size | 2 |  |
					| Data | as specified by Data Size |  |
			
		 
		
		The format of the Data field is as follows (not all sub-fields
		may be present):
		
		
		
			
				
				
					| Sub-field | Size (bytes) | Value |
					| --- | --- | --- |
					| Version | 1 | 0x01 |
					| Flag | 1 | 1, 3, 4 |
					| Filename Code Page | 4 |  |
					| Encoded Filename | variable |  |
					| Comment Code Page | 4 |  |
			
		 
		
		The Flag field values have the following meaning:
		
		
			
				
				
					| Bit(s) | Value | Meaning |
					| --- | --- | --- |
					| 0 | 1 | the Filename Code Page field is present |
					| 0 and 1 | 3 | the Encoded Filename field is present (and the Filename Code Page field must be present too) |
					| 2 | 4 | the Comment Code Page field is present |
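The layout above can be checked with a few lines of code. The sketch below is not the library's parser; `ZipArchiveExtraInfo` and `inspectExtraField` are hypothetical names. It assumes little-endian byte order (the convention of the zip format) and that Version and Flag are the first two bytes of the Data field, in the order listed. It stops after the Flag byte because the tables do not say how the length of Encoded Filename is determined.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: which sub-fields the Flag byte announces.
struct ZipArchiveExtraInfo
{
    bool valid = false;
    std::uint8_t version = 0;
    bool hasFilenameCodePage = false;
    bool hasEncodedFilename = false;
    bool hasCommentCodePage = false;
};

// Reads Header ID, Data Size, Version and Flag from one extra field.
// Assumption: multi-byte values are little-endian.
ZipArchiveExtraInfo inspectExtraField(const std::vector<std::uint8_t>& field)
{
    ZipArchiveExtraInfo info;
    // Header ID (2) + Data Size (2) + Version (1) + Flag (1)
    if (field.size() < 6)
        return info;

    const unsigned headerId = field[0] | (field[1] << 8);
    const std::size_t dataSize = field[2] | (field[3] << 8);
    if (headerId != 0x5A4C || dataSize < 2 || field.size() < 4 + dataSize)
        return info;

    info.version = field[4];
    const std::uint8_t flag = field[5];
    info.hasFilenameCodePage = (flag & 0x01) != 0;     // bit 0
    info.hasEncodedFilename  = (flag & 0x03) == 0x03;  // bits 0 and 1
    info.hasCommentCodePage  = (flag & 0x04) != 0;     // bit 2
    info.valid = true;
    return info;
}
```

For example, the bytes 4C 5A 06 00 01 01 E2 04 00 00 describe a field with Version 1, Flag 1 and a four-byte Filename Code Page of 1250 (0x04E2).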
			
		 
		