Tools - UTF-8 Encoding Utility Class
cToolsUtf8 - UTF-8 Encoding and Decoding
Overview
Converts between Unicode strings and UTF-8 byte arrays, with or without a byte order mark (BOM).
Basic Encoding and Decoding
Encode
Encodes a Unicode string to a UTF-8 byte array.
```vb
Public Function Encode(ByVal UCS As String) As Byte()
```
Parameters:
| Parameter | Type | Description |
|---|---|---|
| UCS | String | Unicode string to encode |
Returns:
UTF-8 encoded byte array.
Example:
```vb
Dim Utf8Bytes() As Byte
Utf8Bytes = VBMAN.ToolsUtf8.Encode("你好世界")
' View byte content as space-separated, two-digit hex values
Dim i As Long
For i = LBound(Utf8Bytes) To UBound(Utf8Bytes)
    Debug.Print Right$("0" & Hex$(Utf8Bytes(i)), 2); " ";
Next i
' Output: E4 BD A0 E5 A5 BD E4 B8 96 E7 95 8C
```
Decode
Decodes a UTF-8 byte array to a Unicode string.
```vb
Public Function Decode(ByRef Utf() As Byte) As String
```
Parameters:
| Parameter | Type | Description |
|---|---|---|
| Utf | Byte() | UTF-8 byte array |
Returns:
Decoded Unicode string.
Example:
```vb
Dim Utf8Bytes() As Byte
Utf8Bytes = VBMAN.ToolsUtf8.Encode("你好世界")
Dim Text As String
Text = VBMAN.ToolsUtf8.Decode(Utf8Bytes)
Debug.Print Text ' Output: 你好世界
```
DecodeToByteArray
Decodes a UTF-8 byte array to a Unicode byte array (WideChar).
```vb
Public Function DecodeToByteArray(ByRef Utf() As Byte) As Byte()
```
Description:
- Converts UTF-8 bytes to a Unicode (UTF-16LE) byte array
- Each character in the Basic Multilingual Plane occupies 2 bytes
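As a small sketch of what the UTF-16LE layout allows: VB6 copies a Byte array into a String byte for byte, so the output of DecodeToByteArray can be assigned directly to a String. The Sub name DecodeToByteArrayDemo is hypothetical, and the byte count in the comment assumes the returned array holds exactly the decoded characters with no terminator.

```vb
' Hypothetical demo Sub; only Encode and DecodeToByteArray come from cToolsUtf8.
Private Sub DecodeToByteArrayDemo()
    Dim Utf8Bytes() As Byte
    Dim WideBytes() As Byte
    Dim Text As String

    Utf8Bytes = VBMAN.ToolsUtf8.Encode("Hello")
    WideBytes = VBMAN.ToolsUtf8.DecodeToByteArray(Utf8Bytes)

    ' Five BMP characters should give 10 bytes, assuming no terminator is appended
    Debug.Print "UTF-16LE byte count: " & (UBound(WideBytes) - LBound(WideBytes) + 1)

    ' Byte array to String assignment copies the raw UTF-16LE data
    Text = WideBytes
    Debug.Print Text
End Sub
```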
Example:
```vb
Dim Utf8Bytes() As Byte
Dim UnicodeBytes() As Byte
Utf8Bytes = VBMAN.ToolsUtf8.Encode("Hello")
UnicodeBytes = VBMAN.ToolsUtf8.DecodeToByteArray(Utf8Bytes)
' UnicodeBytes now contains UTF-16LE encoded bytes
' "H" = 0x48 0x00, "e" = 0x65 0x00, ...
```
Encoding and Decoding with BOM
EncodeWithBom
Encodes a string to a UTF-8 byte array with a leading BOM.
```vb
Public Function EncodeWithBom(strIn As String) As Byte()
```
Description:
- The BOM (Byte Order Mark) is the three-byte sequence EF BB BF
- Some Windows programs need the BOM to recognize UTF-8 encoding
Example:
```vb
Dim Utf8Bytes() As Byte
Utf8Bytes = VBMAN.ToolsUtf8.EncodeWithBom("你好世界")
' View byte content (first 3 bytes are BOM)
Dim i As Long
For i = LBound(Utf8Bytes) To UBound(Utf8Bytes)
    Debug.Print Right$("0" & Hex$(Utf8Bytes(i)), 2); " ";
Next i
' Output: EF BB BF E4 BD A0 E5 A5 BD E4 B8 96 E7 95 8C
'         [ BOM  ] [ 你好世界 (UTF-8)                  ]
```
DecodeWithBom
Decodes a UTF-8 byte array, with or without a BOM, to a string.
```vb
Public Function DecodeWithBom(ByVal varIn As Variant) As String
```
Parameters:
| Parameter | Type | Description |
|---|---|---|
| varIn | Variant | Byte array or Variant containing byte array |
Description:
- Automatically detects and skips BOM
- Supports data with or without BOM
Example:
```vb
Dim Utf8Bytes() As Byte
Utf8Bytes = VBMAN.ToolsUtf8.EncodeWithBom("你好世界")
Dim Text As String
Text = VBMAN.ToolsUtf8.DecodeWithBom(Utf8Bytes)
Debug.Print Text ' Output: 你好世界
' Can also decode data without BOM
Dim NoBomBytes() As Byte
NoBomBytes = VBMAN.ToolsUtf8.Encode("Hello")
Text = VBMAN.ToolsUtf8.DecodeWithBom(NoBomBytes)
Debug.Print Text ' Output: Hello
```
Complete Example
```vb
Private Sub Utf8Demo()
    Dim Original As String
    Dim Utf8Bytes() As Byte
    Dim Decoded As String

    Original = "VBMAN Framework v1.0"

    ' Encode to UTF-8
    Utf8Bytes = VBMAN.ToolsUtf8.Encode(Original)
    Debug.Print "UTF-8 byte count: " & (UBound(Utf8Bytes) + 1)

    ' Decode back to string
    Decoded = VBMAN.ToolsUtf8.Decode(Utf8Bytes)
    Debug.Print "Decoded result: " & Decoded

    ' Verify the round trip
    Debug.Print "Match: " & (Original = Decoded)

    ' ===== Operations with BOM =====
    Dim Utf8BytesWithBom() As Byte
    Utf8BytesWithBom = VBMAN.ToolsUtf8.EncodeWithBom(Original)
    Debug.Print "Bytes with BOM: " & (UBound(Utf8BytesWithBom) + 1)

    ' Decode data with BOM
    Decoded = VBMAN.ToolsUtf8.DecodeWithBom(Utf8BytesWithBom)
    Debug.Print "BOM decode result: " & Decoded
End Sub

Private Sub Utf8FileDemo()
    ' Save UTF-8 file (without BOM)
    Dim Text As String
    Text = "你好,世界!"
    Dim Utf8Bytes() As Byte
    Utf8Bytes = VBMAN.ToolsUtf8.Encode(Text)

    ' Save using cFileIO (VB strings have no escapes, so a single backslash is used)
    VBMAN.FileIO.SetBuffer(Utf8Bytes).SaveData "C:\utf8_nobom.txt"

    ' Save UTF-8 file with BOM
    Dim Utf8BytesWithBom() As Byte
    Utf8BytesWithBom = VBMAN.ToolsUtf8.EncodeWithBom(Text)
    VBMAN.FileIO.SetBuffer(Utf8BytesWithBom).SaveData "C:\utf8_bom.txt"

    ' Read and decode
    Dim ReadBytes() As Byte
    VBMAN.FileIO.OpenFile("C:\utf8_nobom.txt").ReadData
    ReadBytes = VBMAN.FileIO.ReturnBytes()
    Dim ReadText As String
    ReadText = VBMAN.ToolsUtf8.Decode(ReadBytes)
    Debug.Print "Read content: " & ReadText
End Sub
```
Use Cases
| Scenario | Example |
|---|---|
| Network Transmission | Convert string to UTF-8 bytes for sending |
| File Storage | Save text with UTF-8 encoding |
| Encryption/Decryption | Convert to UTF-8 bytes before encryption |
| Data Validation | Calculate hash of UTF-8 bytes |
| Cross-platform Compatibility | Use UTF-8 with BOM so that Windows programs recognize the encoding correctly |
Encoding Comparison
| Encoding Method | Advantages | Disadvantages |
|---|---|---|
| Encode | Standard UTF-8, good compatibility | Some Windows programs may not recognize the encoding |
| EncodeWithBom | Windows Notepad and other programs can identify the encoding correctly | Additional 3-byte BOM overhead |
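One way to keep this choice in a single place is a small wrapper that picks the encoder per target. This is only a sketch: EncodeForTarget and its NeedsBom flag are hypothetical names, not part of VBMAN. Only Encode and EncodeWithBom come from the class documented here.

```vb
' Hypothetical helper (not part of VBMAN): selects the encoder by target.
' NeedsBom = True for consumers that rely on the BOM to detect UTF-8.
Private Function EncodeForTarget(ByVal Text As String, ByVal NeedsBom As Boolean) As Byte()
    If NeedsBom Then
        EncodeForTarget = VBMAN.ToolsUtf8.EncodeWithBom(Text)
    Else
        EncodeForTarget = VBMAN.ToolsUtf8.Encode(Text)
    End If
End Function
```

A caller would pass True when writing a file meant for BOM-dependent Windows programs, and False for network payloads or web content.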
Notes
BOM Usage
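- Older versions of Windows Notepad save UTF-8 files with a BOM; since Windows 10 version 1903 the default is UTF-8 without a BOM
- Some programs (such as some Unix tools) do not recognize the BOM and may treat it as ordinary content
- A BOM is usually not recommended in web development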
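When it is unclear whether incoming data carries a BOM, the first three bytes can be checked directly. The sketch below is plain VB6. HasUtf8Bom is a hypothetical helper name, not a VBMAN function, and it assumes the caller passes an initialized array.

```vb
' Hypothetical helper (not part of VBMAN).
' Returns True if Data starts with the UTF-8 BOM (EF BB BF).
' Data must be an initialized array; LBound raises an error otherwise.
Private Function HasUtf8Bom(ByRef Data() As Byte) As Boolean
    Dim First As Long
    First = LBound(Data)

    ' Fewer than 3 bytes cannot contain a BOM; the result stays False
    If UBound(Data) - First + 1 < 3 Then Exit Function

    HasUtf8Bom = (Data(First) = &HEF) And _
                 (Data(First + 1) = &HBB) And _
                 (Data(First + 2) = &HBF)
End Function
```

Given the documented behavior, HasUtf8Bom should return True for the result of EncodeWithBom and False for the result of Encode.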
Byte Order
- UTF-8 has no byte order issues
- UTF-16 has big-endian/little-endian distinction
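The UTF-16 byte order distinction can be seen with plain VB6, without any VBMAN calls. In this sketch, ByteOrderDemo is a hypothetical Sub name. It relies on the VB6 behavior that assigning a String to a Byte array copies the string's internal UTF-16LE data.

```vb
' Hypothetical demo Sub using only VB6 intrinsics.
Private Sub ByteOrderDemo()
    Dim WideBytes() As Byte
    Dim Tmp As Byte
    Dim i As Long

    ' String to Byte array assignment copies the raw UTF-16LE data
    WideBytes = "A"
    Debug.Print Hex$(WideBytes(0)); " "; Hex$(WideBytes(1))   ' 41 0 (low byte first)

    ' Swap each byte pair to obtain the big-endian (UTF-16BE) layout
    For i = LBound(WideBytes) To UBound(WideBytes) - 1 Step 2
        Tmp = WideBytes(i)
        WideBytes(i) = WideBytes(i + 1)
        WideBytes(i + 1) = Tmp
    Next i
    Debug.Print Hex$(WideBytes(0)); " "; Hex$(WideBytes(1))   ' 0 41 (high byte first)
End Sub
```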
Memory Usage
- Chinese characters: most (those in the Basic Multilingual Plane) occupy 3 bytes in UTF-8 and 2 bytes in UTF-16
- ASCII characters: UTF-8 occupies 1 byte, UTF-16 occupies 2 bytes
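These sizes can be compared for any given string. In the sketch below, CompareSizes is a hypothetical Sub name. It uses the documented Encode function for the UTF-8 size and the VB6 LenB function for the in-memory UTF-16 size.

```vb
' Hypothetical demo Sub; only Encode comes from cToolsUtf8.
Private Sub CompareSizes(ByVal Text As String)
    Dim Utf8Bytes() As Byte

    ' Skip empty input: the bounds of the array returned for "" are not documented
    If Len(Text) = 0 Then Exit Sub

    Utf8Bytes = VBMAN.ToolsUtf8.Encode(Text)
    Debug.Print "UTF-8 bytes:  " & (UBound(Utf8Bytes) - LBound(Utf8Bytes) + 1)

    ' LenB returns the byte length of the String's UTF-16 data
    Debug.Print "UTF-16 bytes: " & LenB(Text)
End Sub
```

For "Hello" this should report 5 and 10 bytes; for "你好世界" it should report 12 and 8.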