
Tools - UTF-8 Encoding Utility Class

cToolsUtf8 - UTF-8 Encoding and Decoding

Overview

Converts between VB6 Unicode strings (stored internally as UTF-16LE) and UTF-8 byte arrays, with optional BOM (byte order mark) handling.


Basic Encoding and Decoding

Encode

Encodes a Unicode string as a UTF-8 byte array.

vb
Public Function Encode(ByVal UCS As String) As Byte()

Parameters:

| Parameter | Type | Description |
|---|---|---|
| UCS | String | Unicode string to encode |

Returns:

UTF-8 encoded byte array.

Example:

vb
Dim Utf8Bytes() As Byte
Utf8Bytes = VBMAN.ToolsUtf8.Encode("你好世界")

' View byte content as two-digit hex values separated by spaces
Dim i As Long
For i = LBound(Utf8Bytes) To UBound(Utf8Bytes)
    Debug.Print Right$("0" & Hex$(Utf8Bytes(i)), 2) & " ";
Next i
Debug.Print   ' finish the line
' Output: E4 BD A0 E5 A5 BD E4 B8 96 E7 95 8C
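The byte values above follow the standard UTF-8 rules. Code points up to U+007F take 1 byte, up to U+07FF take 2, up to U+FFFF take 3, and the rest take 4. The Python sketch below applies those rules by hand as a language-neutral illustration. It is not the cToolsUtf8 source, whose internals are not documented here, and it assumes the input contains no lone surrogates.

```python
def utf8_encode(text: str) -> bytes:
    """Encode text to UTF-8 by hand, one code point at a time.
    Assumes text contains no lone surrogates."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:                      # 1 byte: 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:                   # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:                 # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:                              # 4 bytes: 11110xxx + three 10xxxxxx
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)

encoded = utf8_encode("你好世界")
print(" ".join(f"{b:02X}" for b in encoded))
# E4 BD A0 E5 A5 BD E4 B8 96 E7 95 8C
```

Each of these four Chinese characters lies between U+0800 and U+FFFF, so each becomes 3 bytes, 12 bytes in total.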

Decode

Decodes a UTF-8 byte array into a Unicode string.

vb
Public Function Decode(ByRef Utf() As Byte) As String

Parameters:

| Parameter | Type | Description |
|---|---|---|
| Utf | Byte() | UTF-8 byte array to decode |

Returns:

Decoded Unicode string.

Example:

vb
Dim Utf8Bytes() As Byte
Utf8Bytes = VBMAN.ToolsUtf8.Encode("你好世界")

Dim Text As String
Text = VBMAN.ToolsUtf8.Decode(Utf8Bytes)
Debug.Print Text  ' Output: 你好世界
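Decoding reverses those rules. The lead byte tells how many continuation bytes follow, and each continuation byte contributes its low 6 bits. The Python sketch below shows this. It is an illustration only, not cToolsUtf8's implementation, and it assumes well-formed input: overlong forms and truncated sequences are not checked for.

```python
def utf8_decode(data: bytes) -> str:
    """Decode UTF-8 by hand. Assumes well-formed input: overlong forms,
    encoded surrogates and truncated sequences are not validated."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: ASCII, no continuation bytes
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:     # 110xxxxx: 1 continuation byte
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:    # 1110xxxx: 2 continuation bytes
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:   # 11110xxx: 3 continuation bytes
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"unexpected byte 0x{lead:02X} at offset {i}")
        for j in range(1, extra + 1):
            # each continuation byte (10xxxxxx) adds its low 6 bits
            cp = (cp << 6) | (data[i + j] & 0x3F)
        chars.append(chr(cp))
        i += 1 + extra
    return "".join(chars)

print(utf8_decode(bytes.fromhex("E4 BD A0 E5 A5 BD E4 B8 96 E7 95 8C")))
# 你好世界
```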

DecodeToByteArray

Decodes a UTF-8 byte array into a Unicode (UTF-16LE, "wide char") byte array instead of a String.

vb
Public Function DecodeToByteArray(ByRef Utf() As Byte) As Byte()

Description:

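  • Converts UTF-8 bytes to a UTF-16LE byte array
  • Each UTF-16 code unit occupies 2 bytes, so most characters take 2 bytes. Characters outside the Basic Multilingual Plane (such as emoji) take 4 bytes as a surrogate pair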

Example:

vb
Dim Utf8Bytes() As Byte
Dim UnicodeBytes() As Byte

Utf8Bytes = VBMAN.ToolsUtf8.Encode("Hello")
UnicodeBytes = VBMAN.ToolsUtf8.DecodeToByteArray(Utf8Bytes)

' UnicodeBytes now contains UTF-16LE encoded bytes
' "H" = 0x48 0x00, "e" = 0x65 0x00, ...
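To check DecodeToByteArray output independently, Python's standard codecs perform the same UTF-8 to UTF-16LE conversion. This is a reference sketch, not the cToolsUtf8 implementation. It also shows that a character outside the Basic Multilingual Plane, such as an emoji, becomes a 4-byte surrogate pair.

```python
# UTF-8 bytes -> text -> UTF-16LE bytes, using Python's standard codecs.
utf8_bytes = "Hello".encode("utf-8")
utf16le_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")
print(utf16le_bytes.hex(" "))   # 48 00 65 00 6c 00 6c 00 6f 00

# U+1F600 is outside the BMP, so UTF-16 stores it as the pair D83D DE00,
# each code unit written low byte first.
emoji = "\U0001F600".encode("utf-16-le")
print(emoji.hex(" "))           # 3d d8 00 de
```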

Encoding and Decoding with BOM

EncodeWithBom

Encodes a string as a UTF-8 byte array prefixed with a BOM.

vb
Public Function EncodeWithBom(strIn As String) As Byte()

Description:

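  • The UTF-8 BOM (byte order mark) is the 3-byte sequence EF BB BF, which is U+FEFF encoded in UTF-8
  • Some Windows programs, such as Excel when opening a CSV file, rely on the BOM to recognize UTF-8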

Example:

vb
Dim Utf8Bytes() As Byte
Utf8Bytes = VBMAN.ToolsUtf8.EncodeWithBom("你好世界")

' View byte content (the first 3 bytes are the BOM)
Dim i As Long
For i = LBound(Utf8Bytes) To UBound(Utf8Bytes)
    Debug.Print Right$("0" & Hex$(Utf8Bytes(i)), 2) & " ";
Next i
Debug.Print   ' finish the line
' Output: EF BB BF E4 BD A0 E5 A5 BD E4 B8 96 E7 95 8C
'         [  BOM ] [       你好世界 (UTF-8)        ]
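As the output shows, a BOM is just three fixed bytes placed in front of the normal UTF-8 encoding. The Python cross-check below uses `codecs.BOM_UTF8` and the standard `utf-8-sig` codec, which prepends the same signature. It illustrates the byte layout only and does not show how cToolsUtf8 builds its array.

```python
import codecs

text = "你好世界"
# BOM (EF BB BF) followed by the plain UTF-8 bytes
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
print(with_bom.hex(" ").upper())
# EF BB BF E4 BD A0 E5 A5 BD E4 B8 96 E7 95 8C

# The "utf-8-sig" codec produces the same bytes in one step.
print(with_bom == text.encode("utf-8-sig"))  # True
```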

DecodeWithBom

Decodes a UTF-8 byte array, with or without a leading BOM, into a string.

vb
Public Function DecodeWithBom(ByVal varIn As Variant) As String

Parameters:

| Parameter | Type | Description |
|---|---|---|
| varIn | Variant | Byte array, or a Variant containing a byte array |

Description:

  • Detects a leading BOM (EF BB BF) and skips it
  • Also accepts data without a BOM

Example:

vb
Dim Utf8Bytes() As Byte
Utf8Bytes = VBMAN.ToolsUtf8.EncodeWithBom("你好世界")

Dim Text As String
Text = VBMAN.ToolsUtf8.DecodeWithBom(Utf8Bytes)
Debug.Print Text  ' Output: 你好世界

' Can also decode data without BOM
Dim NoBomBytes() As Byte
NoBomBytes = VBMAN.ToolsUtf8.Encode("Hello")
Text = VBMAN.ToolsUtf8.DecodeWithBom(NoBomBytes)
Debug.Print Text  ' Output: Hello
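The detect-and-skip behavior can be sketched in a few lines of Python. The function name `decode_with_optional_bom` is hypothetical, and cToolsUtf8's own detection code is not shown in this documentation. The sketch only shows the idea: strip one leading EF BB BF if present, then decode the rest. Python's built-in `utf-8-sig` decoder behaves the same way.

```python
BOM = b"\xef\xbb\xbf"

def decode_with_optional_bom(data: bytes) -> str:
    """Hypothetical helper: skip one leading UTF-8 BOM if present,
    then decode the remaining bytes as UTF-8."""
    if data.startswith(BOM):
        data = data[len(BOM):]
    return data.decode("utf-8")

print(decode_with_optional_bom(BOM + "你好世界".encode("utf-8")))  # 你好世界
print(decode_with_optional_bom(b"Hello"))                          # Hello
```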

Complete Example

vb
Private Sub Utf8Demo()
    Dim Original As String
    Dim Utf8Bytes() As Byte
    Dim Decoded As String
    
    Original = "VBMAN Framework v1.0"
    
    ' Encode to UTF-8
    Utf8Bytes = VBMAN.ToolsUtf8.Encode(Original)
    Debug.Print "UTF-8 byte count: " & (UBound(Utf8Bytes) - LBound(Utf8Bytes) + 1)
    
    ' Decode back to string
    Decoded = VBMAN.ToolsUtf8.Decode(Utf8Bytes)
    Debug.Print "Decoded result: " & Decoded
    
    ' Verify
    Debug.Print "Match: " & (Original = Decoded)
    
    ' ===== Operations with BOM =====
    
    Dim Utf8BytesWithBom() As Byte
    Utf8BytesWithBom = VBMAN.ToolsUtf8.EncodeWithBom(Original)
    Debug.Print "Bytes with BOM: " & (UBound(Utf8BytesWithBom) - LBound(Utf8BytesWithBom) + 1)
    
    ' Decode data with BOM
    Decoded = VBMAN.ToolsUtf8.DecodeWithBom(Utf8BytesWithBom)
    Debug.Print "BOM decode result: " & Decoded
End Sub

Private Sub Utf8FileDemo()
    ' Save UTF-8 file (without BOM)
    Dim Text As String
    Text = "你好,世界!"
    
    Dim Utf8Bytes() As Byte
    Utf8Bytes = VBMAN.ToolsUtf8.Encode(Text)
    
    ' Save using cFileIO (VB6 string literals have no escape sequences, so use single backslashes)
    VBMAN.FileIO.SetBuffer(Utf8Bytes).SaveData "C:\utf8_nobom.txt"
    
    ' Save UTF-8 file with BOM
    Dim Utf8BytesWithBom() As Byte
    Utf8BytesWithBom = VBMAN.ToolsUtf8.EncodeWithBom(Text)
    VBMAN.FileIO.SetBuffer(Utf8BytesWithBom).SaveData "C:\utf8_bom.txt"
    
    ' Read and decode
    Dim ReadBytes() As Byte
    VBMAN.FileIO.OpenFile("C:\utf8_nobom.txt").ReadData()
    ReadBytes = VBMAN.FileIO.ReturnBytes()
    
    Dim ReadText As String
    ReadText = VBMAN.ToolsUtf8.Decode(ReadBytes)
    Debug.Print "Read content: " & ReadText
End Sub
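"VBMAN Framework v1.0" is pure ASCII, so UTF-8 uses one byte per character. Utf8Demo should therefore report 20 bytes, or 23 with the BOM. Assuming `SaveData` writes the buffer unchanged, the two files from Utf8FileDemo should also differ in size by exactly 3 bytes. The Python cross-check below reuses the ASCII demo string and writes to a temporary folder instead of the C: drive root.

```python
import codecs
import os
import tempfile

original = "VBMAN Framework v1.0"
plain = original.encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain
print(len(plain), len(with_bom))  # 20 23

# Mirror Utf8FileDemo in a temporary folder instead of the C: drive root
folder = tempfile.mkdtemp()
path_nobom = os.path.join(folder, "utf8_nobom.txt")
path_bom = os.path.join(folder, "utf8_bom.txt")
with open(path_nobom, "wb") as f:
    f.write(plain)
with open(path_bom, "wb") as f:
    f.write(with_bom)

# Read the no-BOM file back and decode it
with open(path_nobom, "rb") as f:
    read_back = f.read().decode("utf-8")
print(read_back == original)  # True
print(os.path.getsize(path_bom) - os.path.getsize(path_nobom))  # 3
```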

Use Cases

| Scenario | Example |
|---|---|
| Network transmission | Convert a string to UTF-8 bytes before sending it |
| File storage | Save text with UTF-8 encoding |
| Encryption/decryption | Convert text to UTF-8 bytes before encrypting it |
| Data validation | Compute a hash over the UTF-8 bytes |
| Compatibility with Windows programs | Use UTF-8 with BOM so that Windows programs recognize the encoding |
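The hashing and encryption rows come down to one point: these operations work on bytes. The same text therefore gives different results depending on how it is encoded. VB6 strings are UTF-16LE in memory, so hashing their raw bytes would not match a system that hashes UTF-8. The short Python illustration below uses SHA-256, chosen only as an example algorithm.

```python
import hashlib

text = "你好世界"
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
utf16_digest = hashlib.sha256(text.encode("utf-16-le")).hexdigest()

# Same text, different bytes, different hashes: both sides must agree on UTF-8.
print(utf8_digest == utf16_digest)  # False
```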

Encoding Comparison

| Encoding method | Advantages | Disadvantages |
|---|---|---|
| Encode | Standard UTF-8 without BOM; widely compatible | Some Windows programs may not detect the text as UTF-8 |
| EncodeWithBom | Windows Notepad and similar programs reliably detect UTF-8 | Adds 3 bytes of BOM; some tools treat the BOM as ordinary data |

Notes

  1. BOM Usage

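    • Older versions of Windows Notepad save UTF-8 files with a BOM; since Windows 10 version 1903, Notepad defaults to UTF-8 without a BOM
    • Some programs (such as many Unix tools) do not expect a BOM and treat it as ordinary data
    • A BOM is generally not recommended for web content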
  2. Byte Order

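    • UTF-8 is byte-oriented and has no byte-order issue; its BOM serves only as an encoding signature
    • UTF-16 exists in big-endian and little-endian forms; VB6 strings are stored as UTF-16LE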
  3. Memory Usage

    • Chinese characters: common ones take 3 bytes in UTF-8 and 2 bytes in UTF-16; rare characters outside the Basic Multilingual Plane take 4 bytes in both
    • ASCII characters: 1 byte in UTF-8, 2 bytes in UTF-16
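The byte-order and size notes can be checked quickly with Python's standard codecs. This is a language-neutral illustration, not part of cToolsUtf8. U+20000 is used as an example of a CJK Extension B character outside the Basic Multilingual Plane.

```python
# Byte order: UTF-16 has two layouts, UTF-8 has only one.
print("A".encode("utf-16-le").hex(" "))  # 41 00
print("A".encode("utf-16-be").hex(" "))  # 00 41

# Bytes per character in UTF-8 vs UTF-16 (LE chosen so no BOM is added)
samples = {"ASCII": "A", "common Chinese": "中", "CJK Ext. B": "\U00020000"}
for label, ch in samples.items():
    print(label, len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
# ASCII 1 2
# common Chinese 3 2
# CJK Ext. B 4 4
```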
