Practical Encode&Decode Examples: From Text to BinaryEncoding and decoding are fundamental processes in computing and communications. At its core, encoding converts data from one format into another for storage, transmission, or processing; decoding reverses that process to restore the original information. This article walks through practical examples of encoding text to binary and decoding binary back to text, explains common schemes, shows how to handle edge cases, and gives sample code you can run or adapt.
Why encode text to binary?
- Computers operate on binary (bits: 0 and 1). Converting text to binary is necessary for low-level storage, communication protocols, and many algorithms.
- Binary representations make it possible to apply compression, encryption, error detection/correction, and transmission techniques.
- Understanding encoding helps debug data interoperability problems and design systems that interoperate reliably.
Common text encodings
- ASCII: 7-bit encoding for basic English characters (0–127). Simple and still widely used for legacy systems.
- UTF-8: Variable-length encoding for Unicode. Uses 1–4 bytes per code point and is the dominant encoding on the web.
- UTF-16: Uses 2 or 4 bytes per code point; common in some platforms and programming languages.
- Base64: Encodes binary data into ASCII characters, useful for embedding binary in text mediums (emails, JSON).
Key fact: when converting text to binary, you must choose a character encoding (e.g., UTF-8 or ASCII) first — the binary depends on that choice.
Binary basics and formats
Binary can be represented in several ways:
- Bit string (e.g., 01001000 01100101 for “He” in ASCII)
- Byte-oriented (grouped in 8 bits)
- Big-endian vs little-endian (affects multi-byte order)
- Packed formats (bitfields) for compact storage
Example 1 — ASCII text to binary (simple English)
Take the string: “Hi”
- Choose ASCII.
- Get decimal codes: ‘H’ = 72, ‘i’ = 105.
- Convert to 8-bit binary:
- 72 → 01001000
- 105 → 01101001
- Combined binary: 01001000 01101001
Decoding reverses the steps: split into bytes, convert each byte to decimal, map decimal to ASCII characters.
Example 2 — UTF-8 text to binary (handling non-ASCII)
Take the string: “Привет” (Russian “Hello”).
- Choose UTF-8.
- UTF-8 uses multiple bytes per Cyrillic letter. For example, ‘П’ (U+041F) encodes as two bytes: D0 9F (hex) → 11010000 10011111 (binary).
- Each character converts to its UTF-8 byte sequence; then to binary bit strings.
Important: treating UTF-8 bytes as independent ASCII characters will corrupt non-ASCII text. Always decode using the same encoding used for encoding.
Example 3 — Text to binary via Base64 (safe ASCII transport)
Base64 turns arbitrary binary (including UTF-8 bytes) into printable ASCII.
String: “Hello!”
- Convert to UTF-8 bytes: 48 65 6C 6C 6F 21 (hex)
- Group bits into 6-bit chunks, map to Base64 alphabet → “SGVsbG8h”
- To decode: reverse mapping → bytes → interpret as UTF-8 → “Hello!”
Use Base64 when embedding binary data inside text protocols (SMTP, JSON, data URIs).
Example 4 — Packing multiple fields into bits (practical systems)
Suppose a sensor packet needs:
- 1 bit: status (0/1)
- 7 bits: sensor ID (0–127)
- 12 bits: reading (0–4095) Total: 20 bits — pack into 3 bytes (24 bits) with 4 unused bits.
Packing example (status=1, id=5, reading=300):
- status: 1
- id: 0000101
- reading: 000100101100 Concatenate: 1 0000101 000100101100 → pad to 24 bits → split into bytes for transmission.
Decoding: extract bit fields by masks and shifts (bitwise operations).
Example 5 — Error detection: add a parity bit
For each byte, add an extra bit so the total number of 1 bits is even (even parity). Example: byte 01001000 has two 1s — parity bit = 0. Parity helps detect single-bit errors.
Code examples
All multi-line code is in runnable form. Replace the sample strings as needed.
Python — ASCII/UTF-8 text to binary and back
# text_to_binary.py def text_to_binary(s, encoding='utf-8'): b = s.encode(encoding) return ' '.join(f'{byte:08b}' for byte in b) def binary_to_text(bit_string, encoding='utf-8'): bytes_list = bytes(int(b, 2) for b in bit_string.split()) return bytes_list.decode(encoding) if __name__ == '__main__': s = 'Hi Привет' bits = text_to_binary(s, 'utf-8') print('Binary:', bits) restored = binary_to_text(bits, 'utf-8') print('Restored:', restored)
JavaScript — UTF-8 bytes and Base64
// text_base64.js function toBase64(str) { const utf8Bytes = new TextEncoder().encode(str); // convert bytes to binary string then to base64 let binary = ''; utf8Bytes.forEach(b => binary += String.fromCharCode(b)); return btoa(binary); } function fromBase64(b64) { const binary = atob(b64); const bytes = new Uint8Array([...binary].map(ch => ch.charCodeAt(0))); return new TextDecoder().decode(bytes); } console.log(toBase64('Hello, 世界')); // SGVsbG8sIOS4lueVjA== console.log(fromBase64('SGVsbG8sIOS4lueVjA=='));
Bit-packing example in C-like pseudocode
// pack.c (illustrative) uint32_t pack_packet(bool status, uint8_t id, uint16_t reading) { // status: 1 bit (MSB), id: next 7 bits, reading: next 12 bits uint32_t pkt = 0; pkt |= (status ? 1u : 0u) << 19; pkt |= (uint32_t)(id & 0x7F) << 12; pkt |= (uint32_t)(reading & 0x0FFF); return pkt; // lower 20 bits used }
Edge cases and pitfalls
- Mismatched encoding/decoding (UTF-8 vs ASCII) corrupts text.
- Endianness affects multi-byte numeric values, not byte order within UTF-8.
- Partial bytes: streaming systems must buffer until full bytes are available.
- Invisible control characters or BOM (byte order mark) can cause unexpected behavior.
- Base64 increases size ~33% — not suitable where space is critical without reason.
Tools and libraries
- Python: bytes, codecs, base64, TextEncoder/TextDecoder in JS.
- Command line: iconv, xxd, base64.
- Browser: btoa/atob (with UTF-8 caveats), TextEncoder/TextDecoder.
Quick reference — common conversions
- ASCII: 1 char → 1 byte → 8 bits
- UTF-8: 1 char → 1–4 bytes → 8–32 bits
- Base64: every 3 bytes → 4 ASCII chars
Remember: choose encoding first (e.g., UTF-8); binary representation depends on it.
If you want, I can:
- Provide a step-by-step walkthrough for a specific string.
- Produce downloadable sample scripts for a language you prefer.
- Explain bitwise operations used in packing/unpacking.
Leave a Reply