Base64 Encoding Explained
Base64 shows up everywhere — in JWT tokens, in src="data:..." image attributes, in HTTP
Basic authentication headers, in email attachments. It looks like gibberish, and that is exactly the
point: Base64 re-encodes arbitrary binary data using only 64 printable ASCII characters, so it can travel
through systems that were built for text. This guide explains what is actually happening when you encode
something, why the output is always about a third larger than the input, and the one misconception about
Base64 that gets people into trouble.
The core idea
A byte is 8 bits, which gives 256 possible values, and many of them are not printable text. Some are control characters such as newline, the null byte, and the ASCII bell; another 128 values lie above the 7-bit ASCII range. Older text-oriented channels (SMTP for email, HTTP headers, text columns in many databases) treat some of those bytes as special or assume 7-bit data. Suppose you send a raw JPEG through classic SMTP. The mail transfer agent treats the first byte that happens to equal a line feed as “end of line,” and it may strip the high bit from bytes above 127. Your image is corrupted before it arrives.
Base64 sidesteps the problem. It takes the input bytes, regroups them into chunks of 6 bits (because 2⁶ = 64), and maps each 6-bit chunk to one of 64 safe printable characters. Because every output character is plain ASCII, the encoded text can flow through any text-only channel without being mangled.
Three input bytes (24 bits) become exactly four output characters (4 × 6 = 24 bits). That 3-to-4 ratio is the whole reason Base64 is bigger than the original — more on that in a moment.
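You can watch the regrouping happen by packing one 3-byte group into a 24-bit integer and slicing it into 6-bit pieces. Below is a minimal sketch in plain JavaScript that runs in Node.js or a browser. `splitInto6Bit` is a hypothetical helper for illustration, not a library function.

```javascript
// Hypothetical helper: split one 3-byte group into four 6-bit indices.
function splitInto6Bit(b0, b1, b2) {
  // Pack the three bytes into one 24-bit number; b0 is the high byte.
  const n = (b0 << 16) | (b1 << 8) | b2;
  // Slice off 6 bits at a time, starting from the most significant end.
  return [(n >> 18) & 0x3f, (n >> 12) & 0x3f, (n >> 6) & 0x3f, n & 0x3f];
}

// "Man" is the bytes 0x4D, 0x61, 0x6E (77, 97, 110).
const bytes = new TextEncoder().encode("Man");
const indices = splitInto6Bit(bytes[0], bytes[1], bytes[2]);
console.log(indices); // [ 19, 22, 5, 46 ]
```

In the RFC 4648 alphabet, 19, 22, 5 and 46 map to T, W, F and u, which is why "Man" encodes to TWFu.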
The alphabet
The 64 characters are defined by RFC 4648:
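Value Char Value Char Value Char Value Char
0 A 17 R 34 i 51 z
1 B 18 S 35 j 52 0
2 C 19 T 36 k 53 1
3 D 20 U 37 l 54 2
4 E 21 V 38 m 55 3
5 F 22 W 39 n 56 4
6 G 23 X 40 o 57 5
7 H 24 Y 41 p 58 6
8 I 25 Z 42 q 59 7
9 J 26 a 43 r 60 8
10 K 27 b 44 s 61 9
11 L 28 c 45 t 62 + (or - in base64url)
12 M 29 d 46 u 63 / (or _ in base64url)
13 N 30 e 47 v (pad) =
14 O 31 f 48 w
15 P 32 g 49 x
16 Q 33 h 50 y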
A–Z, a–z, 0–9 give 62 characters.
The last two are + and / in standard Base64. The variant called
base64url — used in JWTs and filenames — replaces those with - and
_ so the output is safe in URLs and file paths.
If the input length is not a multiple of three, padding with = characters brings the output
length up to a multiple of four. The string "Man" encodes to TWFu; the string
"Ma" encodes to TWE=; the single character "M" encodes to
TQ==. The number of trailing = signs tells the decoder exactly how many bytes
to drop from the final group.
Why the output is ~33% bigger
Three bytes of input become four characters of output. Four thirds is roughly 1.33, so the encoded form is about 33% larger than the raw bytes. Encode a 300 KB PNG and you ship ~400 KB. Add the padding and newline wrapping that some implementations insert every 76 characters, and the overhead creeps higher.
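The size rule can be written as a formula: n input bytes produce 4 × ⌈n / 3⌉ characters. The sketch below uses hypothetical function names. The wrapped variant makes one stated assumption: a CRLF pair follows every full 76-character line, and none follows the last line. Implementations differ on that trailing break.

```javascript
// Padded standard Base64: every started 3-byte group becomes 4 characters.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

// MIME-style wrapping: CRLF after each 76-character line
// (assumption: no break after the final line).
function mimeBase64Length(n) {
  const chars = base64Length(n);
  const breaks = Math.max(0, Math.ceil(chars / 76) - 1);
  return chars + 2 * breaks;
}

const png = 300 * 1024; // a 300 KB file, in bytes
console.log(base64Length(png));     // 409600, i.e. exactly 400 KB
console.log(mimeBase64Length(png)); // 420378
```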
This matters in practice. Embedding images as Base64 in CSS removes an HTTP request. The cost is that the browser downloads about a third more bytes, and rendering is blocked until the (larger) CSS parses. For small icons under a few kilobytes the trade-off usually wins. For anything larger, ship the file as a separate asset and let the browser cache it.
gzip and Brotli partly claw the overhead back. Each Base64 character carries only 6 bits of information, so an entropy coder can squeeze the text back toward the size of the original bytes. But compression happens at the transport layer. Once the response is decompressed, the full Base64 text still has to be held in memory and decoded.
Text, UTF-8, and the encoding pipeline
People say “encode this string to Base64,” but there is an invisible first step. Base64 operates on bytes, not on characters. To Base64-encode a string you first convert the string to bytes using a character encoding — almost always UTF-8 — and then Base64-encode the bytes.
// JavaScript
const text = "안녕"; // 2 Korean characters
const bytes = new TextEncoder().encode(text); // 6 bytes (UTF-8: 3 per char)
const b64 = btoa(String.fromCharCode(...bytes)); // 7JWI64WV
const back = new TextDecoder().decode(
Uint8Array.from(atob(b64), c => c.charCodeAt(0))
); // "안녕" again
The classic mistake is calling btoa("안녕") directly. btoa expects a
Latin-1 string and throws InvalidCharacterError on any character above code point 255. The
fix is always the same: encode to UTF-8 bytes first, then Base64. Conversely, when you decode, you get
bytes back; you must then interpret those bytes as UTF-8 to recover the original text. Skipping that
last step is how you end up with mojibake (garbled characters) where your emoji used to be.
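For comparison, here is the same round trip in Node.js. It assumes Node 16 or later, where btoa is a global and Buffer performs the UTF-8 step when you name the encoding. It also demonstrates the btoa failure mode described above. Variable names are illustrative.

```javascript
// btoa only accepts characters in the 0-255 range, so this throws.
let error = null;
try {
  btoa("안녕");
} catch (e) {
  error = e.name; // "InvalidCharacterError"
}

// Node.js: name the encodings explicitly and Buffer handles UTF-8 for you.
const b64 = Buffer.from("안녕", "utf8").toString("base64"); // "7JWI64WV"
const text = Buffer.from(b64, "base64").toString("utf8");   // "안녕"

console.log(error, b64, text);
```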
Where you will actually use it
- Data URIs in HTML and CSS. <img src="data:image/png;base64,iVBORw0KG..."> inlines an image directly in the document. Handy for tiny icons and sprites; a bad idea for large images because of the size tax and the loss of caching. The Image to Base64 tool produces these strings from a drag-and-dropped file.
- HTTP Basic authentication. The Authorization: Basic header sends username:password Base64-encoded. This is not security (anyone who sees the header can reverse it), so Basic auth must always travel over HTTPS.
- JWT tokens. A JWT has three dot-separated parts, each Base64url-encoded. The header and payload are JSON; the signature is raw signature bytes (for example, HMAC output). You can decode and read a JWT with a JWT decoder in seconds, which is exactly why you should never put secrets in a JWT payload.
- Email MIME attachments. SMTP was originally text-only; MIME wraps binary attachments in Base64 so they survive transit. This is the historical reason the format exists at all.
- Embedding binary in JSON or YAML. When you need to ship a small binary blob inside a text format, Base64 is the standard escape hatch.
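To see how little a JWT hides, this sketch builds a toy token and reads its header and payload back with no key at all. It assumes Node.js 16+ for Buffer's "base64url" encoding. `toB64url`, `readJwtPart` and the claims are made up for illustration. The third segment is a placeholder, not a real signature.

```javascript
// Hypothetical token parts, built here so the example is self-contained.
const header = { alg: "HS256", typ: "JWT" };
const payload = { sub: "1234567890", name: "Jane Doe" };
const toB64url = (obj) =>
  Buffer.from(JSON.stringify(obj), "utf8").toString("base64url");
// The last segment is a placeholder, not a real signature.
const token = `${toB64url(header)}.${toB64url(payload)}.placeholder-signature`;

// Reading needs no key: split on ".", Base64url-decode, parse the JSON.
function readJwtPart(jwt, index) {
  const part = jwt.split(".")[index];
  return JSON.parse(Buffer.from(part, "base64url").toString("utf8"));
}

console.log(readJwtPart(token, 1)); // { sub: '1234567890', name: 'Jane Doe' }
```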
Base64 is not encryption
This is the misconception that causes real harm. Base64 is reversible by anyone, instantly, with no key.
Encoding a password or an API token in Base64 provides zero confidentiality. It is
obfuscation at best, and often not even that — a Base64 string is so recognizable
(== padding, the limited alphabet) that scanners flag it automatically.
Each job has its own tool:
- To keep data secret, use encryption: AES-GCM for symmetric, or RSA-OAEP or an elliptic-curve hybrid scheme such as ECIES for asymmetric.
- To verify data has not been tampered with, use an HMAC or a digital signature such as ECDSA. A bare hash only helps if the hash itself arrives over a trusted channel.
- To transmit binary safely through a text channel, use Base64.

Mixing up these three jobs is how “encrypted” API keys end up readable in a GitHub commit.
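Take HTTP Basic authentication as a concrete case. Building the header and reading the password back out each take a single Base64 step. This sketch assumes Node.js (for Buffer). The credentials and the `readBasicAuth` helper are hypothetical.

```javascript
// What a Basic auth header looks like on the wire (made-up credentials).
const authHeader =
  "Basic " + Buffer.from("alice:s3cret", "utf8").toString("base64");

// Anyone who sees the header can reverse it; no key is involved.
function readBasicAuth(headerValue) {
  const encoded = headerValue.replace(/^Basic\s+/i, "");
  const decoded = Buffer.from(encoded, "base64").toString("utf8");
  const sep = decoded.indexOf(":"); // the user-id may not contain ":"
  return { user: decoded.slice(0, sep), password: decoded.slice(sep + 1) };
}

console.log(authHeader);                // Basic YWxpY2U6czNjcmV0
console.log(readBasicAuth(authHeader)); // { user: 'alice', password: 's3cret' }
```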
Variants and line wrapping
There are a few flavors of Base64 in the wild, and mixing them up is a common bug. Standard
Base64 uses + and / and pads with =.
Base64url swaps in - and _ so the output is safe inside a
URL path or query string without further percent-encoding — this is the variant JWT uses. Some
base64url implementations also drop the trailing = padding entirely and infer it from the
string length, which is fine until you feed the unpadded string into a strict decoder that expects the
padding.
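Converting between the two variants is a character swap plus some padding bookkeeping. This sketch assumes the base64url side is unpadded, as in JWTs. `toBase64Url` and `fromBase64Url` are hypothetical names, and Buffer (Node.js) is used only to produce sample input.

```javascript
// Standard -> base64url: swap the two URL-unsafe characters, drop padding.
function toBase64Url(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// base64url -> standard: swap back, then restore "=" so the length is a
// multiple of 4, which strict decoders require.
function fromBase64Url(b64url) {
  const b64 = b64url.replace(/-/g, "+").replace(/_/g, "/");
  const pad = (4 - (b64.length % 4)) % 4;
  if (pad === 3) throw new Error("invalid base64url length");
  return b64 + "=".repeat(pad);
}

// Bytes 0xFB 0xFF 0xBF encode to both special characters.
const std = Buffer.from([0xfb, 0xff, 0xbf]).toString("base64");
console.log(std, toBase64Url(std)); // +/+/ -_-_
console.log(fromBase64Url("TWE"));  // TWE=
```

Restoring the padding from the length works because valid Base64 never leaves a remainder of 1 when its unpadded length is divided by 4. The sketch rejects that case as malformed.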
PEM (used for SSL certificates and SSH keys) is standard Base64 wrapped at 64 characters per line with
-----BEGIN CERTIFICATE----- headers. MIME email wraps at 76 characters. The line breaks are
not part of the Base64 data; they exist so 1980s-era mail servers would not choke on long lines. Most
modern decoders ignore whitespace, but if you are writing one yourself, remember to strip newlines
before decoding or you will get a length error.
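Wrapping and unwrapping are simple string operations. This sketch assumes CRLF line breaks, as in MIME; PEM files on disk may use plain LF, which the same whitespace stripping also handles. `wrapLines` and `unwrap` are hypothetical helpers, and Buffer (Node.js) only produces sample input.

```javascript
// Wrap a Base64 string at a fixed width: 64 for PEM, 76 for MIME.
function wrapLines(b64, width) {
  const lines = [];
  for (let i = 0; i < b64.length; i += width) {
    lines.push(b64.slice(i, i + width));
  }
  return lines.join("\r\n");
}

// Before strict decoding, strip all whitespace (spaces, tabs, CR, LF).
function unwrap(text) {
  return text.replace(/\s+/g, "");
}

const data = new Uint8Array(100).map((_, i) => i); // 100 sample bytes
const b64 = Buffer.from(data).toString("base64");  // 136 characters
const mime = wrapLines(b64, 76);                   // two lines: 76 + 60

console.log(mime.split("\r\n").map((l) => l.length)); // [ 76, 60 ]
console.log(unwrap(mime) === b64);                     // true
```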
When you should not reach for Base64
Because Base64 trades 33% more bytes for transport safety, it is the wrong tool whenever the channel is
already binary-safe. Sending Base64-encoded image data through a multipart/form-data upload
is strictly worse than sending the raw bytes — you pay the size tax for nothing, because the
upload protocol handles binary fine. Storing Base64 text in a database instead of raw bytes in a BLOB
column makes every value a third larger and forces every reader to decode. The rule: use Base64 at the boundary with a text-only system,
and store or transmit raw bytes everywhere else.
A related anti-pattern is Base64-encoding then gzipping, or gzipping then Base64-encoding, in the hope of saving space. Base64-then-gzip recovers much of the overhead (Base64 text compresses to roughly the original size) but the result is no longer text-safe, which defeats the only reason to encode in the first place. gzip-then-Base64 doubles the work and still pays the size tax. If you need compression, compress; if you need text-safety, encode; doing both in sequence is almost always a mistake.
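Here is a rough experiment that checks both claims on random, incompressible input. It assumes Node.js and its built-in zlib and crypto modules. Exact compressed sizes vary from run to run, so only the relationships matter. Base64-then-gzip lands near the raw size but is no longer ASCII. gzip-then-Base64 is at least as large as plain Base64.

```javascript
const zlib = require("zlib");
const crypto = require("crypto");

// 30,000 random bytes: incompressible, much like already-compressed media.
const raw = crypto.randomBytes(30000);
const b64 = raw.toString("base64"); // 40,000 characters

// Base64 then gzip: shrinks back toward the raw size...
const b64ThenGzip = zlib.gzipSync(Buffer.from(b64, "ascii"));
// ...but the output is binary again (a gzip stream starts 0x1f 0x8b).
const isTextSafe = [...b64ThenGzip].every((b) => b < 128);

// gzip then Base64: random data does not compress, so nothing is saved.
const gzipThenB64 = zlib.gzipSync(raw).toString("base64");

console.log(raw.length, b64.length, b64ThenGzip.length, gzipThenB64.length);
console.log(isTextSafe); // false
```

The input is incompressible on purpose. With compressible input such as JSON, gzip shrinks the data before the 4/3 factor applies, so measure with your own payloads before drawing conclusions.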
A short summary
- Base64 maps any bytes to 64 printable ASCII characters so binary can travel through text channels.
- Three input bytes become four output characters, so the result is about 33% larger.
- Strings must be UTF-8-encoded to bytes first; decode yields bytes that must be UTF-8-decoded back to text.
- Use it for data URIs, Basic auth, JWTs, MIME, and embedding binary in text formats.
- Base64 is encoding, not encryption. It hides nothing from anyone.
Once you internalize that Base64 is purely a transport trick — bytes in, slightly more bytes out, fully reversible — the rest follows. You reach for it when a channel cannot safely carry raw bytes, and you reach for something else (TLS, real encryption, a hash) when you need actual protection.
Related tools
- Base64 Encoder / Decoder — convert text to and from Base64 with full UTF-8 support.
- Image to Base64 — turn an image file into a ready-to-paste data URI.