I wrote my own SplitText implementation for a client site because I needed per-word GSAP text animation without pulling in a paid plugin. It worked smoothly until QA sent me a single screenshot: a heading whose source read & tržišnu percepciju was rendering on screen as &amp; tržišnu percepciju. The & character had turned into a raw HTML entity, visible to visitors. And there was a second bug riding along: a heading that used <br> to force a second line collapsed onto a single self-wrapping line instead.
My first instinct pointed the wrong way. I assumed this was an encoding problem from the CMS, so I inspected the raw data. Clean. In the database and in the server response, the character was a plain &, a single normal ampersand. It only became &amp; after my SplitText render ran. So the data was not corrupt. My own code was manufacturing that entity out of an ampersand that had been perfectly fine.
Why this happens
I reopened my split routine and the culprit jumped out. To break the heading into words, I read the element's markup through the innerHTML getter, then ran a regex over that string to strip tags and pull the text out. It looks reasonable. The problem is that the innerHTML getter does not just copy back what you typed: it serializes the DOM back into HTML and escapes special characters along the way. So an & sitting in the DOM comes back to me as the string &amp;. My regex only saw characters; it had no idea that &amp; was a single entity. So it copied those five characters verbatim into the output, and the browser rendered them exactly like that.
The <br> was a victim of the same approach. Because I was reading innerHTML as a string, <br> was just a text token that my regex stripped, not a line break. So the intentional line structure of the heading disappeared.
And there was a subtler third trap. A separate bug cleared the element before I got to walk it, so by the time the "words" path ran, the element was already empty and every segment vanished. Three problems stacked into one, but they shared a root: I was treating the DOM as an HTML string when I should have been walking it as a tree of nodes.
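The first two failures can be reproduced without a browser. This is only a minimal sketch under assumptions: serializeText is a hypothetical helper that imitates the escaping the innerHTML getter applies to text nodes, and naiveSplit is a stand-in for the strip-tags-with-a-regex approach described above, not my original routine. The second heading text is made up for illustration.

```javascript
// Hypothetical helper: imitates the escaping applied to text nodes
// when the DOM is serialized back to HTML (&, non-breaking space, <, >).
function serializeText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical stand-in for the regex approach:
// replace every tag with a space, then split on whitespace.
function naiveSplit(html) {
  return html.replace(/<[^>]*>/g, " ").split(/\s+/).filter(Boolean);
}

// Bug 1: the DOM holds a plain "&", but the serialized string holds "&amp;".
const entityHtml = serializeText("& tržišnu percepciju");
const entityWords = naiveSplit(entityHtml);
console.log(entityWords); // [ '&amp;', 'tržišnu', 'percepciju' ]

// Bug 2: <br> is stripped like any other tag, so the line boundary is gone.
const brHtml = serializeText("First line") + "<br>" + serializeText("second line");
const brWords = naiveSplit(brHtml);
console.log(brWords); // [ 'First', 'line', 'second', 'line' ]
```

If those words are then written back as text, the first one shows up on screen as the five literal characters &amp; and nothing in the second list records where the line break used to be.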
The fix
The fix was to stop reading innerHTML altogether and walk the live childNodes straight off the DOM, BEFORE clearing the element. A text node gives me node.textContent, which is already browser-decoded, so & stays &. For an element, I check tagName: if it is BR, I close the current segment and start a new line; otherwise, I recurse into it.

    class CustomSplitText {
      split() {
        const segments = []; // one entry per visual line
        let buffer = "";     // text collected for the current line

        const walk = (node) => {
          for (const child of node.childNodes) {
            if (child.nodeType === Node.TEXT_NODE) {
              // textContent is already decoded: "&" stays "&"
              buffer += child.textContent;
            } else if (child.nodeType === Node.ELEMENT_NODE) {
              if (child.tagName === "BR") {
                // <br> closes the current line and starts a new one
                segments.push(buffer);
                buffer = "";
              } else {
                // any other element: descend into its children
                walk(child);
              }
            }
          }
        };

        walk(this.element);       // read the live DOM first
        segments.push(buffer);    // flush the last line
        this.renderWords(segments);
      }
    }

Notice the order: I walk this.element.childNodes FIRST, gather every segment, and only then re-render the contents. Because textContent is browser-decoded, no &amp; ever surfaces again. Because I handle BR as a real segment boundary rather than a string token, the multi-line structure comes back intact.
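The walk itself can be tried outside a browser. Here is a minimal sketch, assuming plain objects that mimic only the node fields the walk touches (nodeType, textContent, tagName, childNodes). The text, el, and collectSegments names are hypothetical, the heading is invented for the example, and the final whitespace split is my assumption about what a per-line words step would do.

```javascript
const TEXT_NODE = 3;    // same numeric value as Node.TEXT_NODE in browsers
const ELEMENT_NODE = 1; // same numeric value as Node.ELEMENT_NODE

// Hypothetical builders for mock nodes.
const text = (value) => ({ nodeType: TEXT_NODE, textContent: value, childNodes: [] });
const el = (tagName, ...childNodes) => ({ nodeType: ELEMENT_NODE, tagName, childNodes });

// Same walk as above, but returning the segments instead of rendering them.
function collectSegments(root) {
  const segments = [];
  let buffer = "";
  const walk = (node) => {
    for (const child of node.childNodes) {
      if (child.nodeType === TEXT_NODE) {
        buffer += child.textContent;
      } else if (child.nodeType === ELEMENT_NODE) {
        if (child.tagName === "BR") {
          segments.push(buffer);
          buffer = "";
        } else {
          walk(child);
        }
      }
    }
  };
  walk(root);
  segments.push(buffer);
  return segments;
}

// Mock of: <h2>Brand &amp; <em>tržišnu</em> percepciju<br>second line</h2>
const heading = el(
  "H2",
  text("Brand & "),
  el("EM", text("tržišnu")),
  text(" percepciju"),
  el("BR"),
  text("second line")
);

const lines = collectSegments(heading);
console.log(lines); // [ 'Brand & tržišnu percepciju', 'second line' ]

// Assumed words step: split each line on whitespace.
const wordsPerLine = lines.map((line) => line.trim().split(/\s+/).filter(Boolean));
console.log(wordsPerLine);
// [ [ 'Brand', '&', 'tržišnu', 'percepciju' ], [ 'second', 'line' ] ]
```

The ampersand arrives as a single character, the text inside the nested em element is picked up by the recursion, and the BR produces a second segment instead of disappearing.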
For the third bug, the key is clear ordering. The element clear MUST stay inside the character-processing branch, not at the top of split(). If I empty the element early, the word-walk runs over an already-emptied element and every segment is lost. By keeping the clear in the chars branch, the word walk always runs over an intact DOM.

    splitChars() {
      const chars = this.collectChars();
      this.element.innerHTML = ""; // clear ONLY here, after reading
      this.renderChars(chars);
    }

The takeaway
The innerHTML getter is a serializer, not a mirror. It returns re-encoded HTML, so the moment you run a string operation on it, entities like &amp;, &lt;, and &gt; leak into your output as literal text. When what you actually need is the real text a user sees, read textContent off the node, or better yet, walk childNodes as a tree: text nodes give you decoded text, and elements like <br> can be handled semantically instead of as string tokens. And if you need to empty an element as part of a transform, read it fully first and clear it afterward, never the other way around. Since I stopped parsing HTML with regex and started walking the DOM, the ampersand went back to being an ampersand.
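The read-first, clear-afterward rule is easy to check in isolation. A minimal sketch, assuming a mock element that is nothing more than a childNodes array; makeHeading, readText, and clear are hypothetical helpers standing in for a real element, a real read, and a real clear.

```javascript
// Hypothetical mock element: only a childNodes array of text-like nodes.
const makeHeading = () => ({
  childNodes: [{ textContent: "Brand & " }, { textContent: "tržišnu percepciju" }],
});

// Hypothetical helpers standing in for reading and emptying a real element.
const readText = (element) => element.childNodes.map((n) => n.textContent).join("");
const clear = (element) => {
  element.childNodes = [];
};

// Wrong order: clear, then read. There is nothing left to read.
const clearedFirst = makeHeading();
clear(clearedFirst);
const lost = readText(clearedFirst);
console.log(JSON.stringify(lost)); // ""

// Right order: read everything, then clear, then render from the saved copy.
const readFirst = makeHeading();
const kept = readText(readFirst);
clear(readFirst);
console.log(kept); // Brand & tržišnu percepciju
```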
