A company site built with Elementor needed a bulk content transformation: the markup inside text widgets had to be cleaned up across dozens of pages at once, including re-leveling some rogue headings for consistency. I knew Elementor stores every widget's settings in a single post meta key called _elementor_data: one giant JSON string holding all sections, columns, and widgets, including the rich-text HTML in each text widget's editor field. Editing dozens of pages by hand in the Elementor editor was not an option, so I took the fastest route a developer in a hurry can think of: regex straight against the raw string.
// The shortcut that caused all of this. Do not do this.
$raw = get_post_meta( $post_id, '_elementor_data', true );
$raw = preg_replace( '/<h3([^>]*)>(.*?)<\/h3>/s', '<h2$1>$2</h2>', $raw );
update_post_meta( $post_id, '_elementor_data', $raw );
On most pages the result looked fine. Then I opened a few others and went cold: the text widgets were rendering Lorem ipsum dolor sit amet. Not one widget, several at once. The client's content was gone, replaced by stock Latin filler. My first reflex was to blame the replacement string: maybe my regex had "swapped in" something wrong. But that made no sense: there was no lorem anywhere in my pattern or my replacement.
I checked the database. On the broken pages, the _elementor_data meta was still there and still long. But the moment I ran it through json_decode, I got null back. The JSON was no longer valid. That was when the picture snapped into focus: my regex had not replaced the content with lorem ipsum. My regex had destroyed the structure, and the lorem ipsum was coming from somewhere else.
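The check I ran by hand can be sketched as a small helper. The function name `elementor_json_is_valid` and both sample strings are made up for illustration; this is not an Elementor API, just plain `json_decode()` with the error state inspected.

```php
<?php
// Hypothetical helper (not part of Elementor): true only when the raw meta
// value decodes cleanly into an array of elements.
function elementor_json_is_valid( string $raw ): bool {
    $data = json_decode( $raw, true );
    return is_array( $data ) && JSON_ERROR_NONE === json_last_error();
}

// Made-up sample in the stored form: quotes and slashes are JSON-escaped.
$intact = '[{"id":"3a91f","widgetType":"text-editor","settings":{"editor":"<p class=\"x\">Hello<\/p>"}}]';

// The same value, deliberately damaged: the escapes around the attribute are gone.
$broken = '[{"id":"3a91f","widgetType":"text-editor","settings":{"editor":"<p class="x">Hello</p>"}}]';

var_dump( elementor_json_is_valid( $intact ) ); // bool(true)
var_dump( elementor_json_is_valid( $broken ) ); // bool(false)
```

A check like this is cheap enough to run over every page after a bulk change, rather than waiting for placeholder text to show up on the frontend.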
Why this happens
_elementor_data is not HTML. It is JSON, and the content HTML lives inside JSON strings. The stored form looks roughly like this:
{"id":"3a91f","widgetType":"text-editor","settings":{"editor":"<h3 class=\"intro\">About the studio<\/h3><p>We opened in 2012.<\/p>"}}
Note the details: quotes are escaped as \", slashes as \/, and the occasional Unicode escape like \u00a0 sneaks in from the editor. My pattern was written with rendered HTML in mind. In the stored form, the closing tag is not </h3> but <\/h3>, so the (.*?) part, which I had helpfully given the /s flag, never stopped at the end of its own heading. It kept devouring characters until it found a literal </h3> somewhere else, far past the boundary of its own JSON string. Along the way it swallowed escaped quotes, commas, and the braces that belong to the JSON structure. A pattern that looked correct ended up eating a delimiter. Some pages got lucky; on others, one rogue match was enough to leave the editor value invalid and the whole document unparseable.
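The mismatch between the stored form and the rendered form is easy to reproduce in plain PHP. This is a minimal sketch: the widget array is made up to mirror the sample above, and it uses `json_encode()` with default flags, which escapes forward slashes the same way the stored value does.

```php
<?php
// Made-up widget mirroring the sample above.
$widget = array(
    'id'         => '3a91f',
    'widgetType' => 'text-editor',
    'settings'   => array(
        'editor' => '<h3 class="intro">About the studio</h3><p>We opened in 2012.</p>',
    ),
);

// Default json_encode() escapes double quotes and forward slashes.
$raw = json_encode( $widget );

// The pattern from the shortcut, written with rendered HTML in mind.
$pattern = '/<h3([^>]*)>(.*?)<\/h3>/s';

// Against the raw JSON the closing tag is stored as <\/h3>, so nothing matches here.
$matches_raw = preg_match( $pattern, $raw );

// Against the decoded editor value the same pattern matches as intended.
$matches_decoded = preg_match( $pattern, $widget['settings']['editor'] );

var_dump( strpos( $raw, '</h3>' ) ); // bool(false): no literal closing tag in the raw string
var_dump( $matches_raw );            // int(0)
var_dump( $matches_decoded );        // int(1)
```

With a single widget the pattern simply fails to match. The damage starts when a literal closing tag does exist somewhere later in a much larger document, because the lazy group is then free to run across string boundaries to reach it.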
So where did the lorem ipsum come from? That is Elementor's own behavior. When Elementor loads a text widget whose editor setting is empty or invalid, it falls back to that widget's default content, and the text widget's default content is a Lorem ipsum paragraph. The client's content was not "replaced by lorem ipsum". It was destroyed, and Elementor papered over the hole with its built-in placeholder. If you ever see lorem ipsum surface on a live site, that is not a display bug: your stored value has gone bad.
The fix
I restored the corrupted pages from a database backup. Then I rewrote the transformation under the rule I should have followed from the start: never regex _elementor_data. Decode it first, walk the structure, transform the clean value, then encode it back.
function transform_editor_html( $html ) {
    // Work on the decoded value: this is real HTML now, not JSON-escaped soup.
    $html = str_replace( '<h3', '<h2', $html );
    $html = str_replace( '</h3>', '</h2>', $html );
    return $html;
}

function walk_elements( array $elements, callable $transform ) {
    foreach ( $elements as &$element ) {
        // Only text-editor widgets with a non-empty editor value are transformed.
        if ( ( $element['widgetType'] ?? '' ) === 'text-editor' && ! empty( $element['settings']['editor'] ) ) {
            $element['settings']['editor'] = $transform( $element['settings']['editor'] );
        }
        // Sections and columns carry their children in 'elements'; recurse into them.
        if ( ! empty( $element['elements'] ) ) {
            $element['elements'] = walk_elements( $element['elements'], $transform );
        }
    }
    unset( $element ); // Break the reference left over from the foreach.
    return $elements;
}
The target widgets are identified by widgetType === 'text-editor', and because sections and columns nest, the walker has to be recursive. I rebuild the full editor string in PHP, on the decoded value, instead of patching the middle of raw JSON. Then save it back:
$raw = get_post_meta( $post_id, '_elementor_data', true );
$data = json_decode( $raw, true );
if ( ! is_array( $data ) ) {
    return; // Never write back something you could not parse.
}
// Keep an escape hatch: copy the untouched value to a backup key first.
update_post_meta( $post_id, '_elementor_data_backup', wp_slash( $raw ) );
$data = walk_elements( $data, 'transform_editor_html' );
// update_post_meta() unslashes the value on write, so wp_slash() is mandatory here.
update_post_meta( $post_id, '_elementor_data', wp_slash( wp_json_encode( $data ) ) );
Two traps in the save step. First, wp_slash() is non-negotiable: update_post_meta() unslashes the value once on write, so the JSON has to go in pre-slashed to be stored intact. Without wp_slash(), the escapes inside the JSON get eaten, and you corrupt the data from the opposite direction. Note that the original shortcut at the top of this post also wrote $raw back without wp_slash(). Second, once the data changes, Elementor's generated CSS can go stale, so regenerate it:
if ( class_exists( '\Elementor\Plugin' ) ) {
    \Elementor\Plugin::$instance->files_manager->clear_cache();
}
I dry-ran the new version on a single post ID, verified the result in the Elementor editor and on the frontend, and only then let it loose site-wide. This time, not a single lorem ipsum.
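A dry run does not have to write anything at all. As a rough sketch, a read-only variant of the walker can report which widgets a transform would touch. The helper name `collect_editor_changes`, the sample structure, and the widget ids are all hypothetical, and the closure stands in for whatever transform is being tested.

```php
<?php
/**
 * Hypothetical dry-run helper: report which text-editor widgets a transform
 * would change, keyed by widget id, without modifying the input.
 */
function collect_editor_changes( array $elements, callable $transform ): array {
    $changes = array();
    foreach ( $elements as $element ) {
        $is_text = ( $element['widgetType'] ?? '' ) === 'text-editor';
        if ( $is_text && ! empty( $element['settings']['editor'] ) ) {
            $before = $element['settings']['editor'];
            $after  = $transform( $before );
            if ( $after !== $before ) {
                $changes[ $element['id'] ?? '(no id)' ] = array(
                    'before' => $before,
                    'after'  => $after,
                );
            }
        }
        // Recurse into nested sections and columns.
        if ( ! empty( $element['elements'] ) && is_array( $element['elements'] ) ) {
            $changes += collect_editor_changes( $element['elements'], $transform );
        }
    }
    return $changes;
}

// Made-up sample: one section, one column, two text widgets.
$data = json_decode(
    '[{"id":"s1","elType":"section","elements":[{"id":"c1","elType":"column","elements":['
    . '{"id":"w1","elType":"widget","widgetType":"text-editor","settings":{"editor":"<h3>Title<\/h3>"}},'
    . '{"id":"w2","elType":"widget","widgetType":"text-editor","settings":{"editor":"<p>No headings<\/p>"}}'
    . ']}]}]',
    true
);

// Stand-in transform: the same tag swap, applied to decoded HTML.
$to_h2 = function ( string $html ): string {
    return str_replace( array( '<h3', '</h3>' ), array( '<h2', '</h2>' ), $html );
};

$changes = collect_editor_changes( $data, $to_h2 );
print_r( array_keys( $changes ) ); // only w1 is listed
```

Reviewing the before and after pairs for one page is a quick way to confirm the transform does what was intended before any update_post_meta() call runs.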
The checklist before touching _elementor_data
- Structured data gets structured tools: json_decode, walk, wp_json_encode. Never regex JSON-in-a-string.
- Lorem ipsum appearing in a text widget means the stored value went invalid, not a display bug.
- Copy _elementor_data to a backup meta key before any bulk transform.
- Test on one post ID, verify in the editor and on the frontend, then scale up.
- wp_slash() on save, then clear Elementor's CSS cache.
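The last item is the easiest one to forget, so here is a minimal sketch of why it matters. It runs outside WordPress and rests on one assumption, stated in the code as well: for plain strings, wp_slash() behaves like PHP's addslashes() and the unslash step inside update_post_meta() behaves like stripslashes(). The sample value is made up.

```php
<?php
// Assumption: for plain strings, wp_slash() acts like addslashes() and the
// unslash step inside update_post_meta() acts like stripslashes().
$json = json_encode( array( 'editor' => '<h3 class="intro">About</h3>' ) );

// Saved without slashing: the unslash step strips the JSON escapes themselves.
$stored_unslashed = stripslashes( $json );

// Saved with slashing: the unslash step only removes the extra layer.
$stored_slashed = stripslashes( addslashes( $json ) );

var_dump( json_decode( $stored_unslashed, true ) ); // NULL: no longer valid JSON
var_dump( $stored_slashed === $json );              // bool(true): stored intact
```

The unslashed path is the same class of corruption as the regex accident: the escaped quotes lose their backslashes and the JSON string terminates early.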
The takeaway
Regex feels like a shortcut precisely because it ignores structure, and that is exactly why it is dangerous here. JSON inside a string, with escaped quotes, escaped slashes, and nested HTML, will always have that one case where even the tidiest pattern eats a delimiter. Since this incident, every time I am tempted to regex data that has structure, I stop and ask first: how much time am I saving right now, compared to the time I will spend restoring a client's content from backup?
