shakedown.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A community for live music fans with roots in the jam scene. Shakedown Social is run by a team of volunteers (led by @clifff and @sethadam1) and funded by donations.


#html5


Posting in case this helps someone else.

For the last month or two, I've had problems visiting some sites with #Firefox. The sites use #Cloudflare, and it would give the "prove you're a human by checking this box" prompt. I'd never had problems with this before.

Suddenly that started failing consistently. Clicking the box would do something, but then just reload the challenge. There was no way to pass it. I tried reconfiguring and even disabling the "uBlock Origin" extension, cookie policies, other things - no luck.

Based on a comment on this discussion:
ask.metafilter.com/385480/fixi

... I tried disabling the "Disable HTML5 Autoplay" extension, and ... fixed! That was *not* an obvious candidate for causing the problem, unlike adblockers etc.

So it appears Cloudflare changed their in-browser testing JS such that it requires HTML5 autoplay to pass the check. Which is ... special. I can think of lots of people with very good reasons to have that disabled.

Anyways, I hope this helps someone.


Stop using preg_* on HTML and start using \Dom\HTMLDocument instead

shkspr.mobi/blog/2025/05/stop-

It is a truth universally acknowledged that a programmer in possession of some HTML will eventually try to parse it with a regular expression.

This makes many people very angry and is widely regarded as a bad move.

In the bad old days, it was somewhat understandable for a PHP coder to run a quick-and-dirty preg_replace() on a scrap of code. They probably could control the input and there wasn't a great way to manipulate an HTML5 DOM.

Rejoice sinners! PHP 8.4 is here to save your wicked souls. There's a new HTML5 Parser which makes everything better and stops you having to write brittle regexen.

Here are a few tips - mostly notes to myself - but I hope you'll find them useful.

Sanitise HTML

This is the most basic example. This loads HTML into a DOM, tries to fix all the mistakes it finds, and then spits out the result.

    $html = '<p id="yes" id="no"><em>Hi</div><h2>Test</h3><img />';
    $dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED , "UTF-8" );
    echo $dom->saveHTML();

It uses LIBXML_HTML_NOIMPLIED because we don't want a full HTML document with a doctype, head, body, etc.
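
To see what the flag changes, here is a small sketch that parses the same fragment twice, once with LIBXML_HTML_NOIMPLIED and once without. The variable names are only illustrative, and it assumes PHP 8.4 or later. Rather than rely on the exact serialised markup, it simply checks whether a body tag was added.

```php
<?php
// Requires PHP 8.4+ (the \Dom\HTMLDocument class).
$html = '<p>Hello</p>';

// With the flag: the parser does not add the implied <html>, <head> and <body>.
$fragment = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );

// Without the flag: the parser builds a full document around the fragment.
$full = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

$fragmentHtml = $fragment->saveHTML();
$fullHtml     = $full->saveHTML();

var_dump( str_contains( $fragmentHtml, "<body>" ) ); // bool(false)
var_dump( str_contains( $fullHtml, "<body>" ) );     // bool(true)
```

If you later need `$dom->body`, leave the flag off, since there is no body element to read when the implied elements are skipped.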

If you want Pretty Printing, you can use my library.

Get the plain text

OK, so you've got the DOM. How do you get the text of the body without any of the surrounding HTML?

    $html = '<p><em>Hello</em> World!</p>';
    $dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR , "UTF-8" );
    echo $dom->body->textContent;

Note, this doesn't replace images with their alt text.
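
If you do want the alt text, one possible approach (a sketch, not the only way) is to swap each image for a text node holding its alt attribute before reading textContent. The sample HTML and variable names here are made up for illustration.

```php
<?php
// Requires PHP 8.4+.
$html = '<p>Hello <img src="world.png" alt="World">!</p>';
$dom  = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

// Copy the matches into an array first, so that replacing nodes
// cannot disturb the iteration.
$images = iterator_to_array( $dom->querySelectorAll( "img" ) );

foreach ( $images as $img ) {
    // getAttribute() returns null when the image has no alt attribute.
    $alt = $img->getAttribute( "alt" ) ?? "";
    $img->replaceWith( $dom->createTextNode( $alt ) );
}

$text = $dom->body->textContent;
echo $text; // Hello World!
```

Note that this modifies the DOM, so do it on a copy if you still need the images afterwards.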

Get a single element

You can use the same querySelector() function as you do in JavaScript!

    $element = $dom->querySelector( "h2" );

That returns a pointer to the element. Which means you can run:

    $element->setAttribute( "id", "interesting" );
    echo $dom->querySelector( "h2" )->attributes["id"]->value;

And you will see that the DOM has been manipulated!
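
One thing to watch for: as in JavaScript, querySelector() returns null when nothing matches, so calling a method on the result can fail. A minimal sketch of guarding against that, using an invented snippet of HTML:

```php
<?php
// Requires PHP 8.4+.
$html = '<h2>Test</h2>';
$dom  = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

$found   = $dom->querySelector( "h2" );
$missing = $dom->querySelector( "h3" ); // No <h3> in the document, so this is null.

// Explicit check before manipulating the element.
if ( $found !== null ) {
    $found->setAttribute( "id", "interesting" );
}

// The nullsafe operator gives null instead of an error for a missing element.
$id        = $found?->getAttribute( "id" );
$missingId = $missing?->getAttribute( "id" );

var_dump( $id );        // string(11) "interesting"
var_dump( $missingId ); // NULL
```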

Search for multiple elements

Suppose you have a bunch of headings and you want to get all of them. You can use the same querySelectorAll() function as you do in JavaScript!

To get all headings, in the order they appear:

    $headings = $dom->querySelectorAll( "h1, h2, h3, h4, h5, h6" );
    foreach ( $headings as $heading ) {
       // Do something
    }
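
As an example of "doing something", here is a rough sketch that turns the headings into an indented outline. The sample HTML and the `$outline` structure are invented for illustration.

```php
<?php
// Requires PHP 8.4+.
$html = '<h1>One</h1><p>Text</p><h2>Two</h2><h3>Three</h3>';
$dom  = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

$outline = [];
foreach ( $dom->querySelectorAll( "h1, h2, h3, h4, h5, h6" ) as $heading ) {
    // localName is the lower-case element name, e.g. "h2".
    $level     = (int) substr( $heading->localName, 1 );
    $outline[] = [ "level" => $level, "text" => $heading->textContent ];
}

// Print each heading, indented by two spaces per level below h1.
foreach ( $outline as $item ) {
    echo str_repeat( "  ", $item["level"] - 1 ) . $item["text"] . "\n";
}
```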

Advanced Search

Suppose you have a bunch of links and you want to find only those which point to "example.com/test/". Again, you can use the same attribute selectors as you would in CSS or JavaScript:

    $dom->querySelectorAll( "a[href^=https\:\/\/example\.com\/test\/]" );
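
If the backslash escaping feels fiddly, quoting the attribute value inside the selector should work too, as it does in CSS. A small sketch with made-up links:

```php
<?php
// Requires PHP 8.4+.
$html = '<a href="https://example.com/test/one">A</a>'
      . '<a href="https://example.com/other">B</a>'
      . '<a href="https://example.org/test/">C</a>';
$dom  = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

// A single-quoted PHP string, so the CSS double quotes need no escaping.
$links = $dom->querySelectorAll( 'a[href^="https://example.com/test/"]' );

echo $links->length; // 1
```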

Replacing content

Sadly, it isn't quite as simple as setting the innerHTML. Each search returns a node. That node may have children. Those children will also be nodes which, themselves, may have children, and so on.

Let's take a simple example:

    $html = '<h2>Hello</h2>';
    $dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );
    $element = $dom->querySelector( "h2" );
    $element->childNodes[0]->textContent = "Goodbye";
    echo $dom->saveHTML();

That changes "Hello" to "Goodbye".

But what if the element has child nodes?

    $html = '<h2>Hello <em>friend</em></h2>';
    $dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );
    $element = $dom->querySelector( "h2" );
    $element->childNodes[0]->textContent = "Goodbye";
    echo $dom->saveHTML();

That outputs <h2>Goodbye<em>friend</em></h2> - so think carefully about the structure of the DOM and what you want to replace.
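
If the goal is to replace everything inside the element, child elements included, one option is to assign to the element's own textContent rather than to its first child. This is a sketch of that variation, not something the examples above rely on:

```php
<?php
// Requires PHP 8.4+.
$html = '<h2>Hello <em>friend</em></h2>';
$dom  = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );

$element = $dom->querySelector( "h2" );

// Setting textContent on the element removes all of its children,
// including the <em>, and leaves a single text node.
$element->textContent = "Goodbye";

$result = $dom->saveHTML();
echo $result; // <h2>Goodbye</h2>
```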

Adding a new node

This one is tricky! Let's suppose you have this:

    <div id="page">
       <main>
          <h2>Hello</h2>

You want to add an <h1> before the <h2>. Here's how to do this.

First, you need to construct the DOM:

    $html = '<div id="page"><main><h2>Hello</h2>';
    $dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );

Next, you need to construct an entirely new DOM for your new node.

    $newHTML = "<h1>Title</h1>";
    $newDom = \Dom\HTMLDocument::createFromString( $newHTML, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );

Next, extract the new element from the new DOM, and import it into the original DOM:

    $element = $dom->importNode( $newDom->firstChild, true );

The element now needs to be inserted somewhere in the original DOM. In this case, get the h2, tell its parent node to insert the new node before the h2:

    $h2 = $dom->querySelector( "h2" );
    $h2->parentNode->insertBefore( $element, $h2 );
    echo $dom->saveHTML();

Out pops:

    <div id="page">
       <main>
          <h1>Title</h1>
          <h2>Hello</h2>
       </main>
    </div>

An alternative is to use the appendChild() method. Note that it appends it to the end of the children. For example:

    $div = $dom->querySelector( "#page" );
    $div->appendChild( $element );
    echo $dom->saveHTML();

Produces:

    <div id="page">
       <main>
          <h2>Hello</h2>
       </main>
       <h1>Title</h1>
    </div>
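
For a node as simple as this <h1>, a second document may not be needed at all. The sketch below is a different technique from the importNode() approach above: it builds the element directly with createElement() and places it with before(). It suits simple elements; importing from a parsed string is probably still easier when the new content is a larger chunk of HTML.

```php
<?php
// Requires PHP 8.4+.
$html = '<div id="page"><main><h2>Hello</h2>';
$dom  = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );

// Build the new element inside the existing document.
$h1 = $dom->createElement( "h1" );
$h1->textContent = "Title";

// Insert it as the sibling immediately before the <h2>.
$h2 = $dom->querySelector( "h2" );
$h2->before( $h1 );

$result = $dom->saveHTML();
echo $result;
```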

And more?

I've only scratched the surface of what the new 8.4 HTML Parser can do. I've already rewritten lots of my yucky old preg_ code to something which (hopefully) is less likely to break in catastrophic ways.

If you have any other tips, please leave a comment.

Terence Eden’s Blog · Stop using preg_* on HTML and start using \Dom\HTMLDocument instead

@ajsadauskas @JessTheUnstill @tomiahonen yes, and to add insult to injury #Mozilla didn't even want to sell people like @fuchsiii or me a #FirefoxOS device. The only one "launched" in the #EU was a #SimLock'd & #NetLock'd #prepaid phone in #Spain that one could only get in-store, with all the "#KYC" nonsense they had, demanding a legal address in Spain back then.

And #nerds like myself are far from the "#consoomer #Normies" for whom stuff that isn't on shelves at Saturn/MediaMarkt, BestBuy, Walmart, ... doesn't exist. I'm used to importing #tech that I want!

in case you missed it: back in 2016, before the US general election, we published this browser game, "election fighter", to test our visualization engine :) at the time, we never could have imagined that we would have to see what we're seeing today happening. truly revolting! perhaps this game can still serve its original purpose: to help you de-stress :)

vizflow.github.io/vizflow/exam
