support@wpsupportservices.co.uk
0330 22 33 458

Mark Jaquith: How I fixed Yoast SEO sitemaps on a large WordPress site

Mark Jaquith: How I fixed Yoast SEO sitemaps on a large WordPress site

One of my Covered Web Services clients recently came to me with a problem: Yoast SEO sitemaps were broken on their largest, highest-traffic WordPress site. Yoast SEO breaks your sitemap up into chunks. On this site, the individual chunks were loading, but the sitemap index (its “table of contents”) would not load, and was giving a timeout error. This prevented search engines from finding the individual sitemap chunks.

Sitemaps are really helpful for providing information to search engines about the content on your site, so fixing this issue was a high priority to the client! They were frustrated, and confused, because this was working just fine on their other sites.

Given that this site has over a decade of content, I figured that Yoast SEO’s dynamic generation of the sitemap was simply taking too long, and the server was giving up.

So I increased the site’s various timeout settings to 120 seconds.

No good.

I increased the timeout settings to 300 seconds. Five whole minutes!

Still no good.

This illustrates one of the problems that WordPress sites can face when they accumulate a lot of content: dynamic processes start to take longer. A process that takes a reasonable 5 seconds with 5,000 posts might take 100 seconds with 500,000 posts. I could have eventually made the Yoast SEO sitemap index work if I increased the timeout high enough, but that wouldn’t have been a good solution.

  1. It would have meant increasing the timeout settings irresponsibly high, leaving the server potentially open to abuse.
  2. Even though it is search engines, not people, who are requesting the sitemap, it is unreasonable to expect them to wait over 5 minutes for it to load. They’re likely to give up. They might even penalize the site in their rankings for being slow.

I needed the sitemap to be reliably generated without making the search engines wait.

When something intensive needs to happen reliably on a site, look to the command line.

The Solution

Yoast SEO doesn’t have WP-CLI (WordPress command line interface) commands, but that doesn’t matter — you can just use wp eval to run arbitrary WordPress PHP code.

After a little digging through the Yoast SEO code, I determined that this WP-CLI command would output the index sitemap:

wp eval '
$sm = new WPSEO_Sitemaps;
$sm->build_root_map();
$sm->output();
'

That took a good while to run on the command line, but that doesn’t matter, because I just set a cron job to run it once a day and save its output to a static file.

0 3 * * * cd /srv/www/example.com && /usr/local/bin/wp eval '$sm = new WPSEO_Sitemaps;$sm->build_root_map();$sm->output();' > /srv/www/example.com/wp-content/uploads/sitemap_index.xml

The final step that was needed was to modify a rewrite in the site’s Nginx config that would make the /sitemap_index.xml path point to the cron-created static file, instead of resolving to Yoast SEO’s dynamic generation URL.

location ~ ([^/]*)sitemap(.*).x(m|s)l$ {
    rewrite ^/sitemap.xml$ /sitemap_index.xml permanent;
    rewrite ^/([a-z]+)?-?sitemap.xsl$ /index.php?xsl=$1 last;
    rewrite ^/sitemap_index.xml$ /wp-content/uploads/sitemap_index.xml last;
    rewrite ^/([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;
}

Now the sitemap index loads instantly (because it’s a static file), and is kept up-to-date with a reliable background process. The client is happy that they didn’t have to switch SEO plugins or install a separate sitemap plugin. Everything just works, thanks to a little bit of command line magic.

What other WordPress processes would benefit from this kind of approach?


Do you need WordPress services?

Mark runs Covered Web Services which specializes in custom WordPress solutions with focuses on security, speed optimization, plugin development and customization, and complex migrations.

Please reach out to start a conversation!

[contact-form]

Source: WordPress