dotCMS includes an app that connects to and controls Prerender.io, a web service that can improve SEO rankings for client-side rendered JavaScript/SPA pages. The dotCMS Prerender app intercepts requests from indexing bots, such as Googlebot or Bingbot, and proxies them to the Prerender.io service. This service does the work of “prerendering” the JavaScript app, including any content returned by client-side API calls. It then returns the resulting rendered (or “hydrated”) HTML to dotCMS, which passes it back to the requesting bot for indexing.
To use this app, you will need either an account with Prerender.io or your own instance of the Prerender.io open-source application, which your dotCMS instance can connect to and use.
How the App Works
- It first performs a check to see if we should show a prerendered page.
  - Check if the request is from a crawler, as defined in crawlerUserAgents, or if the request has an _escaped_fragment_ in its URL.
  - Check to make sure we aren't requesting a resource, such as JS, CSS, etc.
  - (optional) Check to make sure the URL is in the whitelist.
  - (optional) Check to make sure the URL isn't in the blacklist.
- Make a GET request to the Prerender service (PhantomJS server) for the page's prerendered HTML.
- Return that HTML to the crawler.
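The decision in the first step can be sketched in a few lines of Python. This is an illustrative sketch, not the app's actual code: the function name, the default crawler list, the resource extension list, and the choice of full-path regex matching are all assumptions made for the example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical defaults for illustration; the app's real lists may differ.
CRAWLER_USER_AGENTS = ["googlebot", "bingbot"]
RESOURCE_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif", ".ico")


def should_prerender(user_agent, url, whitelist=None, blacklist=None):
    """Return True if the request looks like it should be prerendered."""
    parts = urlsplit(url)
    ua = (user_agent or "").lower()

    # 1. Crawler check, or an _escaped_fragment_ parameter in the URL.
    is_crawler = any(bot in ua for bot in CRAWLER_USER_AGENTS)
    has_fragment = "_escaped_fragment_" in parts.query
    if not (is_crawler or has_fragment):
        return False

    # 2. Static resources (JS, CSS, images) are never prerendered.
    if parts.path.lower().endswith(RESOURCE_EXTENSIONS):
        return False

    # 3. Optional whitelist: the path must match at least one pattern.
    #    Full-path matching is an assumption of this sketch.
    if whitelist and not any(re.fullmatch(p, parts.path) for p in whitelist):
        return False

    # 4. Optional blacklist: the path must not match any pattern.
    if blacklist and any(re.fullmatch(p, parts.path) for p in blacklist):
        return False

    return True
```

Only when a check like this passes would the request be proxied to the Prerender service; every other request is served by dotCMS as usual.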
Customization
Prerender Service URL
Defaults to the Prerender.io service at http://service.prerender.io/. If you've deployed the open-source Prerender.io service on your own infrastructure, you can set the URL so that it points there instead.
prerenderToken
This is the token from your Prerender.io account used to validate the prerender request.
protocol
If you specifically want to make sure that the Prerender service queries using the HTTPS or HTTP protocol, you can set the init-param protocol to https or http, respectively. This should generally be https.
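To illustrate how the service URL, token, and protocol settings might combine, here is a minimal Python sketch. It assumes the common Prerender convention of appending the full page URL to the service URL and sending the token in an X-Prerender-Token header; the function name and parameters are hypothetical, and the dotCMS app may assemble its request differently.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request

DEFAULT_SERVICE_URL = "http://service.prerender.io/"


def build_prerender_request(page_url, token=None,
                            service_url=DEFAULT_SERVICE_URL, protocol=None):
    """Build (but do not send) a GET request to the Prerender service."""
    parts = urlsplit(page_url)

    # Force the scheme the Prerender service should use to fetch the page.
    if protocol in ("http", "https"):
        parts = parts._replace(scheme=protocol)
    target = urlunsplit(parts)

    # The page URL is appended to the service URL (assumed convention).
    service = service_url if service_url.endswith("/") else service_url + "/"
    request = Request(service + target, method="GET")

    # The token identifies your Prerender.io account (assumed header name).
    if token:
        request.add_header("X-Prerender-Token", token)
    return request
```

Pointing service_url at a self-hosted instance changes only the prefix of the resulting URL; the page URL and token handling stay the same.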
whitelist
This is a comma-separated list of regular expressions (regexes). Only URLs that match one of them will be sent to Prerender.io for rendering.
Example: /products/.*,/blog/.*
blacklist
This is a comma-separated list of regexes. URLs that match one of them will never be sent to Prerender.io for rendering.
Example: /images/.*,/css/.*
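As a rough sketch of how the two settings could be evaluated together, the Python below splits each value on commas and tests the request path against the resulting patterns. The helper names are hypothetical, and matching against the whole path (rather than a substring) is an assumption; the app's actual matching rules may differ.

```python
import re


def parse_patterns(setting):
    """Split a comma-separated setting into compiled regexes."""
    return [re.compile(p.strip()) for p in (setting or "").split(",") if p.strip()]


def is_allowed(path, whitelist="", blacklist=""):
    """Apply the whitelist (if any) first, then the blacklist."""
    allow = parse_patterns(whitelist)
    deny = parse_patterns(blacklist)

    # With a whitelist set, the path must match at least one entry.
    if allow and not any(p.fullmatch(path) for p in allow):
        return False

    # A blacklist match always excludes the path.
    return not any(p.fullmatch(path) for p in deny)
```

One consequence of splitting on commas in this way is that a pattern containing a comma of its own, such as a {1,3} quantifier, would be cut in two, so patterns in this sketch need to avoid commas.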
crawlerUserAgents
This is a comma-separated list of strings that will be matched against the request's User-Agent header. If one of these strings matches, the request will be proxied to Prerender.io.
Example: googlebot,bingbot
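The matching can be pictured with the short Python sketch below. It assumes a case-insensitive substring comparison, which is how crawler lists are commonly checked; the function names are hypothetical and the app's exact comparison may differ.

```python
def parse_agents(setting):
    """Split the comma-separated setting into lowercase tokens."""
    return [a.strip().lower() for a in (setting or "").split(",") if a.strip()]


def is_crawler(user_agent, setting="googlebot,bingbot"):
    """Return True if any configured token appears in the User-Agent."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in parse_agents(setting))
```

Under this assumption, a short token such as googlebot is enough to match a full header value like Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html).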
forwardedURLHeader
Names the request header that carries the public URL. This is important for servers behind a reverse proxy that need a different public URL to be used for prerendering.
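A minimal Python sketch of the idea follows. It assumes the setting holds the name of a request header (X-Forwarded-URL is used here purely as an example) whose value, when present, replaces the internal URL that the server sees; the function name and fallback behavior are assumptions, not the app's confirmed logic.

```python
def resolve_page_url(headers, request_url, forwarded_url_header=None):
    """Pick the URL to hand to the Prerender service."""
    if forwarded_url_header:
        # Header names are case-insensitive, so normalize before lookup.
        lookup = {name.lower(): value for name, value in headers.items()}
        forwarded = lookup.get(forwarded_url_header.lower())
        if forwarded:
            return forwarded
    # No setting or no header: fall back to the URL as seen by the server.
    return request_url
```

For example, with a proxy that sets X-Forwarded-URL to https://www.my-spa.com/blogs/improving-our-seo, that public address would be prerendered instead of an internal one such as http://10.0.0.5:8080/blogs/improving-our-seo.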
Testing
You can test if your requests are being prerendered by setting the correct User-Agent header in a page request. For example:
curl --head -H 'User-Agent: googlebot' https://www.my-spa.com/blogs/improving-our-seo
This will return a Prerender request ID header in the response, something like:
x-prerender-requestid: 28fcca74-71a4-4b5f-a293-380e81ec4cac
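The same check can be scripted. The Python sketch below sends a HEAD request with a crawler User-Agent and looks for the x-prerender-requestid header in the response; the function names are hypothetical, and the URL you pass in must be your own site.

```python
from urllib.request import Request, urlopen


def prerender_request_id(headers):
    """Return the x-prerender-requestid value, or None if it is absent."""
    for name, value in headers.items():
        if name.lower() == "x-prerender-requestid":
            return value
    return None


def head_as_crawler(url, user_agent="googlebot", timeout=10):
    """Send a HEAD request with a crawler User-Agent; return the headers."""
    request = Request(url, method="HEAD", headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())
```

Calling prerender_request_id(head_as_crawler("https://www.my-spa.com/blogs/improving-our-seo")) would return the request ID when the page was prerendered, and None otherwise.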
Note on Hashbang Navigation
If you are using a # in your URLs, make sure to change it to #!, known as the hashbang.
For hashbang URLs
To see: http://localhost:3000/#!/profiles/1234
Go to: http://localhost:3000/?_escaped_fragment_=/profiles/1234
For push-state URLs
To see: http://localhost:3000/profiles/1234
Go to: http://localhost:3000/profiles/1234?_escaped_fragment_=
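The two mappings above can be expressed as one small Python function. This is an illustrative sketch with a hypothetical function name; it reproduces the examples as written and, for simplicity, does not percent-encode special characters in the fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def to_escaped_fragment_url(url):
    """Convert a hashbang or push-state URL to its _escaped_fragment_ form."""
    parts = urlsplit(url)

    # Hashbang URLs carry the route after "#!"; push-state URLs have none.
    value = parts.fragment[1:] if parts.fragment.startswith("!") else ""
    param = "_escaped_fragment_=" + value

    # Append to any existing query string rather than replacing it.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

Opening the converted URL in a browser is a quick way to see the prerendered version of a page without changing your User-Agent.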
For general information about AJAX crawling, see Google's documentation of its AJAX crawling protocol.