Google vs. Beethoven

Or, How I Learned to Hate Link Rel=’canonical’

{This may look like inside baseball, whinging, and navel-gazing, but we hope it will help others whose sites were generated in WordPress and include pages that are based on WordPress templates but run outside the WordPress environment (such as dynamic PHP search pages), and who find that those pages are not being indexed by search engines.}

Those scant few of you who have made it to The Unheard Beethoven may have tried our Google search box and been dismayed to find that it doesn’t work. We were dismayed too, and as of this writing it’s still not working. [I am pleased to note that as of September 2013 it IS working, finally.] But as we futzed with the search box, we learned to our horror that the problem was not the search box, which worked just fine. The problem was that Google was not indexing this site at all. [I will mostly talk about Google, but NO search engines were indexing us: not Bing, not Yahoo (now one and the same), and none of the others. We even checked several meta search engines and came up empty.]

To be more precise, while this blog and the main pages of the site (which were made with WordPress, using the Esplanade theme) are indexed by Google, the principal content of the site, roughly 400 search result pages that each point to a particular unheard Beethoven work, is completely invisible to the outside world. That, obviously, is unacceptable. For some weeks now, we have been trying various things to get search engines to find us, all without success.

Our first calls for help drew the response that there were some pretty basic problems with our site structure. The test site that we had used before making the new Unheard Beethoven live was still floating around, which meant that when you clicked on a link, you might go to the live domain, or you might go to the test domain. That was obviously unstable, and our webmaster, who has been exceedingly patient through all our flailing around, was able to fix it readily. There were also inconsistencies in how the site’s domain was written; apparently the search engines consider “” to be a different site than “”, so we cleaned that up and adopted “” as the canonical form of the domain in Google Webmaster Tools (an essential place for working on this kind of thing). In the process I broke the site completely, and it took about a week to rebuild it. Seriously, there should be a big warning on the Settings > General page of the WordPress dashboard telling you not to change the “WordPress Address (URL)” of the website, because it breaks everything. Everything. A large piece of duct tape covering that option would not be a bad idea.

It was also suggested that we put a sitemap in place to give the search engines a road map to follow, which would increase the likelihood of our getting indexed. We tried, but our initial efforts only pointed to five pages. That obviously wasn’t right; the generator seemed to be getting stuck on the slider at the top of the home page. Once we got rid of the problems with the test domain and set everything to the canonical form of the domain, that helped greatly. The sitemap generator was now able to find all the pages, and we thought that would put us in business. We added sitemap.xml to the root directory and put a line in the robots.txt file showing where the sitemap lives, as instructed by the Webmaster Tools help page on sitemaps, and figured we were in good shape.
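A sitemap generator does this work for you, but for the curious, here is a minimal Python sketch of what such a tool produces, assuming the search result pages can be listed from the database. The domain `example.org`, the `search.php` path and the sample identifier values are hypothetical placeholders; only the `identifier` parameter name comes from our own setup. It builds a file in the sitemaps.org format, plus the `Sitemap:` line that tells crawlers in robots.txt where to find it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical values for illustration only; substitute your own domain and script.
BASE_URL = "https://example.org"
SEARCH_PATH = "/search.php"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def search_result_url(identifier: str) -> str:
    """Build the absolute URL of one dynamic search result page."""
    return f"{BASE_URL}{SEARCH_PATH}?{urlencode({'identifier': identifier})}"


def build_sitemap(urls: list[str]) -> str:
    """Return a sitemap.xml document with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in query strings.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_sitemap_line() -> str:
    """The robots.txt line telling crawlers where the sitemap lives."""
    return f"Sitemap: {BASE_URL}/sitemap.xml"


if __name__ == "__main__":
    # Hypothetical identifiers; a real script would read them from the database.
    identifiers = ["Hess 79", "Biamonti 274"]
    print(build_sitemap([search_result_url(i) for i in identifiers]))
    print(robots_sitemap_line())
```

The sitemap file goes in the site root, and the printed `Sitemap:` line goes in robots.txt alongside any existing rules.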

Alas, there was no Freude. While this resulted in 54-64 pages of the site (depending on what you look at) getting indexed, still none of the search result pages were. These pages are generated by PHP from a database, but Google and various experts insist that Googlebot should follow and index dynamic search result pages like ours just fine. I tried another tack: in the URL Parameters section of Google Webmaster Tools, I specified that each of our parameters (identifier, opus, hess, etc.) determines which page is shown, and that Google should index every page. That seemed promising too, but it didn’t work either.

I then made another pathetic plea for assistance on the Google webmaster support forums, and this time we got what might be the answer. Roman Maeschi, a regular on those forums, took a look at our search result pages and made a crucial observation:

“this is very bad and is probably the reason for not being indexed:

<link rel='canonical' href='/' />

“Every page that has such a canonical link basically gets prevented from being indexed as this tells Google not to index the URL of the page but instead to go and index a page under this – non-existent – URL: <link rel='canonical' href='/' />

“You will need to either remove the canonical link or fill it with the correct URL that you want [to] index.”

When we looked at the source code for various search result pages, sure enough, there it was, on every one of them. What was that, and where the heck did it come from?? A bit of Google searching indicated that it’s part of the standard WordPress setup, which generates a canonical link for each post (as will most likely happen when this post goes live). But our search template page, although originally generated in WordPress, isn’t actually within the WordPress environment. Our guess is that while the canonical code works fine within WordPress, it generates extremely problematic garbage like this when it’s taken outside.
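We found the tag by eye, viewing the page source, but the check can be automated. Below is a minimal standard-library Python sketch, assuming you already have a page’s HTML as a string (fetching it is left out). It flags any `rel='canonical'` link that does not resolve to the page’s own URL, which is exactly what `href='/'` on a search result page does. The example URL is hypothetical, and because the comparison is a plain string match, a canonical link deliberately pointing at a true duplicate (or differing only by a trailing slash) would also be flagged, so treat the output as a prompt to look closer rather than a verdict.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkRelCollector(HTMLParser):
    """Collect the href of every <link> tag, grouped by its rel value(s)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also arrive here, because the
        # default handle_startendtag calls handle_starttag.
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated values.
        for rel in (attr.get("rel") or "").lower().split():
            self.links.setdefault(rel, []).append(href)


def check_canonical(page_url: str, html: str) -> list[str]:
    """Return a warning for each canonical link that points away from page_url."""
    parser = LinkRelCollector()
    parser.feed(html)
    warnings = []
    for href in parser.links.get("canonical", []):
        target = urljoin(page_url, href)  # resolves relative hrefs such as "/"
        if target != page_url:
            warnings.append(f"canonical points to {target}, not to {page_url}")
    return warnings


if __name__ == "__main__":
    page = "https://example.org/search.php?identifier=Hess+79"  # hypothetical
    print(check_canonical(page, "<head><link rel='canonical' href='/' /></head>"))
```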

As a side note/rant, “link rel=’canonical’” is supposed to be used for duplicate pages: it tells search engine bots that the page you’re looking at is the same as some other page, so the bot should index that other page, not the one you’re looking at. It’s beyond me why WordPress includes such a dangerous item in its standard setup. How many blog pages are likely to be duplicated? It’s regrettable that this highly problematic code, which is really a bug, is treated as if it were a feature, because it has caused us and many others in our situation a lot of grief in trying to figure out why search engines are ignoring our work. A Google search (ha-ha!) reveals that this is an extremely common problem for folks with WordPress-based sites. Rather than being standard, this ‘canonical’ link should really only be implemented by professional search engine optimization types who know what the hell they are doing with such code (which emphatically would not include a Wisconsin tax attorney or a Dutch composer).

Our webmaster instantly went on a search-and-destroy mission and eliminated this ‘canonical’ language from the search templates. Just to make sure, we generated and uploaded a new sitemap and notified Google of it. It’s now a week later, and Google and all the others STILL are not indexing any of our search result pages. That’s distressing, to say the least, but it appears that search engines take a while to recover from these bad canonical links; they seem to remember the ‘canonical’ instruction long after it has been obliterated. One blog post we found said it took three weeks to get indexed, while another claimed six months. We’ll rest easier once we see the first search result page get indexed, but we’re hoping that we’ve found the solution to the problem. We’ll update this post as things change. Please let us know in the comments if you start seeing search result pages for particular works showing up in Google, or if you have any more suggestions as to how we can get search engines to find us. Our endless thanks to our webmaster for putting up with me through all this, and especially to Roman Maeschi for being so kind as to look at our problematic code and offer this important insight!! Our fingers remain crossed.

Really: We’re here, we’re alive, we’re making music (and in fact we have a Very Major Discovery to announce soon). If only we were not invisible to the world….


UPDATE: I’m far too impatient, but after ten days of nothing getting indexed, we also deleted the link rel=next and link rel=prev lines, which all pointed to the same page and thus might have been sending crawlers into a loop. They weren’t right regardless, so removing them can’t hurt. Another trap laid by WordPress for the unwary.
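A similar standard-library Python sketch can spot suspicious rel=next and rel=prev links, again assuming the HTML is already in hand and using hypothetical URLs. Since “pointing to the same page” could mean either the page itself or one shared target, this sketch reports both cases; it is a rough diagnostic, not a description of how any crawler actually behaves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PaginationLinks(HTMLParser):
    """Record (rel, href) for every <link rel="next"> or <link rel="prev">."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        for rel in (attr.get("rel") or "").lower().split():
            if rel in ("next", "prev") and href is not None:
                self.found.append((rel, href))


def pagination_problems(page_url: str, html: str) -> list[str]:
    """Flag next/prev links that point back to the page or to one shared URL."""
    parser = PaginationLinks()
    parser.feed(html)
    # Resolve relative hrefs; a later duplicate rel overwrites an earlier one.
    targets = {rel: urljoin(page_url, href) for rel, href in parser.found}
    problems = [
        f"rel={rel} points back to the page itself"
        for rel, target in targets.items()
        if target == page_url
    ]
    if "next" in targets and targets["next"] == targets.get("prev"):
        problems.append("rel=next and rel=prev point to the same URL")
    return problems
```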


UPDATE 2 (8 July 2013): After two weeks, Google has found one search result page (the one for Biamonti 274) and Bing/Yahoo has found two (the Scherzo to the Fifth Symphony and Hess 79). So I think we’re finally on the right track; there’s no obvious reason why those pages should be indexable when nothing else is. Roman Maeschi suggested that a 301 redirect from the www URLs of the site to the non-www URLs would be useful, since this is another canonical issue: Google and the others may see the two forms as duplicate pages and, as a result, index neither.

In any event, for the curious, this is the code we inserted into the .htaccess file to redirect the www URLs to the non-www URLs; as far as I can tell, it’s working beautifully:

Options +FollowSymlinks
RewriteEngine on
RewriteBase /

### re-direct IP address to non-www form of main domain

### re-direct www to non-www of main domain

### re-direct any parked domain to non-www form of main domain
RewriteCond %{http_host} !^$ [nc]
RewriteRule ^(.*)$$1 [r=301,nc,L]

Thanks to this article, which gave us something clear, slick and handy to cut and paste:

Since we are now getting a small handful of search result pages, I think it’s just a matter of (a very long) time before we get them all. Fingers crossed.
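To make the logic of that .htaccess rule concrete, here is a rough Python model of what it does; it is a sketch, not how Apache implements it. `example.org` is a hypothetical stand-in for the site’s own domain, and the model assumes the condition compares the request’s host against that canonical domain. A single “host is not exactly the canonical one” test is why one rule covers the IP address, the www form and any parked domain; the path and query string carry over to the canonical host, as mod_rewrite does by default when the substitution contains no query string of its own.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical canonical (non-www) form of the domain, for illustration only.
CANONICAL_HOST = "example.org"
CANONICAL_BASE = "http://" + CANONICAL_HOST


def canonical_redirect(url: str) -> Optional[str]:
    """Return the 301 target for url, or None if no redirect is needed.

    Models one RewriteCond/RewriteRule pair: any host other than the
    canonical one (an IP address, the www form, a parked domain) is sent to
    the same path and query string on the canonical host.
    """
    parts = urlsplit(url)
    # urlsplit lowercases hostname, mirroring the case-insensitive [NC] flag.
    if parts.hostname == CANONICAL_HOST:
        return None  # already canonical, so the redirect cannot loop
    target = CANONICAL_BASE + (parts.path or "/")
    if parts.query:
        target += "?" + parts.query
    return target


if __name__ == "__main__":
    for url in ("http://www.example.org/search.php?identifier=Hess+79",
                "http://192.0.2.10/",
                "http://example.org/"):
        print(url, "->", canonical_redirect(url))
```

Because requests that already use the canonical host are left alone, the redirect fires at most once per request.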

UPDATE 3 (16 July 2013): After three weeks, Google is up to nine search result pages indexed; Bing/Yahoo is up to eight. Roman likens it to turning an oil tanker at sea, and it’s an apt analogy. This is excruciatingly slow, but it’s coming, bit by bit. Just, erm, 391 to go….

UPDATE 4 (15 August 2013): The glacial pace continues. Google is up to 13 search result pages indexed, and all of Google’s results for the old website are now gone (which is good, because they were mostly dead links). Bing, alas, is now down to six, and it may drop even further, since two of those six are links to the old site. The slowness of this process has made me stop checking daily, anyway. It will get there when it gets there.

UPDATE 5 (23 September 2013): We have had a major breakthrough! Possibly it was connected to a number of websites linking to The Unheard Beethoven after the discovery of Hess 137, or it may be pure coincidence. But as of today Google has indexed 181 search result pages, whereas last week it had only about 25. We hope the rest will follow in short order. Bing has only 45, but that’s still good progress.

UPDATE 6 (27 September 2013): The glacier has clearly broken free. On September 23 there were 181 search result pages in Google; on September 25 there were 202; on September 26 there were 220. Now that over half of the search result pages are indexed, the Google search box at the top of the page is actually usable (not 100% complete, but it returns results). I think we can finally relax a bit, and it has taken about three months after eliminating the link rel=’canonical’ garbage to reach this stage. Thanks again to Roman Maeschi for the key insight that rescued this poor forsaken website from oblivion.

UPDATE 7 (30 September 2013): 281 results as of today on Google, so almost to the 75% mark. Bing is lagging with 51, but still moving in the right direction.

UPDATE 8 (15 October 2013): Google and Beethoven have reached a truce. Google is showing 480 search results, which should be just about everything on the site; if it’s not, I expect they’ll be added any day now. I am calling this case closed after approximately four months. Bing is lagging badly behind (74 results) but since we use Google for the site search engine, that’s the one that really counts so far as I’m concerned. Thanks once more to Kevin and Roman for helping us get this sorted out.