Blogger, Robots.txt, Canonical URLs, Feeds - Let's get some Synergy

A little rant where I ask Blogger to make a slight change. The story begins...

Several months back, Blogger changed the way they did comments somewhat. Short version is they broke up post pages that receive many comments (200+). This is fine. In doing so, they also had to add some query parameters to comment permalinks so they could work with the new pagination. Again, nothing wrong with that.

But sometimes Googlebot gets confused when a page has multiple urls. These are canonical issues. (Admission - I have trouble both spelling and saying canonical, but I digress.) This hit our favorite blog, phydeaux3, last December, when I noticed that hits from Google had suddenly dropped off to almost nothing. When I started looking into it, I found that Google was now grabbing tons of these comment permalink urls and giving them prominence over what should be the real url. Example.

Instead of having this url in the Google index
phydeaux3.blogspot.com/2006/09/code-for-beta-blogger-label-cloud.html
Which is the proper url, which has hundreds of links pointing at it, Google was indexing urls like this instead
phydeaux3.blogspot.com/2006/09/code-for-beta-blogger-label-cloud.html?showComment=1221076440000

Which is just a link to a particular comment. But since that url doesn't have any links to it (as it shouldn't), it doesn't rise to the top of any searches like the real url would.
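To make the duplicate-url problem concrete, here's a rough sketch in Python (not anything Blogger or Google actually runs) of collapsing a comment permalink back to the real post url. The function name canonical_post_url and the drop_params argument are made up for illustration; the only thing taken from above is the showComment parameter.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonical_post_url(url, drop_params=("showComment",)):
    """Return url with the comment-permalink query parameters removed.

    Hypothetical helper: drop_params lists the parameters assumed to be
    irrelevant to which post the url points at.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop_params
    ]
    # Drop any fragment as well, since it only points within the page.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


permalink = (
    "http://phydeaux3.blogspot.com/2006/09/"
    "code-for-beta-blogger-label-cloud.html?showComment=1221076440000"
)
print(canonical_post_url(permalink))
```

Every comment permalink on a post collapses to the same url this way, which is the one all the inbound links point at.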

In a perfect world, Google usually knows which is the better url, and probably most of the time it does things correctly. But for whatever reason Google was mucking it up the same way on many of my posts that used to get search hits.

I should say here that ultimately I don't give a shit. Hell it took me a month to even notice as I'm not doing a lot of posting. I don't really care too much whether I get hits or not and I figure it will eventually work itself out. But it's a problem with such an easy solution.

Now my good pal, Notorious I.M.P, pointed out to me this recent post from Google Webmaster Central talking about a new feature that (supposedly) lets you fix canonical issues like this by giving Google a "hint" with a <link> tag. Well that's fine. If'n it works.

If you wanted to try it for Blogger you would add something like this to the head of your template.

<b:if cond='data:blog.pageType == "item"'>
<link expr:href='data:blog.url' rel='canonical'/>
</b:if>


I've only just tried that, so I don't know yet whether it actually works, but it outputs the tag the way the Webmaster Central post says to.
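If you want to check what the template actually spits out, something like the following Python sketch would pull the canonical url from a page's html. It only confirms the tag is present, not that Google acts on it. The names CanonicalFinder and find_canonical, and the example.blogspot.com url, are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also end up here.
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical url declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source, shaped like the template snippet's output.
sample = (
    "<html><head>"
    "<link href='http://example.blogspot.com/2009/02/some-post.html' rel='canonical'/>"
    "</head><body></body></html>"
)
print(find_canonical(sample))
```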

But for Blogger, we are really fighting a battle that could have a better solution. Only we users can't do it. It would have to be done at Blogger. A better solution would be to add a few lines to the robots.txt file to take into account the canonical issues that the comment pagination change caused. Blogger already blocks urls with "search" in the path, which correctly blocks redundant label/search pages. But a few tweaks would help out also. Something along the lines of adding these two lines to the robots.txt

Disallow: /feeds/comments
Disallow: /*?showComment*


Now if I have that right (which I may not), that would 1) block comment feeds from being indexed. That's where I believe the comment pagination links are mainly being picked up from, and besides, comment feeds really don't need to be indexed, do they? 2) block any urls with ?showComment in them, which would cover any comment permalinks that got picked up somewhere else.
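To sanity check those two lines, here's a small Python sketch of how I understand wildcard rules to be matched: a rule is a prefix of the url path, "*" stands for any run of characters, and a trailing "$" pins the end. The "*" is an extension that Googlebot honours, not part of the original robots.txt convention, and this is my reading of it, not Google's actual matcher. The names rule_matches, is_blocked and PROPOSED_RULES are made up for illustration.

```python
import re

# The two rules proposed above.
PROPOSED_RULES = ["/feeds/comments", "/*?showComment*"]


def rule_matches(pattern, path):
    """Check one Disallow pattern against a url path (including query string)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(regex, path) is not None


def is_blocked(path, rules=PROPOSED_RULES):
    """Return True if any rule disallows the path."""
    return any(rule_matches(rule, path) for rule in rules)


for path in [
    "/2006/09/code-for-beta-blogger-label-cloud.html",
    "/2006/09/code-for-beta-blogger-label-cloud.html?showComment=1221076440000",
    "/feeds/comments/default",
    "/feeds/posts/default",
]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Under that reading, the real post url and the posts feed stay crawlable while the comment permalink and the comments feed are shut out. The trailing "*" on the second rule doesn't change anything, since rules are prefix matches anyway.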

Or maybe there is a better way of doing it. Or they could do nothing. I'm just ranting.

Random Post Widget Fix

Ok, following up on the recent breakage of the Random Post widget, I've updated the instructions page with the new code that should fix everything. If you need it, then go to the Widgets Instruction page and you can get the new works.

If you installed the one-click widget then all you need to do is delete the old one, and reinstall from the updated instructions page.

If you used the manual code, then you just need to replace the old script code in your template with the updated code. (Note - if you are an FTP blog make sure you redo the first line with your Blog ID - only FTP blogs need to do that.)

Hopefully this should resolve all issues. Thanks for shopping at phydeaux3 and have a nice day.

Archive Calendar Fix

Ok, following up on my previous post about problems with the Archive Calendar, I've made the changes in the calendar script so it works correctly again. I've updated the installation page with those changes so any future installs should work correctly. Now for anyone that's already been using it - here's the easiest way to get it working again. You'll probably need to just recopy/paste the script portion only so you can get the correct version. There are only a few lines changed, but trying to direct you to only those lines would be a real pain. You do NOT have to redo everything - just the script portion. So. Here's the plan.

Fire up Blogger and go to your Layouts tab, then to the Edit HTML link.

Make a BACKUP of your current template in case a mistake is made so you can revert back to it.

Scroll down a bit and you'll find the script section that begins with <!-- Blogger Archive Calendar --> and ends with <!-- End Blogger Archive Calendar -->

Go directly to Step #2 of the Archive Installation and copy that section and use it to replace the previous code.

Save the changes and that should do it (you may have to clear your browser's cache afterward when you check your blog to make sure it's getting the updated code). It's just one simple copy/replace - if you made it through the initial install then this should be easy enough.

Calendar / Random Post Busted

Lesson learned (as if it shouldn't have been learned already). NEVER assume that ANYTHING will remain constant.

Unfortunately due to my lack of foresight, and Blogger making small changes in their feeds, the Blogger Archive Calendar and the Random Post Widget are slightly broken. For the calendar, the links to each post underneath the calendar are broken and instead will either go to a comment feed or the comment form. Same problem with the Random Post widget.

In a nutshell, the Blogger feeds that these work off of have changed the order in which they list the url that's needed. It's still there, just not where the script expects it. I would like to totally blame Blogger, but really I should have written the script to expect possible order changes. It's really not difficult. I just didn't think of it at the time. Meh. Freakin' frackin' meh.
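For the curious, the order-proof approach looks roughly like this. It's a Python sketch rather than the widgets' actual JavaScript, and it assumes each feed entry carries a "link" list of objects with "rel" and "href" fields, with the post's own url marked rel "alternate". The function name post_url and the sample entry are made up for illustration.

```python
def post_url(entry):
    """Return the post's own url from a feed entry, wherever it sits in the list.

    Assumes the entry has a "link" list of {"rel": ..., "href": ...} objects
    and that the post page is the one marked rel="alternate".
    """
    for link in entry.get("link", []):
        if link.get("rel") == "alternate":
            return link.get("href")
    return None  # no post link found; the caller decides what to do


# Hypothetical entry where the comment links come first, as in the breakage.
entry = {
    "link": [
        {"rel": "replies", "href": "http://example.blogspot.com/feeds/123/comments/default"},
        {"rel": "replies", "href": "http://www.blogger.com/comment.g?blogID=1&postID=123"},
        {"rel": "alternate", "href": "http://example.blogspot.com/2009/01/some-post.html"},
    ]
}
print(post_url(entry))
```

Grabbing the link by a fixed position is what breaks when the order shifts; looking it up by its rel value doesn't care where it sits.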

So for anyone wondering what to do - I guess I'm going to have to make the changes for those scripts to work correctly, then tell you how best to update them on your side. I'm not sure if it will be a few hours or a few days before I can get around to it. But I will, and I'll post an update.

Also, seemingly unrelated but still broken: the Automatic List of Labels isn't working for anyone using it, and I can't do anything about it. Blogger is returning an error when the feed is requested - if you try the feed url that you made for the script to use, it returns the error code bX-hdm1cy. Such as http://www.blogger.com/feeds/02129299419606882574/blogs/20460175. As long as that feed is still giving an error, it's Blogger's issue and I can't do a thing about it.

I hate the intertubes.

Scheduled Posts

Testing out Blogger in Draft's scheduled post feature. If you are reading this, then it must have worked.