cat << EOF > The importance of good RSS feed GUIDs
For the past few days I've been trying to migrate my existing RSS feeds to the new site. The blog, linkblog and podcast all have distinct feeds, and they are currently hosted on separate subdomains, each of which resolves to a separate server. The linkblog and podcast feeds are both created by the initial version of my static site generator, whereas the blog's feed is built using a static site generator called Jekyll.
When I wrote the initial implementation of the ssg tool, I just went with what seemed like the default settings of the feed generation library I was using. That library expects each feed item to have a post url, which makes sense because consumers of the feed will want to know where the post is located on the web. The library docs also state that you can provide a GUID for each feed item, but that it's optional, and if you don't provide one, it will just use the post url as the GUID. That seems to make sense because urls are essentially unique, which is good because GUIDs need to be unique. So what's the problem?
Well, the problem only really happens if you decide to move your website. Each post on your new website will at the very least have a new domain name, and if you change website software, the structure of the urls could change too, depending on how the new software organises the site pages. Since the GUIDs default to the post urls, new urls mean new GUIDs, and with new feed item GUIDs there's a chance readers of your feed will suddenly see duplicate items appear in their feed reader.
The reason is that many feed clients use the GUIDs as a way to keep track of items in the feed. If you change the GUIDs, then existing posts could very well end up looking like totally new feed items, and be displayed as such, right next to the ones with the original GUIDs. Readers will unexpectedly be faced with doppelgänger feed items! It's freaky enough in real life, but in an RSS reader? It's almost too scary to contemplate.
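To make the duplicate problem concrete, here is a minimal sketch of how a feed client might track items. It assumes the client simply remembers the GUIDs it has already shown; real readers differ in the details, and the function name and example.com urls are made up for illustration.

```python
def unseen_items(seen_guids, feed_items):
    """Return the items whose GUID the reader has not stored yet."""
    return [item for item in feed_items if item["guid"] not in seen_guids]

# Hypothetical urls for the same post before and after the move.
old_url = "https://blog.example.com/2020/01/hello"
new_url = "https://www.example.com/posts/hello"

# The reader fetched the old feed, where the GUID defaulted to the post url.
seen_guids = {old_url}

# After the move the GUID follows the new url, so the post looks brand new.
moved_feed = [{"title": "Hello", "link": new_url, "guid": new_url}]
duplicates = unseen_items(seen_guids, moved_feed)

# If the GUID is kept stable, the same post is recognised despite the new link.
stable_feed = [{"title": "Hello", "link": new_url, "guid": old_url}]
nothing_new = unseen_items(seen_guids, stable_feed)

print(len(duplicates), len(nothing_new))
```

In the first case the reader reports one "new" item for a post it has already shown, in the second case none.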
I wish I had been aware of this when I initially set the feeds up. All the feed items have a uuid which I could easily have used. Now I'm faced with trying to figure out a way to keep the old GUIDs the same. There's a relatively easy fix for the podcast. Since there aren't that many posts, I could easily go through and set each uuid to the post url. That way the GUIDs will remain unchanged in the new location. The podcast's data will look a bit strange, but at least things will look good in Apple Podcasts. Another way to approach the problem is to add some special cases to the feed generation code, which I could do, but I really don't want this sort of special case brittleness in the codebase.
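One way to keep the exception in the data rather than in the code could look roughly like this. It's only a sketch and not how my ssg tool works: the legacy_guid field and the guid_element function are hypothetical names, and the urls are placeholders. The one thing taken from the RSS 2.0 spec is the isPermaLink attribute, which tells readers whether the GUID should be treated as a url that points at the post.

```python
import xml.etree.ElementTree as ET

def guid_element(post):
    """Build the <guid> element for a feed item.

    Uses the GUID the old site published if the post data carries one
    (hypothetical legacy_guid field), otherwise falls back to the uuid.
    """
    value = post.get("legacy_guid") or post["uuid"]
    # isPermaLink="false": the value is only an identifier, so it does not
    # matter that an old url may no longer resolve after the move.
    guid = ET.Element("guid", {"isPermaLink": "false"})
    guid.text = value
    return guid

# An old podcast episode keeps the GUID it was originally published with.
old_post = {
    "uuid": "123e4567-e89b-12d3-a456-426614174000",
    "legacy_guid": "https://podcast.example.com/episodes/1",
}
# A post written after the move just uses its uuid.
new_post = {"uuid": "123e4567-e89b-12d3-a456-426614174001"}

print(ET.tostring(guid_element(old_post), encoding="unicode"))
print(ET.tostring(guid_element(new_post), encoding="unicode"))
```

The generation code has a single rule, and only the old posts need the extra field, which keeps the uuid itself intact.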
Changing the data for the linkblog isn't really an option because there are on the order of 170,000 posts. I'm not sure what I'll do there. I haven't checked what Jekyll uses for GUIDs. I'm guessing it's post urls.
Does it even matter? I doubt there are many folks actually subscribed. And it might very well be that the duplicates issue only really arises for the latest items. After all, the RSS feeds only contain the last 20 posts for the blog and 50 posts for the linkblog.
That's pretty much what you need to know about RSS feed GUIDs. I'm still not sure what I'll do. I think I'll likely try to make it look good in Apple Podcasts, and just go ahead and change to using uuids for the blog and linkblog. If you know of any reason I should do it differently, please email me.
EOF