Fixing Duplicate URLs: Collections Should Only Have Index Pages
Hey guys! Let's talk about something that can really mess with your website's SEO and how you organize your content: duplicate URLs. Specifically, we're going to dive into how collections in your app, like categories or tags, shouldn't generate those extra 'show' pages. Currently, they sometimes do, and it's causing a bit of a headache. I'll break down the problem, why it's a big deal, and how we can fix it. Get ready to level up your website's structure and SEO game!
The Problem: Duplicate URLs & Collections
Imagine you have a website with articles, and you've set up collections like "Long Reads" to group articles based on length. The current setup might create routes like these:
- /long-reads/: This is the index, showing all your long articles. Good.
- /long-reads/my-article-title: This also shows the same article. Duplicate!
- /articles/my-article-title: This is the primary, or canonical, URL for the article. Good.
See the problem? You have the same article accessible through two different URLs. This creates a few issues, which we will address later.
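To check whether your own site has this problem, one rough approach is to collect the URLs that actually resolve (from a crawl or your sitemap) and group them by the post they render. The sketch below is plain Ruby. The `duplicate_urls` helper and the sample data are hypothetical and not part of Bunko.

```ruby
# Hypothetical audit helper (not part of Bunko): takes a Hash of
# URL => post id and returns post id => [urls] for every post that
# resolves at more than one URL.
def duplicate_urls(url_to_post)
  url_to_post
    .group_by { |_url, post_id| post_id }            # post id => [[url, post id], ...]
    .transform_values { |pairs| pairs.map(&:first) } # post id => [url, ...]
    .select { |_post_id, urls| urls.size > 1 }       # keep only duplicated posts
end

# Made-up sample data mirroring the routes listed above.
resolved = {
  "/long-reads/my-article-title" => 42,
  "/articles/my-article-title"   => 42,
  "/articles/another-article"    => 7
}

duplicates = duplicate_urls(resolved)
# duplicates maps post 42 to both of its URLs; post 7 is not listed.
```

Once collections stop drawing show routes, the same check should come back empty.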
The Expected Behavior: Collections should only have an index page that lists the items. Think of it like a category or a tag page; it's a way to see a list of related content, not a page for the individual content itself. The individual content should only be accessible through its primary URL, the one generated by its post type (e.g., /articles/my-article-title).
Why This Matters: SEO, Clarity & User Experience
So, why is this a problem? Let's break it down:
- SEO Nightmare: Search engines aren't fans of duplicate content. They might get confused about which URL to rank, diluting the SEO juice that your articles deserve. This can lead to lower search rankings, which means fewer people finding your awesome content.
- Architectural Confusion: Think about your website's structure. Posts should have one home, one canonical URL. Collections are like filters or groupings, not the owners of the content. They shouldn't have their own individual pages for each piece of content.
- User Headaches: Which URL do you share? Which URL do you bookmark? Having multiple options leads to user confusion. Make things simple and clear for your visitors.
- Routing Overload: Having unnecessary routes complicates your website's routing logic. It makes it harder to manage and maintain your website's structure as it grows.
Fixing the Issue: The Code Changes
Alright, let's talk about how we can fix this. Here's a look at the code changes needed to ensure collections behave as they should.
1. Generator Update
When your system generates the files and structures for collections, we want to make sure it doesn't create a show view (like show.html.erb). The generator needs to be smart enough to differentiate between a collection and a post type. Post types get both index and show views. Collections only get the index view.
def generate_views(collection_name, format:)
  collection_config = Bunko.configuration.find_collection(collection_name)

  if collection_config
    # Collections only get an index view
    generate_index_view(collection_name)
  else
    # PostTypes get both index and show views
    generate_index_view(collection_name)
    generate_show_view(collection_name, format: format)
  end
end
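To see the rule in isolation, here is a framework-free sketch of the file list a generator would plan. `planned_views` and the `collections` list are hypothetical stand-ins for the generator and for `Bunko.configuration.find_collection`; the real generator's paths may differ.

```ruby
# Hypothetical sketch: which view templates should be generated for a name?
# `collections` stands in for the names Bunko.configuration knows are collections.
def planned_views(name, collections:)
  dir = File.join("app", "views", name)
  views = [File.join(dir, "index.html.erb")] # collections and post types both get an index
  # Only post types get a show template; collections never do.
  views << File.join(dir, "show.html.erb") unless collections.include?(name)
  views
end

collections = ["long_reads"]
planned_views("long_reads", collections: collections)
# => ["app/views/long_reads/index.html.erb"]
planned_views("articles", collections: collections)
# => ["app/views/articles/index.html.erb", "app/views/articles/show.html.erb"]
```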
2. Routing Tweaks
The routing system is where the URLs are defined. We need to make sure that when a collection is set up, it only creates an index route. Post types still get both index and show routes.
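def bunko_collection(collection_name, **options)
  collection_config = Bunko.configuration.find_collection(collection_name.to_s)

  # resource_name, controller and path_value come from setup code
  # (based on collection_name and options) that this excerpt omits.

  if collection_config
    # Collection: index only, so no "<path>/:slug" route is ever drawn
    resources resource_name, controller: controller, path: path_value, only: [:index]
  else
    # PostType: index + show, with the show route keyed by :slug
    resources resource_name, controller: controller, path: path_value, only: [:index, :show], param: :slug
  end
end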
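The routing decision can likewise be modeled without Rails. This sketch produces the same path-spec strings that the route tests below assert on, assuming snake_case names are dasherized into URL paths (`long_reads` becomes `/long-reads`). `planned_routes` is a hypothetical name, not a Bunko API.

```ruby
# Hypothetical model of the routes bunko_collection should draw, expressed as
# Rails-style path specs. Assumes URL paths are the dasherized name.
def planned_routes(name, collection:)
  base = "/" + name.to_s.tr("_", "-")
  routes = ["#{base}(.:format)"]                        # index, for everyone
  routes << "#{base}/:slug(.:format)" unless collection # show, post types only
  routes
end

planned_routes(:long_reads, collection: true)
# => ["/long-reads(.:format)"]
planned_routes(:articles, collection: false)
# => ["/articles(.:format)", "/articles/:slug(.:format)"]
```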
3. Collection Controller Logic
This is the controller for the collections. We need to ensure that if someone tries to access a collection through a show route (which shouldn't exist), they get a 404 error (Not Found). No show actions for collections, folks!
def load_post
  @collection_name = bunko_collection_name
  collection_config = Bunko.configuration.find_collection(@collection_name)

  if collection_config
    # Collections don't have show actions. This is a defensive guard:
    # with index-only routes, requests shouldn't reach it at all.
    render plain: "Posts can only be accessed through their PostType URL", status: :not_found
    return
  end

  # ... existing PostType logic
end
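Putting the routing and controller rules together, here is a framework-free sketch of the status code each request path should get. `COLLECTIONS`, `POST_TYPES`, `POSTS` and `dispatch` are hypothetical in-memory stand-ins for Bunko's configuration, the database and the Rails router. It illustrates the rule only, not how Bunko implements it.

```ruby
# Hypothetical in-memory stand-ins, keyed by URL path segment.
COLLECTIONS = ["long-reads"].freeze
POST_TYPES  = ["articles"].freeze
POSTS       = { ["articles", "my-post-slug"] => "My Post" }.freeze

# Returns the HTTP status a request path should receive under the rule
# "collections: index only; post types: index + show".
def dispatch(path)
  section, slug, *extra = path.split("/").reject(&:empty?)
  return 404 if section.nil? || !extra.empty?

  if COLLECTIONS.include?(section)
    slug.nil? ? 200 : 404                   # no show action for collections
  elsif POST_TYPES.include?(section)
    return 200 if slug.nil?                 # post type index
    POSTS.key?([section, slug]) ? 200 : 404 # canonical show URL
  else
    404
  end
end

dispatch("/long-reads")              # => 200
dispatch("/long-reads/my-post-slug") # => 404
dispatch("/articles/my-post-slug")   # => 200
```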
4. View Template Adjustments
Your collection's index view (e.g., long_reads/index.html.erb) should link to the canonical URL of each post. This is the URL that belongs to the post's post type (e.g., /articles/my-article-title), not the collection URL. This reinforces the idea that the collection is a filter, and the post type is the content's home.
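<% @posts.each do |post| %>
  <%# Assumes each PostType is drawn as `resources :articles, param: :slug`, %>
  <%# so route helpers like article_path(slug) exist. %>
  <%= link_to post.title, public_send("#{post.post_type.name.singularize}_path", post.slug) %>
  <%# Generates: /articles/my-post-slug %>
<% end %>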
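The rule the template follows is simple enough to state as a plain function: a post's URL is built from its post type and slug, never from the collection it is listed in. `Post`, `canonical_path` and `collection_links` below are hypothetical, framework-free stand-ins, assuming post type names are dasherized into paths.

```ruby
# Hypothetical, framework-free stand-in for a Bunko post.
Post = Struct.new(:title, :post_type, :slug)

# The canonical URL depends only on the post type and the slug.
def canonical_path(post)
  "/#{post.post_type.tr('_', '-')}/#{post.slug}"
end

# What a collection index renders: [link text, href] pairs.
# The collection's own name never appears in the href.
def collection_links(posts)
  posts.map { |post| [post.title, canonical_path(post)] }
end

long_reads = [Post.new("My Post", "articles", "my-post-slug")]
collection_links(long_reads)
# => [["My Post", "/articles/my-post-slug"]]
```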
Testing is Key
Making these changes is great, but how do we know they're working? We need tests! These tests will check that the routes are set up correctly and that collections behave as expected.
Test 1: Collection Routes
This test makes sure that collections only have an index route. No show routes allowed!
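# Assumes setup does @routes = ActionDispatch::Routing::RouteSet.new
# and that Bunko's bunko_collection helper is available inside draw.
test "collection routes only to index, not show" do
  @routes.draw do
    bunko_collection :long_reads # This is a Collection
  end

  paths = @routes.routes.map { |r| r.path.spec.to_s }
  assert_includes paths, "/long-reads(.:format)"       # index route exists
  refute_includes paths, "/long-reads/:slug(.:format)" # no show route
end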
Test 2: Post Type Routes
This test ensures that post types still have both index and show routes. This is the expected behavior for regular content.
# Assumes setup does @routes = ActionDispatch::Routing::RouteSet.new
test "post_type routes to both index and show" do
  @routes.draw do
    bunko_collection :articles # This is a PostType
  end

  paths = @routes.routes.map { |r| r.path.spec.to_s }
  assert_includes paths, "/articles(.:format)"       # index route exists
  assert_includes paths, "/articles/:slug(.:format)" # show route exists
end
Test 3: Integration Test
This is a more comprehensive test that checks the overall behavior. It verifies that the collection's index page loads correctly and that the show route for a collection returns a 404.
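# Assumes a post with slug "my-post-slug" exists (fixture or setup) and
# belongs to both the articles PostType and the long_reads collection.
test "collection index shows posts but show route returns 404" do
  get "/long-reads"
  assert_response :success

  # With no show route drawn, this only comes back as a 404 if routing errors
  # are rendered in tests (config.action_dispatch.show_exceptions); otherwise
  # Rails raises ActionController::RoutingError instead.
  get "/long-reads/my-post-slug"
  assert_response :not_found
end

test "post accessible via canonical PostType URL" do
  get "/articles/my-post-slug"
  assert_response :success
end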
Edge Cases: Multi-Type Collections
What happens when your collection includes different content types? For example:
config.collection "resources", post_types: ["articles", "videos", "tutorials"]
The same rules apply! The /resources/ page should show an index of all three types, but individual items should still only be accessible via their canonical URLs:
- /resources/: index of all content types.
- /resources/my-article-slug: 404, no duplicate.
- /articles/my-article-slug: canonical URL.
- /videos/my-video-slug: canonical URL.
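For a multi-type collection, the index just widens its filter; the links it renders stay canonical. In this sketch `COLLECTION_POST_TYPES` plays the role of `config.collection "resources", post_types: [...]`, and `Post` and `collection_index` are hypothetical, framework-free names.

```ruby
Post = Struct.new(:title, :post_type, :slug)

# Hypothetical mirror of: config.collection "resources", post_types: [...]
COLLECTION_POST_TYPES = {
  "resources" => ["articles", "videos", "tutorials"]
}.freeze

# Lists every post whose type belongs to the collection, linked by its
# canonical post-type URL. No "/resources/<slug>" URL is ever produced.
def collection_index(name, posts)
  types = COLLECTION_POST_TYPES.fetch(name)
  posts
    .select { |post| types.include?(post.post_type) }
    .map { |post| [post.title, "/#{post.post_type}/#{post.slug}"] }
end

posts = [
  Post.new("My Article", "articles", "my-article-slug"),
  Post.new("My Video", "videos", "my-video-slug"),
  Post.new("About", "pages", "about") # not part of the collection
]
collection_index("resources", posts)
# => [["My Article", "/articles/my-article-slug"], ["My Video", "/videos/my-video-slug"]]
```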
Benefits: A Recap
Let's summarize the benefits of making these changes:
- No Duplicate Content: Ensures each post has one, and only one, canonical URL.
- Clear Architecture: Separates collections (filters) from post types (content homes).
- Improved SEO: Gives search engines one URL per post, so ranking signals aren't split across duplicates.
- Cleaner Routing: Simplifies and reduces the number of routes to maintain.
- Intuitive Design: Aligns with how users and search engines expect content to be organized.
Acceptance Criteria: What Success Looks Like
To ensure we're on the right track, here's a checklist:
- bunko:add[collection_name] detects collections and generates only the index view.
- bunko_collection :collection_name creates only an index route for collections.
- The collection controller returns a 404 on show actions (or doesn't define them).
- Post types still have both index and show routes/views.
- Collection index views link to the canonical URLs for each post.
- Tests verify that collections cannot be accessed via show routes.
- Documentation is updated to explain the difference in routing between collections and post types.
Conclusion: Keep It Clean
That's it, guys! By implementing these changes, you'll create a cleaner, more SEO-friendly website structure. Remember, collections are for filtering and organizing, not for hosting individual pieces of content. This approach will make your site easier to manage, better for SEO, and provide a much better user experience. So, go forth, implement these changes, and watch your website thrive! Let me know in the comments if you have any questions!