Automatic Sitemap Improvements

Before I start: I know you can create the sitemap manually, hence the request to improve automatic sitemaps :)

The problem:

We have a lot of pages "in progress" and a lot of designers working on our project, so at any given time there are many pages not fit for public consumption.

There are also certain types of pages you don't want indexed, e.g. paid media pages and conversion pages, which need to be kept out of public view and out of search indexes.

Current workflow:

Our current approach is to password-protect the draft pages, but they are still included in our sitemap, which presents a few problems:

  1. The sitemap makes our production roadmap public to anyone who takes the time to check it (competitors included). We also can't use funky naming to mask our activities, because designers would get confused and it would make for a crazy workflow.
  2. We can't exclude paid media landing pages or conversion event pages.
  3. If we block these pages from crawling in robots.txt (which we do), we send a conflicting message to search engines and get the warning "Sitemap contains urls which are blocked by robots.txt." in Search Console.
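To catch the conflict in point 3 before Search Console reports it, you can cross-check the sitemap against robots.txt yourself. Below is a minimal sketch using Python's standard urllib.robotparser and xml.etree modules, assuming you already have both files as strings. The function name blocked_sitemap_urls and the default "Googlebot" user agent are illustrative choices, not part of any Webflow feature.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml: str, robots_txt: str, user_agent: str = "Googlebot") -> list[str]:
    """Return every <loc> URL in the sitemap that robots.txt disallows for user_agent."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from a string instead of fetching
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    # These URLs are the ones that trigger the "blocked by robots.txt" warning.
    return [url for url in locs if not parser.can_fetch(user_agent, url)]
```

Any URL this returns is both submitted in the sitemap and disallowed for crawling, which is exactly the mixed signal described above.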

Current solution:

The current solution is to create the sitemap manually, but who really wants to do that? Meh.

Dream solution - automagic sitemap:

The request here is to make the automatic sitemap smarter.

Options:

  • Page level
    • Set a page to draft so it is not included in the sitemap (in page settings)
    • Set a page to be excluded from the sitemap (in page settings)
  • Folder level
    • Set a folder to draft so its pages are not included in the sitemap (in page settings)
    • Set a folder to be excluded from the sitemap (in page settings)
  • Site-wide option to include or exclude password-protected pages in the sitemap/robots.txt
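The page-level, folder-level and site-wide options above could be combined into a single filtering step that runs before the sitemap is written. Here is a minimal sketch, assuming a hypothetical Page record whose draft, exclude_from_sitemap and password_protected flags stand in for the proposed settings. None of these names are an existing Webflow API.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    # Hypothetical page record: the flags mirror the proposed page settings,
    # they are not an existing Webflow API.
    path: str
    draft: bool = False
    exclude_from_sitemap: bool = False
    password_protected: bool = False


def in_folder(path, folder):
    """True if path is the folder itself or anything nested under it."""
    folder = folder.rstrip("/")
    return path == folder or path.startswith(folder + "/")


def sitemap_paths(pages, excluded_folders=(), draft_folders=(), include_password_protected=False):
    """Return the paths that should appear in the automatic sitemap."""
    hidden_folders = [*excluded_folders, *draft_folders]
    kept = []
    for page in pages:
        if page.draft or page.exclude_from_sitemap:
            continue  # page-level options
        if any(in_folder(page.path, folder) for folder in hidden_folders):
            continue  # folder-level options
        if page.password_protected and not include_password_protected:
            continue  # site-wide option for password-protected pages
        kept.append(page.path)
    return kept


def build_sitemap(base_url, paths):
    """Serialize the kept paths as a sitemaps.org <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    return ET.tostring(urlset, encoding="unicode")
```

Folders are matched by path, so excluding /lp hides /lp and everything under it. Draft folders and excluded folders are treated the same way in this sketch, because both should keep their pages out of the sitemap.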

There are possibly further improvements. One that comes to mind is auto-generating the robots.txt file from the resulting options, but I'll leave that to your genius team.
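Auto-generating robots.txt from the same options could look roughly like the following. This sketch assumes that only whole folders excluded from the sitemap get a Disallow rule. build_robots_txt and its parameters are hypothetical names chosen for illustration.

```python
def build_robots_txt(disallowed_folders, sitemap_url=None, user_agent="*"):
    """Render a robots.txt that blocks the given folders and points to the sitemap."""
    lines = [f"User-agent: {user_agent}"]
    for folder in disallowed_folders:
        # Normalize "/lp", "lp/" and "/lp/" to "/lp/" so the rule matches pages
        # inside that folder but not a sibling such as /lp-archive.
        lines.append(f"Disallow: /{folder.strip('/')}/")
    if not disallowed_folders:
        lines.append("Disallow:")  # an empty Disallow means everything may be crawled
    if sitemap_url:
        lines.append("")
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"
```

Pages that rely on a noindex tag are deliberately left out of the Disallow rules: a crawler blocked by robots.txt never fetches the page, so it never sees the noindex tag.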

  • Nick Soper
  • May 17 2017
  • Collin Belt commented
    26 Oct, 2021 12:55pm

    Agreed, this would be incredibly helpful. In other platforms (such as Squarespace) there is a toggle to no-index the page and hide it from the automatically generated sitemap. Having a similar option would save us a lot of time and be much cleaner from an SEO perspective.

  • Raghu Kashyap commented
    13 Aug, 2021 10:08am

    I think the ability to define which subdomain to use as the default is very important.

    The issue is reported here: https://discourse.webflow.com/t/sitemaps-invalid-w-multiple-custom-domains/47697

    We use a reverse proxy, and because of that we cannot set any domain as the default in Webflow, which messes up the sitemaps that get generated.

  • Tiphaine Bruel commented
    23 Jul, 2020 03:28pm

    +1

  • Kjetil Grøsland commented
    8 Jul, 2020 11:42pm

    I'm a bit surprised to run into this one, as Webflow is already so rich in functionality.
    I wish for a simple toggle in Page settings that lets us hide a page from the sitemap and set no-index.

    Same issue here. I'm getting errors in Search Console because I added no-index to a page, but it's still in sitemap.xml. That confuses Google.

  • Chris Erickson commented
    16 Jun, 2020 05:10pm

    This would be such an SEO win. There's no need to waste crawl budget on pages that have a noindex tag. There might also be some pages we want to hide from public view (like landing pages for ads, etc.). Exposing all pages in a sitemap makes all of this more difficult than it needs to be.

  • Robert commented
    20 May, 2020 11:39am

    It would be great to exclude folders from the sitemap :)

  • Justin commented
    15 May, 2020 02:07pm

    We need a checkbox on each page, CMS template page, and CMS child page that adds a noindex meta tag. This would also remove that page from the automatic sitemap, since Google Search Console marks a page as an error if it has noindex code but still appears in the sitemap. Also, the automatic sitemap needs to include all pages, including CMS pages. All of this is important for technical SEO.

  • Aditya Lakhe commented
    10 Jan, 2020 11:07am

    +1

  • Juan Manuel Garrido commented
    30 Sep, 2019 08:30pm

    +1

  • Axel Sturmann commented
    16 Aug, 2019 01:32pm

    This would be very useful!
    Also a page-level option (preferably with checkboxes) to automatically add the following to each page:
    Noindex, Nofollow, Noarchive, Nosnippet
    Many thanks,

  • Austin Hellman commented
    19 Jul, 2019 12:43pm

    This is an essential feature for SEO. I get Google Search Console errors when I use my robots.txt file or a noindex tag on pages that are submitted in the automatically generated sitemap. There seems to be no way around this unless I upload the sitemap manually, which is neither user-friendly nor realistic.

  • Brandon Urich commented
    16 Feb, 2019 10:23pm

    Yes! I'm adding No Index, No Follow in the head of certain pages, but then you get a GSC error because the pages still show up in the submitted sitemap. An "Exclude from Sitemap" control is badly needed in order to follow best practices. It is especially needed for collection page templates, since in some cases the individual collection item pages never need to be visited.

  • Christoffer Furnes commented
    12 Feb, 2018 12:28pm

    Sometimes I make collections specifically to be used for multi-reference filtering in other collections. The problem is that such a collection also gets a public collection URL that will be indexed by Google if Webflow's Auto-generate Sitemap function is used.

    Leaving that collection out of a manual sitemap is one solution, but then the sitemap has to be updated every time the site structure changes.

    What if there were an option in the collection settings to exclude it from the automatic sitemap?

    Not a big problem, but an option to exclude a collection/page from the auto-generated sitemap would keep the sitemap always up to date while leaving out the things you don't want to show.

  • Diarmuid Sexton commented
    9 Feb, 2018 09:25am

    This feature would be great and is much needed! Clients are seeing lots of errors in Search Console due to the <meta name="robots" content="noindex"> tag I've embedded on collection pages that are just for reference. If these pages were hidden from the sitemap, there would be no errors.

  • Evan McDaniel commented
    5 Feb, 2018 04:36pm

    I'd like to see the ability to indicate which domain should be used for the sitemap file. We use 4 domains per site on average (internal version, client facing non-public version, www live version, non-www live version that redirects to www), and it's a mystery as to which gets used in the sitemap, and there is currently no way to change it.

    This makes the auto-generation of the sitemap a no-go for us. Manual updating is tedious, so a solution for this would be a big improvement for us.

    Thanks!

  • Cameron Roe commented
    9 Aug, 2017 03:04am

    I also think it would be amazing to have a sitemap diagramming tool built into Webflow. For example, you could go to a separate section under the Assets tab and it would show a visual sitemap as a diagram of all the pages on the site. The view would overlap the designer section and you could drag your page structure around in a more sophisticated manner. What would be nice is to see all the routes (with dynamic parameters from CMS) and update them by simply re-ordering or renaming the route structure.

    For example:

    => /
      => /about
      => /contact
      => /blog
      => /blog/:published_on_year/:published_on_month/:published_on_day/:post_title
      => /events/:event_title
      => /projects/:category/:project_title

  • Noelle Greenwood commented
    31 Jul, 2017 05:21am

    Yes! This! Webflow masters, can you please advise when this might be considered? Sending conflicting messages to Google is not great for SEO... :(

  • +123