Google does a frighteningly good job of finding all the stuff you add to your site, but even it will miss some of the darker corners of your online properties. The best way to ensure that the pages you WANT to have crawled are indeed crawled is via an XML sitemap.
There are lots of solutions to this – you can roll your own, use external crawlers, or install standalone scripts that will generate one for you. If you’re using the Drupal platform, there are a number of projects out there you can use. One of the easiest to set up is (logically) called XML Sitemap. Installation is a quick thing:
- Add it to your site by either downloading it and dropping it into your /sites/all/modules folder, or using Drush and simply issuing: drush dl xmlsitemap
- Enable at least the following two submodules: XML Sitemap and XML Sitemap Node. This is done either via the Modules page in your browser, or via Drush at the command line: drush en -y xmlsitemap xmlsitemap_node
- By default it will only include your site root. Go to Structure->Content Types, click the edit link for any content type you want to include, and navigate to the XML Sitemap section of its settings. Change Disabled to Enabled to include it, and set a priority that reflects how important those pages are relative to the rest of your site.
- Once that’s done, generate your sitemap by visiting http://example.com/admin/config/search/xmlsitemap.
- Your sitemap will then be available at http://example.com/sitemap.xml
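The generated file follows the standard sitemaps.org protocol: each enabled node appears as a `<url>` entry carrying the priority you configured. The URLs below are placeholders, but a minimal sitemap looks roughly like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://example.com/about</loc>
    <priority>0.5</priority>
  </url>
</urlset>
```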
You can then submit it via Google Webmaster Tools. To keep it up to date automatically, enable the XML Sitemap Engines submodule (drush en -y xmlsitemap_engines). In that module’s configuration you can choose to notify Google and/or Bing whenever your sitemap content changes (you’ll want to make sure your site is already registered with those services first).
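Before submitting, it’s worth sanity-checking that the sitemap actually parses and lists the URLs you expect. A short script like the following (my own helper, not part of the XML Sitemap module) does the trick; in practice you’d download http://example.com/sitemap.xml, but here it parses an inline sample:

```python
# Hypothetical sanity-check helper: parse a sitemap and list its URLs.
import xml.etree.ElementTree as ET

# All sitemap elements live in the standard sitemaps.org namespace.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return the list of <loc> values found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

# Inline sample standing in for a fetched http://example.com/sitemap.xml
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><priority>1.0</priority></url>
  <url><loc>http://example.com/about</loc><priority>0.5</priority></url>
</urlset>"""

print(sitemap_urls(sample))
# ['http://example.com/', 'http://example.com/about']
```

If the list comes back empty or the parse raises an error, fix the sitemap before pointing Google or Bing at it.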