Google Webmaster Central Blog

Official news on crawling and indexing sites for the Google index.

More Options for Google+ Badges

Thu, 2012-02-02 19:15
Webmaster Level: All

Update on February 2, 2012: The new Google+ badge is now out of preview and available to all users on all sites.

When we launched Google+ pages in November, we also released Google+ badges to promote your Google+ presence right on your site. Starting today in developer preview (and soon available to all your users), we're adding more options for integrating the Google+ badge into your website. You can configure a badge with a width that fits your site design and choose a version that works better on darker sites. You'll also see that Google+ badges now include the unified +1 and circle count that we added to Pages last month.


If you’re still considering whether to add a Google+ badge on your website, consider this: We recently looked at top sites using the badge and found that, on average, the badge accounted for an additional 38% of followers. When you add the badge, visitors to your website can discover your Google+ page and connect in a variety of ways: they can follow your Google+ page, +1 your site, share your site with their circles, see which of their friends have +1’d your site, and click through to visit your Google+ page.

The Google+ Badge makes it easy for your fans to find and follow you on Google+. With these additional options, we hope it's even easier to create a badge that fits your website.

Follow the conversation on Google+.

Posted by Lucy Hadden, Software Engineer, Google+
Categories: sysadmin

GET, POST, and safely surfacing more of the web

Tue, 2012-01-31 01:50
Webmaster Level: Intermediate to Advanced

As the web evolves, Google’s crawling and indexing capabilities also need to progress. We improved our indexing of Flash, built a more robust infrastructure called Caffeine, and we even started crawling forms where it makes sense. Now, especially with the growing popularity of JavaScript and, with it, AJAX, we’re finding more web pages requiring POST requests -- either for the entire content of the page or because the pages are missing information and/or look completely broken without the resources returned from POST. For Google Search this is less than ideal, because when we’re not properly discovering and indexing content, searchers may not have access to the most comprehensive and relevant results.

We generally advise using GET for fetching resources a page needs, and this is by far our preferred method of crawling. We’ve started experiments to rewrite POST requests to GET, and while this remains a valid strategy in some cases, often the contents returned by a web server for GET vs. POST are completely different. Additionally, there are legitimate reasons to use POST (e.g., you can attach more data to a POST request than a GET). So, while GET requests remain far more common, to surface more content on the web, Googlebot may now perform POST requests when we believe it’s safe and appropriate.

We take precautions to avoid performing any task on a site that could result in executing an unintended user action. Our POSTs are primarily for crawling resources that a page requests automatically, mimicking what a typical user would see when they open the URL in their browser. This will evolve over time as we find better heuristics, but that’s our current approach.

Let’s run through a few POST request scenarios that demonstrate how we’re improving our crawling and indexing to evolve with the web.

Examples of Googlebot’s POST requests
  • Crawling a page via a POST redirect
    <html>
      <body onload="document.foo.submit();">
        <form name="foo" action="request.php" method="post">
          <input type="hidden" name="bar" value="234"/>
        </form>
      </body>
    </html>
  • Crawling a resource via a POST XMLHttpRequest
    In this step-by-step example, we improve both the indexing of a page and its Instant Preview by following the automatic XMLHttpRequest generated as the page renders.

    1. Google crawls the URL, yummy-sundae.html.
    2. Google begins indexing yummy-sundae.html and, as a part of this process, decides to attempt to render the page to better understand its content and/or generate the Instant Preview.
    3. During the render, yummy-sundae.html automatically sends an XMLHttpRequest for a resource, hot-fudge-info.html, using the POST method.
      <html>
        <head>
          <title>Yummy Sundae</title>
          <script src="jquery.js"></script>
        </head>
        <body>
          This page is about a yummy sundae.
          <div id="content"></div>
          <script type="text/javascript">
            $(document).ready(function() {
              $.post('hot-fudge-info.html', function(data) {
                $('#content').html(data);
              });
            });
          </script>
        </body>
      </html>
    4. The URL requested through POST, hot-fudge-info.html, along with its data payload, is added to Googlebot’s crawl queue.
    5. Googlebot performs a POST request to crawl hot-fudge-info.html.
    6. Google now has an accurate representation of yummy-sundae.html for Instant Previews. In certain cases, we may also incorporate the contents of hot-fudge-info.html into yummy-sundae.html.
    7. Google completes the indexing of yummy-sundae.html.
    8. User searches for [hot fudge sundae].
    9. Google’s algorithms can now better determine how yummy-sundae.html is relevant for this query, and we can properly display a snapshot of the page for Instant Previews.
Improving your site’s crawlability and indexability

General advice for creating crawlable sites is found in our Help Center. For webmasters who want to help Google crawl and index their content and/or generate the Instant Preview, here are a few simple reminders:
  • Prefer GET for fetching resources, unless there’s a specific reason to use POST.
  • Verify that we're allowed to crawl the resources needed to render your page. In the example above, if hot-fudge-info.html is disallowed by robots.txt, Googlebot won't fetch it. More subtly, if the JavaScript code that issues the XMLHttpRequest is located in an external .js file disallowed by robots.txt, we won't see the connection between yummy-sundae.html and hot-fudge-info.html, so even if the latter is not disallowed itself, that may not help us much. We've seen even more complicated chains of dependencies in the wild. To help Google better understand your site, it's almost always better to allow Googlebot to crawl all resources.

    You can test whether resources are blocked through Webmaster Tools “Labs -> Instant Previews.”
  • Make sure to return the same content to Googlebot as is returned to users’ web browsers. Cloaking (sending different content to Googlebot than to users) is a violation of our Webmaster Guidelines because, among other things, it may cause us to provide a searcher with an irrelevant result -- the content the user views in their browser may be a complete mismatch from what we crawled and indexed. We’ve seen numerous POST-request examples where a webmaster non-maliciously cloaked (which is still a violation), and their cloaking -- on even the smallest of changes -- then caused JavaScript errors that prevented accurate indexing and completely defeated their reason for cloaking in the first place. Summarizing, if you want your site to be search-friendly, cloaking is an all-around sticky situation that’s best to avoid.

    To verify that you're not accidentally cloaking, you can use Instant Previews within Webmaster Tools, or try setting the User-Agent string in your browser to something like:

    Mozilla/5.0 (compatible; Googlebot/2.1;
      +http://www.google.com/bot.html)
    Your site shouldn't look any different after such a change. If you see a blank page, a JavaScript error, or if parts of the page are missing or different, that means that something's wrong.
  • Remember to include important content (i.e., the content you’d like indexed) as text, visible directly on the page and without requiring user-action to display. Most search engines are text-based and generally work best with text-based content. We’re always improving our ability to crawl and index content published in a variety of ways, but it remains a good practice to use text for important information.
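The User-Agent check described in the list above can also be scripted. The following is a minimal Python sketch, not a Google tool: it requests a URL twice, once with a browser-like User-Agent and once with the Googlebot string shown above, and compares the response bodies. The function names, the placeholder browser User-Agent, and the strict byte-for-byte comparison are assumptions made for illustration.

```python
import urllib.request

# Placeholder browser-like User-Agent (an assumption; substitute your own browser's string).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"
# Googlebot User-Agent string as quoted in the post.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch(url, user_agent):
    """Fetch url with the given User-Agent header and return the body as bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def serves_same_content(url, fetcher=fetch):
    """Return True if the body is identical for both User-Agent strings.

    fetcher is injectable so the comparison can be exercised without a network.
    """
    return fetcher(url, BROWSER_UA) == fetcher(url, GOOGLEBOT_UA)


# Example call (requires network access; the URL is a placeholder):
# print(serves_same_content("http://www.example.com/yummy-sundae.html"))
```

A False result is only a hint to look closer: pages that embed timestamps or session tokens will differ on every request, and this sketch does not execute JavaScript, so it cannot reveal the rendering errors described above.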
Controlling your content

If you’d like to prevent content from being crawled or indexed for Google Web Search, traditional robots.txt directives remain the best method. To prevent the Instant Preview for your page(s), please see our Instant Previews FAQ which describes the “Google Web Preview” User-Agent and the nosnippet meta tag.
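Before relying on robots.txt directives, it can help to check which URLs a given file actually blocks, including resources a page needs in order to render. The sketch below uses Python's standard urllib.robotparser module. The robots.txt contents, the example.com URLs, and the /js/loader.js path are hypothetical, and the standard-library parser may not interpret every directive exactly as Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /js/loader.js
"""


def blocked_resources(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# A page, the resource it requests, and a (hypothetical) script that issues the request.
resources = [
    "http://www.example.com/yummy-sundae.html",
    "http://www.example.com/hot-fudge-info.html",
    "http://www.example.com/js/loader.js",
]

print(blocked_resources(ROBOTS_TXT, resources))
```

Here only the script is blocked, which mirrors the subtle case described earlier: the page and its data are crawlable, but the connection between them would not be seen.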

Moving forward

We’ll continue striving to increase the comprehensiveness of our index so searchers can find more relevant information. And we expect our crawling and indexing capability to improve and evolve over time, just like the web itself. Please let us know if you have questions or concerns.

Written by , Software Engineer, Indexing Team, and , Developer Programs Tech Lead
Categories: sysadmin

What’s new with Sitemaps

Mon, 2012-01-30 14:48
Webmaster level: All

Sitemaps are a way to tell Google about pages on your site. Webmaster Tools’ Sitemaps feature gives you feedback on your submitted Sitemaps, such as how many Sitemap URLs have been indexed, or whether your Sitemaps have any errors. Recently, we’ve added even more information! Let’s check it out:


The Sitemaps page displays details based on content-type. Now statistics from Web, Videos, Images and News are featured prominently. This lets you see how many items of each type were submitted (if any), and for some content types, we also show how many items have been indexed. With these enhancements, the new Sitemaps page replaces the Video Sitemaps Labs feature, which will be retired.

Another improvement is the ability to test a Sitemap. Unlike an actual submission, testing does not submit your Sitemap to Google; it only checks it for errors. Testing requires a live fetch by Googlebot and usually takes a few seconds to complete. Note that the initial testing is not exhaustive and may not detect all issues; for example, errors that can only be identified once the URLs are downloaded are not caught by the test.
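A quick local check can catch basic problems before you use the test feature at all. The following Python sketch is not the test that Webmaster Tools runs; it only verifies that a Sitemap is well-formed XML, that its root is a urlset element in the sitemaps.org namespace, and that every url entry has a non-empty loc. The function name is an assumption, and Sitemap index files (which use a different root element) are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap(xml_text):
    """Return a list of problems found in a Sitemap document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]

    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]

    problems = []
    for index, url in enumerate(root.findall(f"{{{SITEMAP_NS}}}url"), start=1):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        # A <url> entry needs a <loc> child with some text in it.
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> entry {index} has no <loc>")
    return problems


example = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.com/</loc></url>"
    "<url></url>"
    "</urlset>"
)
print(check_sitemap(example))
```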

In addition to on-the-spot testing, we’ve got a new way of displaying errors which better exposes what types of issues a Sitemap contains. Instead of repeating the same kind of error many times for one Sitemap, errors and warnings are now grouped, and a few examples are given. Likewise, for Sitemap index files, we’ve aggregated errors and warnings from the child Sitemaps that the Sitemap index encloses. No longer will you need to click through each child Sitemap one by one.

Finally, we’ve changed the way the “Delete” button works. Now, it removes the Sitemap from Webmaster Tools, both from your account and the accounts of the other owners of the site. Be aware that a Sitemap may still be read or processed by Google even if you delete it from Webmaster Tools. For example, if you reference a Sitemap in your robots.txt file, search engines may still attempt to process the Sitemap. To truly prevent a Sitemap from being processed, remove the file from your server or block it via robots.txt.

For more information on Sitemaps in Webmaster Tools and how Sitemaps work, visit our Help Center. If you have any questions, go to the Webmaster Help Forum.

Written by Kamila Primke, Software Engineer, Webmaster Tools
Categories: sysadmin

Update to Top Search Queries data

Wed, 2012-01-25 17:00
Webmaster level: All

Starting today, we’re updating our Top Search Queries feature to make it better match expectations about search engine rankings. Previously we reported the average position of all URLs from your site for a given query. As of today, we’ll instead average only the top position that a URL from your site appeared in.

An example
Let’s say Nick searched for [bacon] and URLs from your site appeared in positions 3, 6, and 12. Jane also searched for [bacon] and URLs from your site appeared in positions 5 and 9. Previously, we would have averaged all these positions together and shown an Average Position of 7. Going forward, we’ll only average the highest position your site appeared in for each search (3 for Nick’s search and 5 for Jane’s search), for an Average Position of 4.
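The arithmetic in this example can be reproduced with a short Python sketch. The data structure and function names are illustrative assumptions; only the positions come from the example above.

```python
from statistics import mean

# Positions at which URLs from your site appeared, per search (from the example above).
searches = {
    "Nick": [3, 6, 12],
    "Jane": [5, 9],
}


def average_all_positions(searches):
    """Previous method: average every position across all searches."""
    return mean([p for positions in searches.values() for p in positions])


def average_top_positions(searches):
    """New method: average only the top (lowest-numbered) position per search."""
    return mean([min(positions) for positions in searches.values()])


print(average_all_positions(searches))  # (3 + 6 + 12 + 5 + 9) / 5 = 7
print(average_top_positions(searches))  # (3 + 5) / 2 = 4
```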

We anticipate that this new method of calculation will more accurately match your expectations about how a link's position in Google Search results should be reported.

How will this affect my Top Search Queries data?
This change will affect your Top Search Queries data going forward. Historical data will not change. Note that the change in calculation means that the Average Position metric will usually stay the same or decrease (a lower number means a higher ranking), as we will no longer be averaging in lower-ranking URLs.

Check out the updated Top Search Queries data in the Your site on the web section of Webmaster Tools. And remember, you can also download Top Search Queries data programmatically!

We look forward to providing you a more representative picture of your Google Search data. Let us know what you think in our Webmaster Forum.

Posted by , Google Analytics team, and , Webmaster Trends Analyst
Categories: sysadmin

Making form-filling faster, easier and smarter

Wed, 2012-01-25 13:00
Webmaster Level: Intermediate

One of the biggest bottlenecks on any conversion funnel is filling out an online form – shopping and registration flows all rely on forms as a crucial and demanding step in accomplishing the goals of your site. For many users, online forms mean repeatedly typing common information like their names and addresses on different sites across the web – a tedious task that causes many to give up and abandon the flow entirely.

Chrome’s Autofill and other form-filling providers help to break down this barrier by remembering common profile information and pre-populating the form with those values. Unfortunately, up to now it has been difficult for webmasters to ensure that Chrome and other form-filling providers can parse their form correctly. Some standards exist, but they put onerous burdens on the implementation of the website, so they’re not used much in practice.

Today we’re pleased to announce support in Chrome for an experimental new “autocomplete type” attribute for form fields that allows web developers to unambiguously label text and select fields with common data types such as ‘full-name’ or ‘street-address’. With this attribute, web developers can drive conversions on their sites by marking their forms for auto-completion without changing the user interface or the backend.


Just add an attribute to the input element. For example, an email address field might look like:

<input type="text" name="field1" x-autocompletetype="email" />

We’ve been working on this design in collaboration with several other autofill vendors. Like any early stage proposal we expect this will change and evolve as the web standards community provides feedback, but we believe this will serve as a good starting point for the discussion on how to best support autofillable forms in the HTML5 spec. For now, this new attribute is implemented in Chrome as x-autocompletetype to indicate that this is still experimental and not yet a standard, similar to the webkitspeech attribute we released last summer.

For more information, you can read the full text of the proposed specification, ask questions on the Webmaster help forum, or you can share your feedback in the standardization discussion!

Posted by Ilya Sherman, Software Engineer
Categories: sysadmin

Page layout algorithm improvement

Thu, 2012-01-19 18:00
Webmaster Level: All

In our ongoing effort to help you find more high-quality websites in search results, today we’re launching an algorithmic change that looks at the layout of a webpage and the amount of content you see on the page once you click on a result.

As we’ve mentioned previously, we’ve heard complaints from users that if they click on a result and it’s difficult to find the actual content, they aren’t happy with the experience. Rather than scrolling down the page past a slew of ads, users want to see content right away. So sites that don’t have much content “above-the-fold” can be affected by this change. If you click on a website and the part of the website you see first either doesn’t have a lot of visible content above-the-fold or dedicates a large fraction of the site’s initial screen real estate to ads, that’s not a very good user experience. Such sites may not rank as highly going forward.

We understand that placing ads above-the-fold is quite common for many websites; these ads often perform well and help publishers monetize online content. This algorithmic change does not affect sites that place ads above-the-fold to a normal degree, but it does affect sites that go much further and load the top of the page with ads to an excessive degree, or that make it hard to find the actual original content on the page. This new algorithmic improvement tends to impact sites where there is only a small amount of visible content above-the-fold or relevant content is persistently pushed down by large blocks of ads.

This algorithmic change noticeably affects less than 1% of searches globally. That means that in less than one in 100 searches, a typical user might notice a reordering of results on the search page. If you believe that your website has been affected by the page layout algorithm change, consider how your web pages use the area above-the-fold and whether the content on the page is obscured or otherwise hard for users to discern quickly. You can use our Browser Size tool, among many others, to see how your website would look under different screen resolutions.

If you decide to update your page layout, the page layout algorithm will automatically reflect the changes as we re-crawl and process enough pages from your site to assess the changes. How long that takes will depend on several factors, including the number of pages on your site and how efficiently Googlebot can crawl the content. On a typical website, it can take several weeks for Googlebot to crawl and process enough pages to reflect layout changes on the site.

Overall, our advice for publishers continues to be to focus on delivering the best possible user experience on your websites and not to focus on specific algorithm tweaks. This change is just one of the over 500 improvements we expect to roll out to search this year. As always, please post your feedback and questions in our Webmaster Help forum.

Posted by Matt Cutts, Distinguished Engineer
Categories: sysadmin

Better page titles in search results

Thu, 2012-01-12 06:59

Page titles are an important part of our search results: they’re the first line of each result and they’re the actual links our searchers click to reach websites. Our advice to webmasters has always been to write unique, descriptive page titles (and meta descriptions for the snippets) to describe to searchers what the page is about.

We use many signals to decide which title to show to users, primarily the <title> tag if the webmaster specified one. But for some pages, a single title might not be the best one to show for all queries, and so we have algorithms that generate alternative titles to make it easier for our users to recognize relevant pages. Our testing has shown that these alternative titles are generally more relevant to the query and can substantially improve the clickthrough rate to the result, helping both our searchers and webmasters. About half of the time, this is the reason we show an alternative title.

Other times, alternative titles are displayed for pages that have no title or a non-descriptive title specified by the webmaster in the HTML. For example, a title using simply the word "Home" is not really indicative of what the page is about. Another common issue we see is when a webmaster uses the same title on almost all of a website’s pages, sometimes exactly duplicating it and sometimes using only minor variations. Lastly, we also try to replace unnecessarily long or hard-to-read titles with more concise and descriptive alternatives.
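The title problems described above (missing, non-descriptive, and duplicated titles) are easy to look for in your own pages. The following Python sketch is an illustration only and says nothing about how Google generates alternative titles: the function names, the list of generic titles, and the sample pages are all assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Illustrative list of titles that say little about a page (an assumption).
GENERIC_TITLES = {"home", "index", "untitled"}


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def audit_titles(pages):
    """Map each problematic URL to 'missing', 'non-descriptive', or 'duplicate'."""
    titles = {url: extract_title(html) for url, html in pages.items()}
    urls_by_title = defaultdict(list)
    for url, title in titles.items():
        urls_by_title[title.lower()].append(url)

    issues = {}
    for url, title in titles.items():
        if not title:
            issues[url] = "missing"
        elif title.lower() in GENERIC_TITLES:
            issues[url] = "non-descriptive"
        elif len(urls_by_title[title.lower()]) > 1:
            issues[url] = "duplicate"
    return issues


# Hypothetical pages keyed by URL path.
pages = {
    "/": "<html><head><title>Home</title></head></html>",
    "/a": "<html><head><title>Widgets</title></head></html>",
    "/b": "<html><head><title>Widgets</title></head></html>",
    "/c": "<html><body>No title here</body></html>",
    "/d": "<html><head><title>Blue widget care guide</title></head></html>",
}
print(audit_titles(pages))
```

This sketch only catches exact duplicates; titles that differ by minor variations would need a fuzzier comparison.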

For more information about how you can write better titles and meta descriptions, and to learn more about the signals we use to generate alternative titles, we've recently updated the Help Center article on this topic. Also, we try to notify webmasters when we discover titles that can be improved on their websites through the HTML Suggestions feature in Webmaster Tools; you can find this feature in the Diagnostics section of the menu on the left hand side.

As always, if you have any questions or feedback, please tell us in the Webmaster Help Forum.

Posted by , Webmaster Trends Analyst

Categories: sysadmin

Download search queries data using Python

Thu, 2011-12-22 14:09
Webmaster level: Advanced

For all the developers who have expressed interest in getting programmatic access to the search queries data for their sites in Webmaster Tools, we've got some good news. You can now get access to your search queries data in CSV format using an open source Python script from the webmaster-tools-downloads project. Search queries data is not currently available via the Webmaster Tools API, which has been a common API user request that we're considering for the next API update. For those of you who need access to search queries data right now, let's look at an example of how the search queries downloader Python script can be used to download your search queries data and upload it to a Google Spreadsheet in Google Docs.

Example usage of the search queries downloader Python script
1) If Python is not already installed on your machine, download and install Python.
2) Download and install the Google Data APIs Python Client Library.
3) Create a folder and add the downloader.py script to the newly created folder.
4) Copy the example-create-spreadsheet.py script to the same folder as downloader.py and edit it to replace the example values for “website,” “email” and “password” with valid values for your Webmaster Tools verified site.
5) Open a Terminal window and run the example-create-spreadsheet.py script by entering "python example-create-spreadsheet.py" at the Terminal window command line:
python example-create-spreadsheet.py
6) Visit Google Docs to see a new spreadsheet containing your search queries data.


If you just want to download your search queries data in a .csv file without uploading the data to a Google spreadsheet, use example-simple-download.py instead of example-create-spreadsheet.py in the example above.

You could easily configure these scripts to be run daily or monthly to archive and view your search queries data across larger date ranges than the current one month of data that is available in Webmaster Tools, for example, by setting up a cron job or using Windows Task Scheduler.

An important point to note is that this script example includes user name and password credentials within the script itself. If you plan to run this in a production environment you should follow security best practices like using encrypted user credentials retrieved from a secure data storage source. The script itself uses HTTPS to communicate with the API to protect these credentials.

Take a look at the search queries downloader script and start using search queries data in your own scripts or tools. Let us know if you have questions or feedback in the Webmaster Help Forum.

Written by Jonathan Simon, Webmaster Trends Analyst
Categories: sysadmin

Website user research and testing on the cheap

Wed, 2011-12-21 14:24
Webmaster level: Intermediate

As the team responsible for tens of thousands of Google’s informational web pages, the Webmaster Team is here to offer tips and advice based on their experiences as hands-on webmasters.

If you’ve never tested or analyzed usage of your website, ask yourself if you really know whether your site is useful for your target audience. If you’re unsure, why not find out? For example, did you know that on average users scroll down 5.9 times as often as they scroll up, meaning that often once page content is scrolled past, it is “lost?” (See Jakob Nielsen’s findings on scrolling, where he advises that users don’t mind scrolling, but within limits.)

Also, check your analytics—are you curious about high bounce rates from any of your pages, or very short time-on-page metrics?

First, think about your user


The start of a web project—whether it’s completely new or a revamp of an existing site—is a great time to ask questions like:

  • How might users access your site—home, office, on-the-go?
  • How tech-savvy are your visitors?
  • How familiar are users with the subject matter of your website?

The answers to some of these questions can be valuable when making initial design decisions.

For instance, if the user is likely to be on the road, they might be short on time to find the information they need from your site, or be in a distracting environment with a slow data connection, so a simple layout with a single purpose would work best. Additionally, if you’re providing content for a less technical audience, make sure it’s not too difficult to access content: animation might provide a “wow” factor, but only if your user appreciates it and it’s not too difficult to get to the content.

Even without testing, building a basic user profile (or “persona”) can help shape your designs for the benefit of the user—this doesn’t have to be an exhaustive biography, but just some basic considerations of your user’s behavior patterns.

Simple testing


Testing doesn’t have to be a costly operation – friends and family can be a great resource. Some pointers:

  • Sample size: Just five people can be a large enough number of users to find common problems in your layouts and navigation (see Jakob Nielsen’s article on why using a small sample size is sufficient).
  • Choosing your testers: A range of technical abilities can be useful, but focus on trends rather than on individual issues encountered. For example, if more than 50% of your testers have the same usability issue, it’s likely a real problem.
  • Testing location: If possible, visit users in their home and observe how they normally navigate the web when relaxed and in their natural environment. Remote testing is also a possibility if you can’t make it in person; we’ve heard that Google+ hangouts can be used effectively for this (find out more about using Google+ hangouts).
  • How to test: Based on your site’s goals, define 4 or 5 simple tasks to do on your website, and let the user try to complete the tasks. Ask your testers to speak aloud so you can better understand their experiences and thought processes.
  • What to test: Basic prototypes in clickable image or document format (for example, PDF) or HTML can be used to test the basic interactions, without having to build out a full site for testing. This way, you can test out different options for navigation and layouts to see how they perform before implementing them.
  • What not to test: Focus on functionality rather than graphic design elements; viewpoints are often subjective. You would only get useful feedback on design from quantitative testing with large (200+) numbers of users (unless, for example, the colors you use on your site make the content unreadable, which would be good feedback!). One format for getting some useful feedback on the design can be to offer 5-6 descriptive keywords and ask your user to choose the most representative ones.
Overall, basic testing is most useful for seeing how your website’s functionality is working—the ease of finding information and common site interactions.
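The "more than 50% of testers" rule of thumb from the pointers above is simple to apply to your session notes. This Python sketch assumes you have recorded, for each tester, a list of short issue labels; the labels, tester names, and function name are hypothetical.

```python
from collections import Counter


def common_issues(observations, threshold=0.5):
    """Return issues hit by more than `threshold` of testers, sorted by name."""
    # set() ensures a tester who hit the same issue twice is counted once.
    counts = Counter(issue for issues in observations.values() for issue in set(issues))
    total = len(observations)
    return sorted(issue for issue, n in counts.items() if n / total > threshold)


# Hypothetical notes from five test sessions.
observations = {
    "tester 1": ["could not find menu", "font too small"],
    "tester 2": ["could not find menu"],
    "tester 3": ["font too small", "could not find menu"],
    "tester 4": [],
    "tester 5": ["checkout button unclear"],
}
print(common_issues(observations))
```

With five testers, only the menu problem (three of five) crosses the threshold; the others are worth noting but are more likely to be individual issues.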

Lessons learned


In case you’re still wondering whether research and testing are really worth it, here are a few simple things we learned that we wouldn’t have known if we hadn’t sat with actual users and watched them use our pages, or analyzed our web traffic.

  • Take care when using layouts that hide/show content: We found that when scripts were used to expand and collapse long text passages, users often didn’t realize the extra content was available. The collapsed, JavaScript-rendered content was also effectively “hidden” when users searched within the page (for example, using Control + F, which we’ve seen often).


    Wireframe of layout tested, showing “zipped” content on the bottom left

    Final page design showing anchor links at the top and content laid out in the main body of the page


  • Check your language: Headings, link text, and button text are what catch the user’s eye the most when scanning the page. Avoid using “Learn more…” in link text; users seem averse to clicking on a link which implies they will need to learn something. Instead, use a literal description of the content the user will get behind the link, and make sure link text makes sense and is easy to understand out of context, because that is often how it will be scanned. Be mindful about language and try to make button text descriptive, inviting and interesting.
  • Test pages on a slower connection: Try out your pages using different networks (for example, try browsing your website using the wifi at your local coffee shop or a friend’s house), especially if your target users are likely to be viewing your pages from a home connection that’s not as fast as your office network. We found a considerable improvement in CTR and time-on-site metrics in some cases when we made scripted animations much simpler and faster (hint: use Google’s Page Speed Online to check performance if you don’t have access to a slower Internet connection).
So if you’re caught up in a seemingly never-ending redevelopment cycle, save yourself some time in the future by investing a little up front through user profiling and basic testing, so that you’re more likely to choose the right approach for your site layout and architecture.

We’d love to hear from you in the comments: have you tried out website usability testing? If so, how did you get on, and what are your favorite simple and low-cost tricks to get the most out of it?
Categories: sysadmin

Rich Snippets Instructional Videos

Fri, 2011-12-16 12:59
Webmaster level: All

When users come to Google, they have a pretty good idea of what they’re looking for, but they need help deciding which result might have the information that best suits their needs. So, the challenge for Google is to make it very clear to our users what content exists on a page in both a useful and concise manner. That’s why we have rich snippets.


Essentially, rich snippets provide you with the ability to help Google highlight aspects of your page. Whether your site contains information about products, recipes, events or apps, a few simple additions to your markup can result in more engagement with your content -- and potentially more traffic to your site.

To help you get started or fine-tune your rich snippets, we’ve put together a series of tutorial videos for webmasters of all experience levels. These videos provide guidance as you mark up your site so that Google is better able to understand your content. We can use that content to power the rich snippets we display for your pages. Check out the videos below to get started:



For more information on how to use rich snippets markup for your site, visit our Help Center.

Posted by , Product Manager
Categories: sysadmin

Introducing smartphone Googlebot-Mobile

Thu, 2011-12-15 10:20

Webmaster level: All

With the number of smartphone users rapidly rising, we’re seeing more and more websites providing content specifically designed to be browsed on smartphones. Today we are happy to announce that Googlebot-Mobile now crawls with a smartphone user-agent in addition to its previous feature phone user-agents. This is to increase our coverage of smartphone content and to provide a better search experience for smartphone users.

Here are the main user-agent strings that Googlebot-Mobile now uses:

  • Feature phones Googlebot-Mobile:

    • SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
    • DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  • Smartphone Googlebot-Mobile:

    • Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

The content crawled by smartphone Googlebot-Mobile will be used primarily to improve the user experience on mobile search. For example, the new crawler may discover content specifically optimized to be browsed on smartphones as well as smartphone-specific redirects.

One new feature we’re also launching that uses these signals is Skip Redirect for Smartphone-Optimized Pages. When we discover a URL in our search results that redirects smartphone users to another URL serving smartphone-optimized content, we change the link target shown in the search results to point directly to the final destination URL. This removes the extra latency the redirect introduces, saving 0.5-1 seconds on average when visiting the landing page for such search results.

Since all Googlebot-Mobile user-agents identify themselves as a specific kind of mobile device, please treat each Googlebot-Mobile request as you would a human user with the same phone user-agent. These and other guidelines are described in our previous blog post, and they still apply, except for those referring to smartphones, which we are updating today. If your site has treated Googlebot-Mobile specially based on the fact that it only crawls with feature phone user-agents, we strongly recommend reviewing this policy and serving the appropriate content based on Googlebot-Mobile’s user-agent, so that both your feature phone and smartphone content will be indexed properly.
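
As a rough illustration of treating Googlebot-Mobile like a human visitor with the same phone, the sketch below chooses the content variant from the device portion of the user-agent and reports separately whether the request came from Googlebot-Mobile. The helper names `device_class` and `is_googlebot_mobile` and the substring hints are assumptions made for this example; they are not a complete device-detection scheme.

```python
# Hypothetical helper names; the substring heuristics are illustrative only.
FEATURE_PHONE_HINTS = ("midp", "up.browser", "docomo")
SMARTPHONE_HINTS = ("iphone", "android")


def is_googlebot_mobile(user_agent: str) -> bool:
    """True when the user-agent carries the Googlebot-Mobile token."""
    return "googlebot-mobile" in user_agent.lower()


def device_class(user_agent: str) -> str:
    """Classify by the phone part of the user-agent, ignoring the crawler token."""
    ua = user_agent.lower()
    if any(hint in ua for hint in SMARTPHONE_HINTS):
        return "smartphone"
    if any(hint in ua for hint in FEATURE_PHONE_HINTS):
        return "feature_phone"
    return "desktop"
```

Because `device_class` never looks at the crawler token, a Googlebot-Mobile request with the iPhone user-agent gets the same content as a person browsing on that phone.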

If you have more questions, please ask on our Webmaster Help forums.

Posted by Yoshikiyo Kato, Software Engineer

Categories: sysadmin

Clicks and impressions for authors

Wed, 2011-12-14 18:37
Webmaster Level: All
(Cross-posted on the Inside Search Blog)

With the latest improvements to the way authorship annotations look in search and the addition of authorship to Google News, authors have been really excited about getting more visibility, and users benefit from seeing the name, photo, and a way to connect with the person who created the content.

Authors have also been giving us a lot of feedback on what else they'd like to see, so today we're introducing “Author Stats” in Webmaster Tools that shows you how often your content is showing up on the Google search results page. If you associate your content with your Google Profile either via e-mail verification or a simple link, you can visit Webmaster Tools to see how many impressions and clicks your content got on the Google search results page. Check out what Matt Cutts would see for his content:

To see your information, go to google.com/webmasters and log in with the same username you use for your Google+ Profile. On the left-hand panel, you can see “Author Stats” under the “Labs” section. This is an experimental feature so we’re continuing to iterate and improve, but we wanted to get early feedback from you. You can e-mail us at authorship-pilot@google.com if you run into any issues or have feedback.

If you’re a content creator interested in learning more about authorship, check out our Help Center.

Posted by Javier Tordable, Software Engineer
Categories: sysadmin

Tips for hosting providers and webmasters

Tue, 2011-12-06 13:01

Webmaster level: All

Some webmasters on our forums ask about hosting-related issues affecting their sites. To help both hosting providers and webmasters recognize, diagnose, and fix such problems, we’d like to share with you some of the common problems we’ve seen and suggest how you can fix them.

  • Blocking of Googlebot crawling. This is a very common issue, usually due to a misconfiguration in a firewall or DoS protection system and sometimes due to the content management system the site runs. Protection systems are an important part of good hosting and are often configured to block unusually high levels of server requests, sometimes automatically. However, because Googlebot often makes more requests than a human user, these protection systems may decide to block Googlebot and prevent it from crawling your website. To check for this kind of problem, use the Fetch as Googlebot function in Webmaster Tools, and check for other crawl errors shown in Webmaster Tools.

    We offer several tools to webmasters and hosting providers who want more control over Googlebot’s crawling, and to improve crawling efficiency:

    We have more information in our crawling and indexing FAQ.

  • Availability issues. A related type of problem we see is websites being unavailable when Googlebot (and users) attempt to access the site. This includes DNS issues, overloaded servers leading to timeouts and refused connections, misconfigured content distribution networks (CDNs), and many other kinds of errors. When Googlebot encounters such issues, we report them in Webmaster Tools as either URL unreachable errors or crawl errors.

  • Invalid SSL certificates. For SSL certificates to be valid for your website, they need to match the name of the site. Common problems include expired SSL certificates and servers misconfigured such that all websites on that server use the same certificate. Most web browsers will try to warn users in these situations, and Google tries to alert webmasters of this issue by sending a message via Webmaster Tools. The fix for these problems is to use SSL certificates that are valid for all of your website’s domains and subdomains that your users will interact with.

  • Wildcard DNS. Websites can be configured to respond to all subdomain requests. For example, the website at example.com can be configured to respond to requests to foo.example.com, made-up-name.example.com and all other subdomains.

    In some cases this is desirable to have; for example, a user-generated content website may choose to give each account its own subdomain. However, in some cases, the webmaster may not wish to have this behavior as it may cause content to be duplicated unnecessarily across different hostnames and it may also affect Googlebot’s crawling.

    To minimize problems in wildcard DNS setups, either configure your website to not use them, or configure your server to not respond successfully to non-existent hostnames, either by refusing the connection or by returning an HTTP 404 status code.

  • Misconfigured virtual hosting. The symptom of this problem is that multiple hosts and/or domain names hosted on the same server always return the contents of only one site. To rephrase, although the server hosts multiple sites, it returns only one site regardless of what is being requested. To diagnose the issue, you need to check that the server responds correctly to the Host HTTP header.

  • Content duplication through hosting-specific URLs. Many hosts helpfully offer URLs for your website for testing/development purposes. For example, if you’re hosting the website http://a.com/ on the hosting provider example.com, the host may offer access to your site through a URL like http://a.example.com/ or http://example.com/~a/. We recommend making these hosting-specific URLs not publicly accessible (for example, by password-protecting them). Even if these URLs are accessible, our algorithms usually pick the URL webmasters intend. If our algorithms select the hosting-specific URLs, you can influence our algorithms to pick your preferred URLs by implementing canonicalization techniques correctly.

  • Soft error pages. Some hosting providers show error pages using an HTTP 200 status code (meaning “Success”) instead of an HTTP error status code. For example, a “Page not found” error page could return HTTP 200 instead of 404, making it a soft 404 page; or a “Website temporarily unavailable” message might return a 200 instead of correctly returning a 503 HTTP status code. We try hard to detect soft error pages, but when our algorithms fail to detect a web host’s soft error pages, these pages may get indexed with the error content. This may cause ranking or cross-domain URL selection issues.

    It’s easy to check the status code returned: simply check the HTTP headers the server returns using any one of a number of tools, such as Fetch as Googlebot. If an error page is returning HTTP 200, change the configuration to return the correct HTTP error status code. Also, keep an eye out for soft 404 reports in Webmaster Tools, on the Crawl errors page in the Diagnostics section.

  • Content modification and frames. Webmasters may be surprised to see their page contents modified by hosting providers, typically by injecting scripts or images into the page. Web hosts may also serve your content by embedding it in other pages using frames or iframes. To check whether a web host is changing your content in unexpected ways, simply check the source code of the page as served by the host and compare it to the code you uploaded.

    Note that some server-side code modifications may be very useful. For example, a server using Google’s mod_pagespeed Apache module or other tools may be returning your code minified for page speed optimization.

  • Spam and malware. We’ve seen some web hosts and bulk subdomain services become major sources of malware and spam. We try hard to be granular in our actions when protecting our users and search quality, but if we see a very large fraction of sites on a specific web host that are spammy or are distributing malware, we may be forced to take action on the web host as a whole. To help you keep on top of malware, we offer:

We hope this list helps both hosting providers and webmasters diagnose and fix these issues. Beyond this list, also think about the qualitative aspects of hosting like quality of service and the helpfulness of support. As always, if you have questions or need more help, please ask in our Webmaster Help Forum.
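
Two of the checks from the list above, soft error pages and misconfigured virtual hosting, can be sketched in a few lines. This is a minimal sketch: the function names, the phrase list, and the idea of comparing raw response bodies are assumptions made for illustration, and a real check would need a broader set of signals.

```python
import re

# Phrases that suggest an error page; an illustrative, incomplete list.
ERROR_PHRASES = re.compile(
    r"page not found|temporarily unavailable|404 error", re.IGNORECASE
)


def is_soft_error(status: int, body: str) -> bool:
    """Flag responses that claim success (200) but read like an error page."""
    return status == 200 and bool(ERROR_PHRASES.search(body))


def hosts_with_identical_content(bodies_by_host: dict) -> bool:
    """True when two or more hostnames all returned exactly the same body."""
    return len(bodies_by_host) > 1 and len(set(bodies_by_host.values())) == 1
```

The status code and body can come from any HTTP client. Requesting a URL that should not exist and passing the result to `is_soft_error` is one way to spot a soft 404. Requesting the same path with different Host headers and passing the bodies to `hosts_with_identical_content` hints at a server that ignores the Host header.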

Written by , Webmaster Trends Analyst

Categories: sysadmin

New markup for multilingual content

Mon, 2011-12-05 17:31
Many websites serve users from around the world. There are different approaches to serving content appropriate to your users' language and/or region. Last year, we launched support for explicit annotations for web pages rendering the same content with different language templates.
Today we're going further with our support for multilingual content with improved handling for these two scenarios:
  • Multiregional websites using substantially the same content. Example: English webpages for Australia, Canada and USA, differing only in price
  • Multiregional websites using fully translated content, or substantially different monolingual content targeting different regions. Example: a product webpage in German, English and French
Specifying language and location
We've expanded our support of the rel="alternate" hreflang link element to handle content that is translated or provided for multiple geographic regions. The hreflang attribute can specify the language, optionally the country, and URLs of equivalent content. By specifying these alternate URLs, our goal is to be able to consolidate signals for these pages, and to serve the appropriate URL to users in search. Alternative URLs can be on the same site or on another domain.
Annotating pages as substantially similar content
Optionally, for pages that have substantially the same content in the same language and are targeted at multiple countries, you may use the rel="canonical" link element to specify your preferred version. We’ll use that signal to focus on that version in search, while showing the local URLs to users where appropriate. For example, you could use this if you have the same product page in German, but want to target it separately to users searching on the Google properties for Germany, Austria, and Switzerland.
Example usage
To explain how it works, let’s look at some example URLs:
  • http://www.example.com/ - contains the general homepage of a website, in Spanish
  • http://es-es.example.com/ - is the version for users in Spain, in Spanish
  • http://es-mx.example.com/ - is the version for users in Mexico, in Spanish
  • http://en.example.com/ - is the generic English language version
On all of these pages, we could use the following markup to specify language and optionally the region:
<link rel="alternate" hreflang="es" href="http://www.example.com/" />
<link rel="alternate" hreflang="es-ES" href="http://es-es.example.com/" />
<link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/" />
<link rel="alternate" hreflang="en" href="http://en.example.com/" />
If you specify a regional subtag, we’ll assume that you want to target that region.
Keep in mind that all of these annotations are to be used on a per-URL basis. You should take care to use the specific URL, not the homepage, for both of these link elements.
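Since the same set of annotations goes on every page in the group, it can be convenient to generate it from a single mapping. The following is a minimal sketch; the function name `hreflang_links` and the dictionary layout are assumptions made for illustration.

```python
from html import escape


def hreflang_links(alternates: dict) -> list:
    """Build one rel="alternate" hreflang link element per language/region."""
    return [
        '<link rel="alternate" hreflang="{}" href="{}" />'.format(
            escape(code, quote=True), escape(url, quote=True)
        )
        for code, url in alternates.items()
    ]


# The example URLs from this post, keyed by language and optional region.
alternates = {
    "es": "http://www.example.com/",
    "es-ES": "http://es-es.example.com/",
    "es-MX": "http://es-mx.example.com/",
    "en": "http://en.example.com/",
}
print("\n".join(hreflang_links(alternates)))
```

For pages other than the homepage, the mapping would hold the specific equivalent URLs for that page rather than the homepages shown here.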
More help
As always, if you need more help correctly implementing multiregional and multilingual websites, please see our Help Center article about this topic, or ask in our Webmaster Help Forum.
Written by , Software Engineer, Search Infrastructure, Google Switzerland
Categories: sysadmin

Grow your audience with Google+

Thu, 2011-11-10 03:19
Webmaster Level: All

At Google, we help grow your audience by connecting you with new users. We introduced the +1 button so your site would stand out on search and your users could easily share your content on Google+. But, sometimes you want to join the conversation and post content directly to where people are sharing.

Today we’re introducing Google+ for Business, a collection of tools and products that help you grow your audience. At the core of this is Google+ Pages, your site’s identity on Google+.

Google+ Pages: Have real conversations with the right people

To get your site on Google+, you first need to create a Google+ Page. On your page, you can engage in conversations with your visitors, direct readers back to your site for the latest updates, send tailored messages to specific groups of people, and see how many +1’s you have across the web. Google+ Pages will help you build relationships with your users, encouraging them to spend more time engaging with your content.

Google+ Pages are at the heart of Google+ for Business
Hangouts
Sometimes you might want to chat with your users face-to-face.  For example, if you run a food blog, you may want to invite a chef to talk about her favorite recipe, or if you manage a fashion review site, beauty specialists might want to hold how-to sessions with makeup tips. Hangouts make this easy, by letting you have high-quality video chats with nine people with a single click. You can use Hangouts to hold live forums, break news or simply get to know people better, all in real time.

Hangouts let you meet your customers, face-to-face
Circles
Circles allow you to group followers of your Page into smaller audiences. You can then share specific messages with specific groups. For example, you could create a Circle containing your most loyal readers and offer them exclusive content.
The Google+ badge: Grow your audience on Google+

To help your users find your page and start sharing, there are two buttons you can add to your site by visiting our Google+ badge configuration tool:

The Google+ icon, a small icon that directly links to your Page.
The Google+ badge, which we’re introducing in the coming days. This badge lets people add your page to their circles without leaving your site, and allows them to get updates from your site via Google+.

 

Extend the power of +1, stand out in Google search
You can also link your site to your Google+ page so that all your +1’s -- from your Page, your website, and search results -- will get tallied together and appear as a single total. Potential visitors will be more likely to see the recommendations your site has received, whether they’re looking at a search result, your website, or your Page. This means your +1’s will reach not only the 40 million users of Google+, but all the people who come to Google every day. You can link your site to your Page either using the Google+ badge or with a piece of code. To set this up, visit our Google+ badge configuration tool.

Bringing Google+ to the rest of Google

Our ultimate vision for Google+ is to transform the overall Google experience -- weaving identity and sharing into all of our products. Beginning today, we’re rolling out a new experimental feature to a small group of eligible publishers, Google+ Direct Connect -- an easy way for your audience to find your Google+ Page on Google search. If you’ve linked your Page to your site and you qualify, when someone searches for your website’s name with the ‘+’ sign before it, Direct Connect will send them directly to your Page. For example, try searching for ‘+YouTube’ on Google. Users will also be prompted to automatically add Pages they find through Direct Connect to their circles.

Direct Connect suggestions start populating as you type on Google.com
Just the beginning

We want to help you get your site on Google+ as soon as possible, so we’re opening the field trial for Google+ Pages to everyone today. Creating a Google+ Page only takes a few minutes. To get started, you’ll need a personal Google+ profile. If you don’t have a Google account, it’s very quick and easy to join. And if you’re looking for inspiration, check out some of the sites that are already starting to set up their Pages:


To learn more about how Google+ works for your site, check out the Google+ Your Business site. We’re just getting started, and have many more features planned for the coming weeks and months. To keep up to date on the latest news and tips, add the Google+ Your Business page to your circles. If you have ideas on how we can improve Google+ for your site, we’d love to hear them.

Posted by Dennis Troper, Product Management Director, Google+ Pages
Categories: sysadmin

Raising awareness of cross-domain URL selections

Tue, 2011-11-01 17:01

Webmaster level: Advanced

A piece of content can often be reached via several URLs, not all of which may be on the same domain. A common example we’ve talked about over the years is having the same content available on more than one URL, an issue known as duplicate content. When we discover a group of pages with duplicate content, Google uses algorithms to select one representative URL for that content. A group of pages may contain URLs from the same site or from different sites. When the representative URL is selected from a group that spans different sites, the selection is called a cross-domain URL selection. To take a simple example, if the group of URLs contains one URL from a.com and one URL from b.com and our algorithms select the URL from b.com, the a.com URL may no longer be shown in our search results and may see a drop in search-referred traffic.

Webmasters can greatly influence our algorithms’ selections using one of the currently supported mechanisms to indicate the preferred URL, for example using rel="canonical" elements or 301 redirects. In most cases, the decisions our algorithms make in this regard correctly reflect the webmaster’s intent. However, in the rare cases where a selection does not match that intent, we’ve found that many webmasters are confused as to why it happened and what they can do if they believe the selection is incorrect.

To be transparent about cross-domain URL selection decisions, we’re launching new Webmaster Tools messages that will attempt to notify webmasters when our algorithms select an external URL instead of one from their website. The details about how these messages work are in our Help Center article about the topic, and in this blog post we’ll discuss the different scenarios in which you may see a cross-domain URL selection and what you can do to fix any selections you believe are incorrect.

Common causes of cross-domain URL selection

There are many scenarios that can lead our algorithms to select URLs across domains.

In most cases, our algorithms select a URL based on signals that the webmaster implemented to influence the decision. For example, a webmaster following our guidelines and best practices for moving websites is effectively signalling that the URLs on their new website are the ones they prefer for Google to select. If you’re moving your website and see these new messages in Webmaster Tools, you can take that as confirmation that our algorithms have noticed.

However, we regularly see webmasters ask questions when our algorithms select a URL they did not want selected. When your website is involved in a cross-domain selection, and you believe the selection is incorrect (i.e. not your intention), there are several strategies to improve the situation. Here are some of the common causes of unexpected cross-domain URL selections that we’ve seen, and how to fix them:

  1. Duplicate content, including multi-regional websites: We regularly see webmasters use substantially the same content in the same language on multiple domains, sometimes inadvertently and sometimes to geotarget the content. For example, it’s common to see a webmaster set up the same English language website on both example.com and example.net, or a German language website hosted on a.de, a.at, and a.ch.

    Depending on your website and your users, you can use one of the currently-supported canonicalization techniques to signal to our algorithms which URLs you wish selected. Please see the following articles about this topic:

  2. Configuration mistakes: Certain types of misconfigurations can lead our algorithms to make an incorrect decision. Examples of misconfiguration scenarios include:
    1. Incorrect canonicalization: Incorrect usage of canonicalization techniques pointing to URLs on an external website can lead our algorithms to select the external URLs to show in our search results. We’ve seen this happen with misconfigured content management systems (CMS) or CMS plugins installed by the webmaster.

      To fix this kind of situation, find how your website is incorrectly indicating the canonical URL preference (e.g. through incorrect usage of a rel="canonical" element or a 301 redirect) and fix that.

    2. Misconfigured servers: Sometimes we see hosting misconfigurations where content from site a.com is returned for URLs on b.com. A similar case occurs when two unrelated web servers return identical soft 404 pages that we may fail to detect as error pages. In both situations we may assume the same content is being returned from two different sites and our algorithms may incorrectly select the a.com URL as the canonical of the b.com URL.

      You will need to investigate which part of your website’s serving infrastructure is misconfigured. For example, your server may be returning HTTP 200 (success) status codes for error pages, or your server might be confusing requests across different domains hosted on it. Once you find the root cause of the issue, work with your server admins to correct the configuration.

  3. Malicious website attacks: Some attacks on websites introduce code that can cause undesired canonicalization. For example, the malicious code might cause the website to return an HTTP 301 redirect or insert a cross-domain rel="canonical" link element into the HTML <head> or HTTP header, usually pointing to an external URL hosting malicious content. In these cases our algorithms may select the malicious or spammy URL instead of the URL on the compromised website.

    In this situation, please follow our guidance on cleaning your site and submit a reconsideration request when done. To identify cloaked attacks, you can use the Fetch as Googlebot function in Webmaster Tools to see your page’s content as Googlebot sees it.
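
For the incorrect canonicalization and malicious attack cases above, a quick first check is to look for rel="canonical" link elements in your own pages that point at another hostname. The sketch below uses only the Python standard library; the names `CanonicalFinder` and `external_canonicals` are hypothetical, and the check covers the HTML only, not canonical hints sent in HTTP headers or 301 redirects.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def external_canonicals(html: str, own_host: str) -> list:
    """Return canonical URLs whose hostname differs from own_host."""
    finder = CanonicalFinder()
    finder.feed(html)
    return [
        url for url in finder.canonicals
        # Relative URLs have no hostname and stay on the same site.
        if urlparse(url).hostname not in (None, own_host)
    ]
```

The comparison is an exact hostname match, so a canonical pointing from example.com to www.example.com would also be listed. Whether that counts as a problem depends on which hostname you prefer.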

In rare situations, our algorithms may select a URL from an external site that is hosting your content without your permission. If you believe that another site is duplicating your content in violation of copyright law, you may contact the site’s host to request removal. In addition, you can request that Google remove the infringing page from our search results by filing a request under the Digital Millennium Copyright Act.

And as always, if you need help in identifying the cause of an incorrect decision or how to fix it, you can see our Help Center article about this topic and ask in our Webmaster Help Forum.

Posted by , Webmaster Trends Analyst

Categories: sysadmin

Create and manage Custom Search Engines from within Webmaster Tools

Tue, 2011-10-18 18:27
Webmaster level: All

Custom Search Engines (CSEs) enable you to create Google-powered customized search experiences for your sites. You can search over one or more sites, customize the look and feel to match your site, and even make money with AdSense for Search. Now it’s even easier to get started directly from Webmaster Tools.

If you’ve never created a CSE, just click on the “Custom Search” link in the Labs section and we’ll automatically create a default CSE that searches just your site. You can do some basic configuring or immediately get the code snippet to add your new CSE to your site. You can always continue on to the full CSE control panel for more advanced settings.

Once you’ve created your CSE (or if you already had one), clicking the “Custom Search” link in Labs will allow you to manage your CSEs without leaving Webmaster Tools.

We hope these new features make it easier for you to help users search your site. If you have any questions, please post them in our Webmaster Help Forum or the Custom Search Help Forum.

Posted by Sharon Xiao, Software Engineering Intern, and Ying Huang, Software Engineer
Categories: sysadmin

Accessing search query data for your sites

Tue, 2011-10-18 18:27
Webmaster level: All

SSL encryption on the web has been growing by leaps and bounds. As part of our commitment to provide a more secure online experience, today we announced that SSL Search on https://www.google.com will become the default experience for signed in users on google.com. This change will be rolling out over the next few weeks.

What is the impact of this change for webmasters? Today, a web site accessed through organic search results on http://www.google.com (non-SSL) can see both that the user came from google.com and their search query. (Technically speaking, the user’s browser passes this information via the HTTP referrer field.) However, for organic search results on SSL search, a web site will only know that the user came from google.com.
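
To make the difference concrete, here is a small sketch of how a site might read the search query from a referrer. It assumes the query travels in a `q` parameter of the referrer URL; the function name and the sample URLs in the usage note are illustrative.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def search_query_from_referrer(referrer: str) -> Optional[str]:
    """Return the q parameter of a referrer URL, or None when absent."""
    values = parse_qs(urlparse(referrer).query).get("q")
    return values[0] if values else None
```

With a referrer such as http://www.google.com/search?q=rich+snippets the function returns the query text. With a referrer that carries only the google.com origin it returns None, and the site knows only where the visit came from.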

Webmasters can still access a wealth of search query data for their sites via Webmaster Tools. For sites which have been added and verified in Webmaster Tools, webmasters can do the following:
  • View the top 1000 daily search queries and top 1000 daily landing pages for the past 30 days.
  • View the impressions, clicks, clickthrough rate (CTR), and average position in search results for each query, and compare this to the previous 30 day period.
  • Download this data in CSV format.
In addition, users of Google Analytics’ Search Engine Optimization reports have access to the same search query data available in Webmaster Tools and can take advantage of its rich reporting capabilities.

We will continue to look into further improvements to how search query data is surfaced through Webmaster Tools. If you have questions, feedback or suggestions, please let us know through the Webmaster Tools Help Forum.

Posted by Anthony Chavez, Product Manager
Categories: sysadmin

View-all in search results

Tue, 2011-10-18 11:11
Webmaster level: Intermediate to Advanced

User testing has taught us that searchers much prefer the view-all, single-page version of content over a component page containing only a portion of the same information with arbitrary page breaks (which cause the user to click “next” and load another URL).


Searchers often prefer the view-all vs. paginated content with arbitrary page breaks and worse latency.
Therefore, to improve the user experience, when we detect that a content series (e.g. page-1.html, page-2.html, etc.) also contains a single-page version (e.g. page-all.html), we’re now making a larger effort to return the single-page version in search results. If your site has a view-all option, there’s nothing you need to do; we’ll work to do it on your behalf. Also, indexing properties, like links, will be consolidated from the component pages in the series to the view-all page.

However, high latency can make the view-all less preferred

Interestingly, the cases when users didn’t prefer the view-all page were correlated with high latency (e.g., when the view-all page took a while to load, say, because it contained many images). This makes sense because we know users are less satisfied with slow results. So while a view-all page is commonly desired, as a webmaster it’s important to balance this preference with the page’s load time and overall user experience.

Best practices for a series of content
  1. If your site includes view-all pages

    We aim to detect the view-all version of your content and, if available, its associated component pages. There’s nothing more you need to do! However, if you’d like to make it more explicit to us, you can include rel="canonical" from your component pages to your view-all to increase the likelihood that we detect your series of pages appropriately.


    rel="canonical" can specify the superset of content (i.e. the view-all page, in this case page-all.html) from the same information in a series of URLs.
    Why does this work?

    In the diagram, page-2.html of a series may specify the canonical target as page-all.html because page-all.html is a superset of page-2.html's content. When a user searches for a query term and page-all.html is selected in search results, even if the query is most related to page-2.html, we know the user will still see page-2.html’s relevant information within page-all.html.


    On the other hand, page-2.html shouldn’t designate page-1.html as the canonical because page-2.html’s content isn’t included on page-1.html. It’s possible that a user’s search query is relevant to content on page-2.html, but if page-2.html’s canonical is set to page-1.html, the user could then select page-1.html in search results and find herself in a position where she has to further navigate to a different page to arrive at the desired information. That’s a poor experience for the user, a suboptimal result from us, and it could also bring poorly targeted traffic to your site.


    However, if you strongly desire your view-all page not to appear in search results: 1) make sure the component pages in the series don’t include rel="canonical" to the view-all page, and 2) mark the view-all page as “noindex” using any of the standard methods.
  2. If you’d like to surface individual, component pages (or there’s no view-all available)

    It may be the case that one or both of the situations below apply to your site:

    • The view-all page is undesirable as a search result (e.g., load time too high or too difficult for users to navigate).
    • Your users prefer the multi-page experience and to be directed to a component page in search results, rather than the view-all page.

    If so, you can use standard HTML rel="next" and rel="prev" elements to specify a relationship between the component pages in your series of content. If done correctly, Google will generally strive to:

    • Consolidate indexing properties, such as links, between the component pages/URLs.
    • Send users to the most relevant page/URL from the component pages. Typically, the most relevant page is the first page of your content, but our algorithms may point users to one of the component pages in the series.

It’s not uncommon for webmasters to incorrectly use rel="canonical" from component pages to the first page of their series (e.g. page-2.html with rel="canonical" to page-1.html). We recommend against this implementation because the component pages don’t actually contain duplicate content. Using rel="next" and rel="prev" is far more appropriate.
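
The two options above can be sketched as a helper that emits the link elements for one component page. The function name `series_links`, the choice to express the relationships as `<link>` elements, and the relative URLs are assumptions made for illustration.

```python
def series_links(pages: list, index: int, view_all: str = None) -> list:
    """Link elements for the component page at pages[index]."""
    if view_all is not None:
        # The view-all page is a superset of every component page.
        return ['<link rel="canonical" href="{}" />'.format(view_all)]
    links = []
    if index > 0:
        links.append('<link rel="prev" href="{}" />'.format(pages[index - 1]))
    if index < len(pages) - 1:
        links.append('<link rel="next" href="{}" />'.format(pages[index + 1]))
    return links
```

The helper never points a component page's canonical at the first page of the series, which is the mistake described above. It produces either a canonical to the view-all page or the prev/next pair.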

Summary

Because users generally prefer the view-all option in search results, we’re making more of an effort to properly detect and serve this version to searchers. If you have a series of content, there’s nothing more you need to do. If you’d like to hint more to Google how best to serve users your information:
  1. To better optimize your view-all page, you can use rel="canonical" from component pages to the single-page version; otherwise,
  2. If a view-all page doesn’t provide a good user experience for your site, you can use the rel="next" and rel="prev" attributes as a strong hint for Google to identify the series of pages and still surface a component page in results.

Questions?

As always, feel free to ask in our Webmaster Help Forum.

Written by Benjia Li & Joachim Kupke, Software Engineers, Indexing Team
Categories: sysadmin