
Virginia Tech Search and Google

Virginia Tech is now using Google as its primary search engine. Some of the flexibility in maintaining the Virginia Tech site index was lost, but the way Google finds and ranks results makes the change well worth it. Google rates relevancy based in part on how many other sites link to your site: the more sites that link to yours, the higher your site appears in the list of results. Metadata is still as important as it was with the previous UltraSeek server; Google uses the <meta name="description"> tag when building its index and in result listings. See http://computing.vt.edu/internet_and_web/web_publishing/webmasters_toolkit/promotion/promotion.html for the 7 steps to promote your Web site.

You do have control over whether your entire site, individual pages, or images are indexed, and over whether your pages are cached. The following sections describe how to control what is displayed in the new Virginia Tech search.


Adding the Virginia Tech search to your Web site

If you want to add the Virginia Tech search to your Web site, insert the following code into your page, replacing the 'inurl' value with your own site's address (without the 'http://www.').

Here is the code. This example searches the computing.vt.edu Web site; note the parameter 'inurl:computing.vt.edu':

<form method="get" action="http://www.google.com/u/virginiatech">
Search Computing Web site for <input type="text" name="q" size="16" maxlength="300">
<input name="hq" type="hidden" value="inurl:computing.vt.edu">
<input type="submit" value=" Search ">
</form>
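To make clear which parts of the snippet change from site to site, the following Python sketch produces the same form for any site. The function name is hypothetical and not part of any Virginia Tech or Google tool; it assumes the 'http://www.google.com/u/virginiatech' action and the 'q' and 'hq' field names exactly as shown in the example above, so only the 'inurl' value and the label vary.

```python
from html import escape

# Action URL and field names copied from the example form.
SEARCH_ACTION = "http://www.google.com/u/virginiatech"


def search_form_html(site: str, label: str) -> str:
    """Return the search form restricted to `site` (given without 'http://www.')."""
    hq_value = escape(f"inurl:{site}", quote=True)  # value of the hidden 'hq' field
    lines = [
        f'<form method="get" action="{SEARCH_ACTION}">',
        f'{escape(label)} <input type="text" name="q" size="16" maxlength="300">',
        f'<input name="hq" type="hidden" value="{hq_value}">',
        '<input type="submit" value=" Search ">',
        "</form>",
    ]
    return "\n".join(lines)
```

Calling search_form_html("computing.vt.edu", "Search Computing Web site for") reproduces the example form line for line.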

You can also restrict 'inurl' to a subfolder of your site. For example, use this hidden field instead:

<input name="hq" type="hidden" value="inurl:computing.vt.edu/web_publishing/">

Then search for Filebox: the results include only hits from pages in that directory.
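Because the form uses method="get", submitting it requests a URL whose query string carries both the visible 'q' field and the hidden 'hq' field. The Python sketch below (a hypothetical helper using only the standard library) builds roughly that URL for the Filebox search restricted to the subfolder; it is meant only to show how the hidden 'inurl' restriction travels along with the user's query.

```python
from urllib.parse import urlencode

# Action URL and field names copied from the example form.
SEARCH_ACTION = "http://www.google.com/u/virginiatech"


def search_url(query: str, inurl: str) -> str:
    """Build the GET URL for `query`, restricted by a hidden 'hq' value of 'inurl:<inurl>'."""
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return SEARCH_ACTION + "?" + urlencode({"q": query, "hq": f"inurl:{inurl}"})


print(search_url("Filebox", "computing.vt.edu/web_publishing/"))
# http://www.google.com/u/virginiatech?q=Filebox&hq=inurl%3Acomputing.vt.edu%2Fweb_publishing%2F
```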

Changing the URL of your Web site


If you change the URL of your Web site, the new URL will be indexed the next time Google crawls your site. The crawler revisits each site on an automatic schedule, and you cannot manually speed up the date your site will be recrawled.

Make sure that any sites currently linking to your old site update their links to point to your new site.

If you redirect your old URLs with HTTP 301 (Moved Permanently) responses, Google's crawler will know to use the new URL. Google reflects changes like this in its results within six to eight weeks.
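A minimal sketch of a permanent redirect follows, assuming you can run a small Python service on the old host; most sites would configure the same behavior in their Web server instead. Every request is answered with status 301 and a Location header pointing to the same path on the new site. NEW_BASE_URL is a placeholder, not a real address.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder for your site's new address (hypothetical; no trailing slash).
NEW_BASE_URL = "http://newsite.example.org"


class PermanentRedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with 301 Moved Permanently to the same path on the new site."""

    def _redirect(self) -> None:
        self.send_response(301)  # 301 = Moved Permanently
        # self.path is the request target, including any query string.
        self.send_header("Location", NEW_BASE_URL + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self) -> None:
        self._redirect()

    def do_HEAD(self) -> None:
        self._redirect()


def run(port: int = 8000) -> None:
    """Serve redirects on `port` until interrupted."""
    HTTPServer(("", port), PermanentRedirectHandler).serve_forever()
```

With run() active on the old host, a request for /index.html would receive a 301 whose Location is NEW_BASE_URL + "/index.html", which is the signal the crawler uses to switch to the new URL.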

 

Removing your Web site

If you want to exclude your entire Web site, or a specific section (directory) of your server, from Google's index, place a file called robots.txt at the root of your server.

To prevent Google and other search engines from crawling your site, place the following 'robots.txt' file in your server root:

User-Agent: *
Disallow: /

This is the standard protocol that most Web crawlers observe for excluding a Web server or directory from an index. More information on 'robots.txt' is available here: http://www.robotstxt.org/wc/norobots.html.
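You can check what a well-behaved crawler will conclude from these rules with Python's standard urllib.robotparser module. The host name below is a hypothetical placeholder; the second set of rules shows how a single directory (here an assumed /private/ directory) would be excluded instead of the whole server.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# www.example.org is a hypothetical host name.
print(allowed(ROBOTS_TXT, "Googlebot", "http://www.example.org/index.html"))  # False

# Excluding only one directory (hypothetical /private/) leaves the rest crawlable.
ONE_DIRECTORY = "User-Agent: *\nDisallow: /private/\n"
print(allowed(ONE_DIRECTORY, "Googlebot", "http://www.example.org/index.html"))  # True
```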


Removing individual pages

To prevent all robots from indexing an individual page on your site, place the following meta tag in the <head> section of that page's HTML. NOINDEX keeps the page out of the index, and NOFOLLOW tells robots not to follow the links on the page:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">

More information on this standard meta tag element is available here: http://www.robotstxt.org/wc/exclusion.html#meta.
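As an illustration of how a robot reads this tag, the Python sketch below uses the standard html.parser module to collect the ROBOTS directives from a page. It is a simplified model, not Google's actual parser: it only looks at tags named ROBOTS and ignores robot-specific tags such as GOOGLEBOT.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives in a page's <meta name="ROBOTS" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(v.strip().upper() for v in content.split(",") if v.strip())


def robots_directives(page_html: str) -> set[str]:
    parser = RobotsMetaParser()
    parser.feed(page_html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head><body></body></html>'
d = robots_directives(page)
print("may index:", "NOINDEX" not in d, "| may follow links:", "NOFOLLOW" not in d)
# may index: False | may follow links: False
```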


Removing snippets

A snippet is a text excerpt from a page in the search results, with all the query terms shown in bold. Snippets let users see the context in which their search terms appear on your Web page.
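The idea of a snippet can be illustrated with a short Python sketch that cuts an excerpt around the first query term and bolds every term in it. This is a simplified illustration under stated assumptions (plain-text input with no HTML markup, a fixed window of characters on each side), not Google's snippet algorithm.

```python
import re


def make_snippet(text: str, terms: list[str], width: int = 40) -> str:
    """Excerpt `text` around the first query term and wrap each term in <b>...</b>.

    Simplified illustration only: assumes plain text (no HTML markup) and
    a fixed window of `width` characters on each side of the first match.
    """
    terms = [t for t in terms if t]  # ignore empty terms
    if not terms:
        return ""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return ""  # no query term on the page, so nothing to excerpt
    start = max(0, first.start() - width)
    end = min(len(text), first.end() + width)
    excerpt = pattern.sub(lambda m: f"<b>{m.group(0)}</b>", text[start:end])
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + excerpt + suffix


page_text = "Filebox gives Virginia Tech users personal Web space for publishing pages."
print(make_snippet(page_text, ["filebox"], width=20))
# <b>Filebox</b> gives Virginia Tech...
```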

Embedding the following meta tag in your pages will prevent Google from displaying snippets for those pages:

<meta name="GOOGLEBOT" content="NOSNIPPET">

Note: removing snippets also removes cached pages.

More information on the standard ROBOTS meta tag, which uses the same syntax, is available here: http://www.robotstxt.org/wc/exclusion.html#meta.


Removing cached pages

Google keeps the text of the documents it crawls in a cache, so a cached version of a Web page can be displayed if the original page is unavailable. The cached page appears exactly as it looked when Google last crawled it.

Placing the following meta tag in a page prevents all robots from archiving (caching) that page:

<meta name="ROBOTS" content="NOARCHIVE">

To prevent only Google's robot from caching the page, while still allowing other indexing robots to archive its content, use the following tag instead:

<meta name="GOOGLEBOT" content="NOARCHIVE">

Note: This tag removes only the cached link, and only after the next time Google crawls the site. Google continues to index the page and display a snippet for it.

More information on the standard ROBOTS meta tag is available here: http://www.robotstxt.org/wc/exclusion.html#meta.
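The scoping rules in this section (a ROBOTS tag applies to every robot, a GOOGLEBOT tag only to Google's) and the earlier note that removing snippets also removes cached pages can be summarized in a small Python sketch. The function names are hypothetical, and the input is assumed to be the (name, content) pairs of a page's meta tags, already extracted from the HTML.

```python
def effective_directives(meta_tags: list[tuple[str, str]], robot: str) -> set[str]:
    """Directives that apply to `robot`, given a page's (name, content) meta tag pairs.

    A ROBOTS tag applies to every robot; a GOOGLEBOT tag applies only to Google's crawler.
    """
    robot = robot.lower()
    directives: set[str] = set()
    for name, content in meta_tags:
        name = name.lower()
        if name == "robots" or (name == "googlebot" and robot == "googlebot"):
            directives.update(v.strip().upper() for v in content.split(",") if v.strip())
    return directives


def may_cache(directives: set[str]) -> bool:
    """False if NOARCHIVE is present, or NOSNIPPET (which also removes the cached page)."""
    return not ({"NOARCHIVE", "NOSNIPPET"} & directives)


page = [("GOOGLEBOT", "NOARCHIVE")]
print(may_cache(effective_directives(page, "Googlebot")))  # False: Google does not cache
print(may_cache(effective_directives(page, "OtherBot")))   # True: other robots may archive
```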

Removing an outdated (dead) link

Google updates its entire index automatically on a regular basis. During this process (Web crawling), Google finds new pages, discards dead links, and updates changed links. Pages that return a 404 (Not Found) error are dropped from the index after Google's next crawl.
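Before relying on a removed page to drop out of the index, it can help to confirm that its URL really returns 404 (Not Found). A minimal Python sketch using the standard http.client module follows; it assumes a plain http:// URL, and the function names are hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for a HEAD request to `url` (http:// only)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)  # HEAD asks for the status and headers only
        return conn.getresponse().status
    finally:
        conn.close()


def is_dead_link(url: str) -> bool:
    """True if the server answers 404 Not Found for `url`."""
    return http_status(url) == 404
```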


Removing an image from Google's Image Search

If you have an image that you do not want in Google's image index, add a 'robots.txt' file to the root directory of your server. For example, if you don't want your logo (stored at /images/mylogo.jpg) indexed, your 'robots.txt' file would include the following:

User-Agent: Googlebot-Image
Disallow: /images/mylogo.jpg

To remove all the images on your site from Google's image index, add the following entry to your 'robots.txt' file:

User-Agent: Googlebot-Image
Disallow: /

This is the standard protocol that most Web crawlers observe for excluding a Web server or directory from an index. More information on 'robots.txt' is available at: http://www.robotstxt.org/wc/norobots.html.
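The two image rules above can be checked with Python's standard urllib.robotparser module. The sketch assumes a hypothetical host name and a second image, /images/photo.jpg, for comparison; it also shows that an entry for Googlebot-Image does not affect Google's regular Web crawler, Googlebot.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt examples from this section.
ONE_IMAGE = "User-Agent: Googlebot-Image\nDisallow: /images/mylogo.jpg\n"
ALL_IMAGES = "User-Agent: Googlebot-Image\nDisallow: /\n"

SITE = "http://www.example.org"  # hypothetical host name


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for rules in (ONE_IMAGE, ALL_IMAGES):
    print(
        allowed(rules, "Googlebot-Image", SITE + "/images/mylogo.jpg"),
        allowed(rules, "Googlebot-Image", SITE + "/images/photo.jpg"),
        allowed(rules, "Googlebot", SITE + "/index.html"),
    )
# False True True
# False False True
```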

 

Last updated on May 13, 2002