Virginia Tech Search and Google
Virginia Tech now uses Google as its primary search engine. Switching to Google cost some flexibility in maintaining the Virginia Tech site index, but the methods Google uses to return and rank results are well worth it. Google rates relevancy based on how many other sites link to your site: the more sites that link to yours, the higher your site appears in the results list. Metadata is still as important as it was with the previous UltraSeek server (Google uses the <meta name="description"> tag when building its index and in result listings). See http://computing.vt.edu/internet_and_web/web_publishing/webmasters_toolkit/promotion/promotion.html for the seven steps to promote your Web site.
You do have control over whether your entire site, individual pages, or images are indexed, and over whether your pages are cached. The following sections describe how to control what is displayed in the new Virginia Tech search.
Adding the Virginia Tech search to your Web site
To add the Virginia Tech search to your Web site, insert the following code into your page, replacing the 'inurl' value with your own site's address (without the leading 'http://www'). This example searches the computing.vt.edu Web site; note the parameter 'inurl:computing.vt.edu':
Search Computing Web site for
<input type="text" name="q" size="16" maxlength="300">
<input name="hq" type="hidden" value="inurl:computing.vt.edu">
<input type="submit" value=" Search ">
</form>
You can also restrict the 'inurl' parameter to a subfolder of your site by adding the folder's path to its value. The search then returns hits only from pages in that directory.
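The form submits two fields: the visible search box 'q' and the hidden field 'hq', whose value carries the 'inurl:' restriction. The form's opening tag (and so its action URL and method) is not shown above, so the Python sketch below only builds the encoded query string, assuming the form is submitted with the GET method. It shows a site-wide restriction and a subfolder restriction; the search term and the subfolder path are hypothetical, chosen for illustration.

```python
from urllib.parse import urlencode


def search_query_string(terms: str, inurl_value: str) -> str:
    """Encode the form's two fields as a GET form submission would.

    Assumes a GET submission: 'q' holds what the user typed and 'hq' is
    the hidden field that restricts results to URLs containing inurl_value.
    """
    return urlencode({"q": terms, "hq": f"inurl:{inurl_value}"})


# Site-wide restriction, matching the hidden field in the form above.
print(search_query_string("wireless", "computing.vt.edu"))

# Subfolder restriction (hypothetical folder path, for illustration only).
print(search_query_string("wireless", "computing.vt.edu/internet_and_web"))
```

Note that the colon and slash inside the 'hq' value are percent-encoded (%3A and %2F) in the submitted string; the search server decodes them before applying the restriction.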
Changing the URL of your Web site
If you move your Web site and send HTTP 301 (permanent) redirects from the old URLs, Google's crawler will know to use the new URL. Google reflects changes like this within six to eight weeks.
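As a sketch of what a 301 looks like from the server side, the minimal Python WSGI application below answers requests for moved pages with the status line '301 Moved Permanently' and a 'Location' header pointing at the new address. The old paths and the example.edu URLs in MOVED are hypothetical; whatever server software hosts your site can send the same status line and header.

```python
# Hypothetical table of moved pages: old path -> new absolute URL.
MOVED = {
    "/oldsite/index.html": "http://www.example.edu/newsite/index.html",
    "/oldsite/help.html": "http://www.example.edu/newsite/help.html",
}

TEXT = ("Content-Type", "text/plain; charset=utf-8")


def response_for(path):
    """Return the status line and headers to send for a request path."""
    new_url = MOVED.get(path)
    if new_url is not None:
        # 301 marks the move as permanent, so a crawler should replace
        # the old URL with the one given in the Location header.
        return "301 Moved Permanently", [("Location", new_url), TEXT]
    return "404 Not Found", [TEXT]


def app(environ, start_response):
    """Minimal WSGI application that serves only the redirects above."""
    status, headers = response_for(environ.get("PATH_INFO", "/"))
    start_response(status, headers)
    return [status.encode("ascii")]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve the application; a request for /oldsite/index.html should then come back as a 301 with the new URL in its Location header.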
Removing your Web site
To exclude your entire Web site, or a specific section (directory) of your server, from Google's index, place a file called 'robots.txt' at the root of your server. To prevent Google and other search engines from crawling any part of your site, the 'robots.txt' file in your server root should contain:
User-agent: *
Disallow: /
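Python's standard urllib.robotparser module reads robots.txt rules the way a compliant crawler does, which makes it a convenient way to check a file before publishing it. The sketch below parses the whole-site exclusion (a 'User-agent: *' line followed by 'Disallow: /') and, for comparison, a file that blocks a single directory. The /private/ directory and the example.edu URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Blocks every compliant crawler from the entire site.
WHOLE_SITE = """\
User-agent: *
Disallow: /
"""

# Blocks every compliant crawler from one directory only (hypothetical path).
ONE_DIRECTORY = """\
User-agent: *
Disallow: /private/
"""


def load_rules(text):
    """Build a parser from robots.txt text instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


whole = load_rules(WHOLE_SITE)
partial = load_rules(ONE_DIRECTORY)

for url in ("http://www.example.edu/index.html",
            "http://www.example.edu/private/notes.html"):
    # Columns: URL, allowed under WHOLE_SITE, allowed under ONE_DIRECTORY.
    print(url, whole.can_fetch("Googlebot", url), partial.can_fetch("Googlebot", url))
```

Because both files address 'User-agent: *', the answers are the same for Googlebot and for any other well-behaved crawler.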
This is the standard protocol that most Web crawlers observe for excluding a Web server or directory from an index. More information on 'robots.txt' is available here: http://www.robotstxt.org/wc/norobots.html.
Removing individual pages
To prevent all robots from indexing an individual page on your site, place the following meta tag in the <head> section of the page's HTML:
<meta name="robots" content="noindex">
More information on this standard meta tag element is available here: http://www.robotstxt.org/wc/exclusion.html#meta.
Removing snippets
A snippet is a text excerpt from the returned result page with all the query terms in bold. It lets users see the context in which the search terms appear on your Web page. Embedding the following tag in a page prevents Google from displaying a snippet for it:
<meta name="googlebot" content="nosnippet">
Note: removing snippets also removes cached pages. More information on the robots meta tag is available here: http://www.robotstxt.org/wc/exclusion.html#meta.
Removing cached pages
Google keeps the text of the documents it crawls available in a cache, so that a cached version of a Web page can be displayed if the original page is unavailable. The cached page appears exactly as it looked when Google last crawled it. The following tag prevents all robots that honor it from archiving the page:
<meta name="robots" content="noarchive">
To let other indexing robots archive your page's content while preventing only Google's robot from caching the page, use this tag instead:
<meta name="googlebot" content="noarchive">
Note: the cached link for the page is removed only after the site is next crawled. Google continues to index the page and display a snippet. More information on this standard meta tag element is available here: http://www.robotstxt.org/wc/exclusion.html#meta.
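To check which of these tags a page actually carries, and which crawler each one binds, the Python sketch below uses the standard html.parser module to collect the directives from <meta name="robots"> and <meta name="googlebot"> tags. It assumes the rule described in this section: a robots tag applies to every robot, while a googlebot tag applies only to Google's crawler. The sample page is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name ("robots" or "googlebot") -> set of directives

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; self-closing
        # <meta ... /> tags also arrive here via the default handle_startendtag.
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        meta_name = attributes.get("name", "").strip().lower()
        if meta_name in ("robots", "googlebot"):
            content = attributes.get("content", "")
            values = {part.strip().lower() for part in content.split(",") if part.strip()}
            self.directives.setdefault(meta_name, set()).update(values)


def directives_for(html, crawler):
    """Return the set of directives that apply to the named crawler."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    applicable = set(parser.directives.get("robots", set()))
    if crawler.lower() == "googlebot":
        applicable |= parser.directives.get("googlebot", set())
    return applicable


# Hypothetical page that hides its cached copy from Google only.
PAGE = """<html><head>
<title>Example</title>
<meta name="googlebot" content="noarchive">
</head><body>...</body></html>"""

print(sorted(directives_for(PAGE, "Googlebot")))     # ['noarchive']
print(sorted(directives_for(PAGE, "SomeOtherBot")))  # []
```

Switching the sample tag to name="robots" would make 'noarchive' apply to both crawlers, which is the difference between the two cache-removal tags.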
Removing an outdated (dead) link
Google updates its entire index automatically on a regular basis. During this process (Web crawling), Google finds new pages, discards dead links, and updates links. Dead links that return a '404 Not Found' error will drop out of the index after Google's next crawl.
Removing an image from Google's Image Search
If you have an image that you do not want in Google's image index, add a 'robots.txt' file to the root directory of your server. For example, if you don't want your logo (mylogo.jpg) indexed, your 'robots.txt' file would contain a 'User-agent: Googlebot-Image' line, naming Google's image crawler, followed by a 'Disallow:' line giving the path to mylogo.jpg.
To remove all the images on your site from Google's image index, place the following entry in your 'robots.txt' file:
User-agent: Googlebot-Image
Disallow: /
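Python's standard urllib.robotparser module can confirm what these image rules block. Google's image crawler identifies itself as Googlebot-Image, so a record addressed to that user agent leaves Google's regular Web crawler unaffected. In the sketch below, the /images/ location of mylogo.jpg, the second image, and the example.edu URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Keep one image out of Google's image index (hypothetical path to the logo).
ONE_IMAGE = """\
User-agent: Googlebot-Image
Disallow: /images/mylogo.jpg
"""

# Keep every image on the site out of Google's image index.
ALL_IMAGES = """\
User-agent: Googlebot-Image
Disallow: /
"""


def load_rules(text):
    """Build a parser from robots.txt text instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


logo = "http://www.example.edu/images/mylogo.jpg"
photo = "http://www.example.edu/images/campus.jpg"

one = load_rules(ONE_IMAGE)
print(one.can_fetch("Googlebot-Image", logo))   # False: the logo is excluded
print(one.can_fetch("Googlebot-Image", photo))  # True: other images are still allowed
print(one.can_fetch("Googlebot", logo))         # True: the regular crawler is not addressed

every = load_rules(ALL_IMAGES)
print(every.can_fetch("Googlebot-Image", photo))  # False
```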
This is the standard protocol that most Web crawlers observe for excluding a Web server or directory from an index. More information on 'robots.txt' is available at: http://www.robotstxt.org/wc/norobots.html.