Title: Search robots: who they are
Keywords: Search robots
Description: Search robots are special programs that constantly scan the content of the Internet.
Body:

Search robots are special programs that constantly scan the content of the Internet. One small but very important qualification is needed here: robots crawl text only, that is, web pages in formats such as html, htm, shtml, and xml. Other files (archives, graphics, music, video) are generally left untouched. The term "search engine" is often used in place of "robot", although this is not accurate.

In simplified form, a search engine can be presented as a set of interrelated elements, which necessarily includes:

1. Search robot
2. Database
3. The interface for working with users (a Web site)

To avoid confusing readers, I have intentionally left out of this list such elements as the query handler and the various additional services that each search engine has.

Why do we need robots?

The Internet is a huge network containing an enormous amount of information. There is more than enough of it, but you need to be able to navigate it, that is, to find the necessary data at the right moment. This is exactly what search engines are for. For a search engine to know what is located at which address on the Internet, it first has to look through all the sites and add their content to its database. This is precisely what the search robot does. Then, on receiving a request, the search engine looks through its own database and gives the user the results that match the request.

It might seem like a lot of fuss over a program visiting a site and reading it. But robots do not look through sites just once or twice; they do so continually, because information on the network is constantly changing: some sites appear, some stop working, and on others pages change. The search engine therefore has to keep recording all changes on the network in its database.
Otherwise, within a month the results returned in response to requests would be obsolete, and hence unsatisfactory. The more powerful the computer on which the robot is installed, the more pages it can look through per unit of time (for example, per hour or per day). Looking through a page in this way is called indexing. When the robot has looked through all of a site's pages, the site is said to be indexed.

But the number of web pages on the Internet is huge, so how does the robot manage to get around them all?

Robots are configured to visit different sites at different intervals. If a site is updated very often, the robot visits it every day or even more frequently. If, over repeated visits to the same site, the robot finds no changes or additions, its visits to that site will gradually become less frequent. As a result, the site may end up being indexed only once a month or less.

How does the robot find its way around the network?

It moves from site to site by following links. When the robot looks through a site again in search of updates, it sees all the links on it. Some of them are already known to it (that is, the addresses of those sites are already in its database), and some it sees for the first time. In the latter case, the robot either follows the new link immediately or records it as a "job" and returns to it after a while.
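
The scheduling behaviour described above (new links recorded as jobs, unchanged sites visited less and less often) can be illustrated with a small Python sketch. It is only a toy model, not how any particular search engine works: the class name `Frontier`, the one-day and thirty-day bounds, and the rule of doubling the interval after an unchanged visit are all assumptions made for this example. Fetching and parsing pages is left out; the caller reports whether a page changed and which links it contained.

```python
import heapq
from dataclasses import dataclass, field

MIN_INTERVAL = 1.0    # days; assumed shortest gap between visits
MAX_INTERVAL = 30.0   # days; assumed longest gap ("once a month")


@dataclass
class Frontier:
    """Toy crawl schedule: known URLs, their revisit intervals, and a job queue."""
    interval: dict = field(default_factory=dict)  # url -> days between visits
    queue: list = field(default_factory=list)     # heap of (due_day, url)

    def add_link(self, url, today):
        """Record a link as a job only if the robot has not seen it before."""
        if url in self.interval:
            return False
        self.interval[url] = MIN_INTERVAL
        heapq.heappush(self.queue, (today, url))
        return True

    def next_due(self, today):
        """Return the next URL whose visit is due, or None if nothing is due."""
        if self.queue and self.queue[0][0] <= today:
            return heapq.heappop(self.queue)[1]
        return None

    def record_visit(self, url, today, changed, links=()):
        """Update the revisit interval after a visit and queue any new links."""
        if changed:
            # The page was updated: come back as soon as allowed.
            self.interval[url] = MIN_INTERVAL
        else:
            # Nothing new: visit less often (assumed rule: double, with a cap).
            current = self.interval.get(url, MIN_INTERVAL)
            self.interval[url] = min(current * 2, MAX_INTERVAL)
        heapq.heappush(self.queue, (today + self.interval[url], url))
        for link in links:
            self.add_link(link, today)
```

A crawl loop built on this sketch would repeatedly call `next_due`, fetch the returned page, and then pass the outcome to `record_visit`. Links that are already in `interval` are ignored, which corresponds to the addresses the robot already has in its database.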
Copyright 1997-2008 Web business and programming All Rights Reserved.