Computer Science Foreign-Literature Translation: Efficient URL Caching for World Wide Web Crawling (Excerpt)
4,300 words, 21,000 English characters, 5,500 Chinese characters

Source: Broder A Z, Najork M, Wiener J L. Efficient URL caching for world wide web crawling[C]// 2003: 679-689.

Efficient URL caching for world wide web crawling

Andrei Z. Broder, Marc Najork, Janet L. Wiener

ABSTRACT

Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)–(c). However, the size of the web (estimated at over 4 billion pages) and its rate of change (estimated at 7% per week) move this plan from a trivial programming exercise to a serious algorithmic and system design challenge. Indeed, these two factors alone imply that for a reasonably fresh and complete crawl of the web, step (a) must be executed about a thousand times per second, and thus the membership test (c) must be done well over ten thousand times per second against a set too large to
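
The three-step loop named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' crawler: the `FAKE_WEB` dictionary, the `example` URLs, and the helper names `fetch`, `extract_links`, and `crawl` are all assumptions made for the example, and an in-memory `set` stands in for the "seen" structure, which is exactly the part that stops fitting in main memory at web scale.

```python
import re
from collections import deque

# Hypothetical in-memory "web" (URL -> page body); it replaces real HTTP fetches.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    "http://b.example/": '<a href="http://a.example/">a</a> <a href="http://c.example/">c</a>',
    "http://c.example/": '<a href="http://a.example/">a</a>',
}

HREF = re.compile(r'href="([^"]+)"')


def fetch(url):
    """Step (a): fetch a page. A dictionary lookup stands in for the network."""
    return FAKE_WEB.get(url, "")


def extract_links(body):
    """Step (b): parse the page and extract all linked URLs."""
    return HREF.findall(body)


def crawl(seed):
    """Run steps (a)-(c) from one seed URL.

    Returns the fetch order and the number of membership tests performed.
    """
    seen = {seed}             # every URL discovered so far
    frontier = deque([seed])  # discovered but not yet fetched
    order = []
    membership_tests = 0
    while frontier:
        url = frontier.popleft()
        order.append(url)
        for link in extract_links(fetch(url)):
            membership_tests += 1
            if link not in seen:  # step (c): the membership test
                seen.add(link)
                frontier.append(link)
    return order, membership_tests


if __name__ == "__main__":
    order, tests = crawl("http://a.example/")
    print(order, tests)
```

Even on this three-page toy graph the loop performs more membership tests than fetches, because every extracted link is tested while each page is fetched only once. That is consistent with the abstract's point that step (c) runs far more often than step (a).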
