Title: Blocking Internationally Recognized Junk Spiders with robots.txt




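How many search engine robots (crawling spiders) are at work across the global Internet? That is hard to answer. Many people have written their own robots to steal other people's information, and many others have built robots for some other gain. These are junk robots: they eat up a site's bandwidth, and the site's user information may already have been harvested by them. Here, based on a few tips from peers abroad, I explain how to add a robots.txt file to your site and configure it so that you can say goodbye to junk search engine robots.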

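First, open Notepad and copy the code below. About the code: it lists 123 robots, spiders, and search agents that are internationally regarded as junk, and uses Disallow: / to forbid each of them from crawling any part of the site.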

User-agent: larbin
Disallow: /
User-agent: b2w/0.1
Disallow: /
User-agent: Copernic
Disallow: /
User-agent: psbot
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: Alexibot
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: WebZip/4.0
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: RMA
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: asterias
Disallow: /
User-agent: httplib
Disallow: /
User-agent: turingos
Disallow: /
User-agent: spanner
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: Not Your Business!
Disallow: /
User-agent: Hidden-Referrer
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: Wget/1.6
Disallow: /
User-agent: Wget/1.5.3
Disallow: /
User-agent: Wget
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: moget
Disallow: /
User-agent: hloader
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VCI
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Openfind data gathere
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: Microsoft URL Control
Disallow: /
User-agent: Openbot
Disallow: /
User-agent: URL Control
Disallow: /
User-agent: Zeus Link Scout
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Iron33/1.0.2
Disallow: /
User-agent: Bookmark search tool
Disallow: /
User-agent: GetRight/4.2
Disallow: /
User-agent: FairAd Client
Disallow: /
User-agent: Gaisbot
Disallow: /
User-agent: Aqua_Products
Disallow: /
User-agent: Radiation Retriever 1.1
Disallow: /
User-agent: WebmasterWorld Extractor
Disallow: /
User-agent: Flaming AttackBot
Disallow: /
User-agent: Oracle Ultra Search
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: PerMan
Disallow: /
User-agent: searchpreview
Disallow: /
Then save the Notepad file under the name robots.txt.
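
If you would rather not copy and paste by hand, the same file can be produced by a short script. The following is only a minimal Python sketch: the names `BAD_BOTS`, `build_robots_txt`, and `write_robots_txt` are made up for illustration, and `BAD_BOTS` holds just a handful of entries from the list above rather than all of them.

```python
from pathlib import Path

# Hypothetical short sample; in practice this would hold every name in the list.
BAD_BOTS = ["larbin", "psbot", "EmailCollector", "WebZip", "Wget"]


def build_robots_txt(user_agents):
    """Return robots.txt text with one 'Disallow: /' record per user agent."""
    lines = []
    for name in user_agents:
        lines.append(f"User-agent: {name}")
        lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def write_robots_txt(directory, user_agents):
    """Write robots.txt into the given directory and return the file path."""
    path = Path(directory) / "robots.txt"
    path.write_text(build_robots_txt(user_agents), encoding="utf-8")
    return path
```

Calling `write_robots_txt(".", BAD_BOTS)` would create robots.txt in the current directory, ready to upload.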

Upload the saved file to your site's root directory, that is, the same directory that holds your site's index.htm.
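
Before uploading, it may be worth checking that the rules parse the way you expect. The sketch below uses `urllib.robotparser` from Python's standard library. `SAMPLE_RULES` and `is_blocked` are illustrative names, the sample contains only three of the records, and the result reflects how this particular parser matches user agents, which is not necessarily how every robot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Shortened sample of the rules; the real file would contain the full list.
SAMPLE_RULES = """\
User-agent: larbin
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: Wget
Disallow: /
"""


def is_blocked(rules_text, user_agent, url="/"):
    """Return True if the rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return not parser.can_fetch(user_agent, url)


# Two agents from the sample and one that is not listed.
for agent in ("larbin", "Wget/1.21", "Googlebot"):
    print(agent, "blocked" if is_blocked(SAMPLE_RULES, agent) else "allowed")
```

With this parser, the two listed agents should come out as blocked, while an agent that is not in the file, such as Googlebot, stays allowed.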

That's it.

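It really is that simple. With these robots.txt settings, every listed robot or spider that honors robots.txt will stay off your site. Keep in mind that robots.txt is a request, not an enforcement mechanism: a robot that ignores the file can only be stopped on the server itself.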




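Author: 網棍@南天 - SEO and Internet Marketing
南天SEO - comprehensive knowledge, shared for free
Article title: Blocking Internationally Recognized Junk Spiders with robots.txt. All rights reserved.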