Wuhan University, Li Shijun: Chinese Homepage

Li Shijun

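Doctoral supervisor
Primary appointment: Director, Digital Economy Empowerment Center, Institute of Artificial Intelligence, Wuhan University
Other appointment: Deputy Director, Hubei Provincial Engineering Technology Research Center for Public Finance and Economic Operation Big Data
Gender: Male
Alma mater: Wuhan University
Department: School of Computer Science
Date joined: 1998-07-01
Discipline: Computer Application Technology
Office: Institute of Artificial Intelligence, Wuhan University
Contact: 13986190968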


Feedback model based deep web crawling strategy


Affiliation: (1) School of Computer, Wuhan University, Wuhan 430079, China

Journal: Journal of Computational Information Systems

Abstract: The crucial issue in Deep Web integration is how to efficiently locate the query interfaces of Deep Web resources. Existing crawlers must retrieve many off-topic pages in order to capture the delayed benefit of links. However, accounting for delayed benefit reduces crawling speed and may cause the crawler to drift off topic. We therefore propose a Deep Web crawling strategy based on a feedback model. In this strategy, an ordinal regression model is used to construct a page classifier that assigns retrieved pages to three levels, and a link extractor extracts the links belonging to those three levels. During crawling, the classifier's output serves as feedback that reveals whether the links extracted by the link extractor satisfy the page classifier. Based on this feedback, we extract the features of the links that satisfy the page classifier. These features guide the crawler to quickly extract links that satisfy the page classifier, so many off-topic links are avoided while links with delayed benefit are retained. The experimental results indicate that our crawler can automatically extract the features of promising links and avoid many off-topic links, improving both crawling speed and accuracy.
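The feedback loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy site `TOY_WEB`, the keyword-based `classify_page` (a stand-in for the ordinal regression classifier, using two cut points to produce three levels), the `REWARD` values, and the token-based `link_features` are all hypothetical choices made for illustration. Each fetched page's level is fed back to the features of the link that led to it, and new links are ranked by the weights learned so far.

```python
import heapq
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical toy site: url -> (page text, [(link url, anchor text), ...])
TOY_WEB = {
    "http://site/home": ("welcome portal", [
        ("http://site/news", "sports news"),
        ("http://site/books", "book catalog"),
    ]),
    "http://site/news": ("sports news scores", [
        ("http://site/news/more", "more sports news"),
    ]),
    "http://site/news/more": ("sports scores archive", []),
    "http://site/books": ("book catalog browse", [
        ("http://site/books/search", "advanced book search"),
    ]),
    "http://site/books/search": ("search form query title author", []),
}

# Assumed feedback per level: 0 = off-topic, 1 = delayed benefit, 2 = query interface
REWARD = {0: -1.0, 1: 0.5, 2: 1.0}


def classify_page(text):
    """Stand-in for the ordinal page classifier: a score cut at two thresholds."""
    words = set(text.split())
    score = 2.0 * len(words & {"search", "form", "query"})
    score += 1.0 * len(words & {"catalog", "browse"})
    if score >= 4.0:
        return 2
    if score >= 1.0:
        return 1
    return 0


def link_features(url, anchor):
    """Link features: lowercase tokens from the URL path and the anchor text."""
    return frozenset(re.findall(r"[a-z]+", (urlparse(url).path + " " + anchor).lower()))


def crawl(seed, budget):
    """Fetch at most `budget` pages, ranking links by feedback-learned weights."""
    weights = defaultdict(float)
    # Heap entries: (negated score, tie-breaker, url, features of the incoming link)
    frontier = [(0.0, 0, seed, frozenset())]
    seen, order, found, counter = {seed}, [], [], 1
    while frontier and len(order) < budget:
        _, _, url, feats = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        level = classify_page(text)
        order.append(url)
        if level == 2:
            found.append(url)
        # Feedback step: credit or penalize the features of the link that led here
        for f in feats:
            weights[f] += REWARD[level]
        for link_url, anchor in links:
            if link_url in seen or link_url not in TOY_WEB:
                continue
            seen.add(link_url)
            fs = link_features(link_url, anchor)
            # Scores are fixed at push time; a fuller version might re-rank later
            score = sum(weights[f] for f in fs)
            heapq.heappush(frontier, (-score, counter, link_url, fs))
            counter += 1
    return order, found


order, found = crawl("http://site/home", budget=4)
print(order)
print(found)
```

In this toy run the crawler fetches the off-topic news page once, penalizes its link features, and then prefers the catalog page (level 1, delayed benefit) and the search form behind it over the remaining news link. A real system would replace the keyword scorer with a trained ordinal model and use richer link features.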

Co-authors: Jianwei(1), Tian, Guowen(1), Li, Shijun(1)

Translated work: No

Publication date: 2008-01-01
