
    Li Shijun (李石君)

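    • Doctoral supervisor
    • Main position: Director, Digital Economy Empowerment Center, Institute of Artificial Intelligence, Wuhan University
    • Other positions: Deputy Director, Hubei Provincial Engineering Technology Research Center for Public Finance and Economic Operation Big Data
    • Gender: Male
    • Alma mater: Wuhan University
    • Affiliation: School of Computer Science
    • Date of employment: 1997-12-07
    • Discipline: Computer Application Technology
    • Office location: Institute of Artificial Intelligence, Wuhan University
    • Contact: 13986190968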


    Feedback model based deep web crawling strategy


    Affiliation: (1) School of Computer, Wuhan University, Wuhan 430079, China

    Published in: Journal of Computational Information Systems

    Abstract: The crucial issue in Deep Web integration is how to efficiently locate the query interfaces of Deep Web resources. Existing crawlers need to retrieve many off-topic pages in order to capture the delayed benefit of links. However, taking delayed benefit into account reduces crawling speed and may cause the crawler to drift from the topic. We therefore propose a Deep Web crawling strategy based on a feedback model. In this strategy, we use an ordinal regression model to construct a page classifier that classifies retrieved pages into three levels, and a link extractor that extracts links at the same three levels. During crawling, we treat the classifier's result as feedback that reveals whether the links extracted by the link extractor satisfy the page classifier. Based on this feedback, we extract the features of the links that satisfy the page classifier. These features guide the crawler to quickly extract links that satisfy the page classifier, so we avoid many off-topic links while retaining links that have delayed benefit. The experimental results indicate that our crawler can automatically extract the features of promising links and avoid many off-topic links, increasing the crawler's speed and accuracy.
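    The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Everything named here is an assumption made for illustration: the toy in-memory web `TOY_WEB`, the keyword list `TOPIC_WORDS`, the reward table `REWARD`, and the use of anchor-text tokens as link features. The ordinal regression classifier is replaced by a hand-written score with two cut points, which only mimics the three ordered levels.

```python
import heapq
import re

# Hypothetical toy web: url -> (page text, [(anchor text, target url), ...]).
TOY_WEB = {
    "site/home": ("welcome to our book store",
                  [("advanced search", "site/search"),
                   ("company news", "site/news"),
                   ("browse books", "site/books")]),
    "site/search": ("search books by title author <form> query", []),
    "site/news": ("press release about the weather",
                  [("more news", "site/news2")]),
    "site/news2": ("more weather news", []),
    "site/books": ("book catalog list",
                   [("search the catalog", "site/search2")]),
    "site/search2": ("catalog search <form> query title", []),
}

TOPIC_WORDS = ("book", "catalog")   # assumed topic vocabulary
CUTPOINTS = (1, 2)                  # score < 1 -> level 0, score 1 -> level 1, score >= 2 -> level 2
REWARD = {0: -1.0, 1: 0.5, 2: 1.0}  # assumed feedback per page level


def classify_page(text):
    """Stand-in for the ordinal page classifier: returns level 0, 1 or 2."""
    score = 0
    if any(word in text for word in TOPIC_WORDS):
        score += 1                  # page is on topic
    if "<form" in text:
        score += 1                  # page contains a form
    return sum(score >= cut for cut in CUTPOINTS)


def tokenize(anchor):
    """Link features: lowercase word tokens of the anchor text."""
    return re.findall(r"[a-z]+", anchor.lower())


def link_score(tokens, weights):
    """Mean learned weight of a link's tokens (0.0 for unseen tokens)."""
    if not tokens:
        return 0.0
    return sum(weights.get(tok, 0.0) for tok in tokens) / len(tokens)


def crawl(seed, web, max_pages=10):
    weights = {}                          # token -> learned weight
    frontier = [(0.0, 0, seed, [])]       # (-score, tie-breaker, url, link tokens)
    seen = {seed}
    order, found = [], []
    counter = 1
    while frontier and len(order) < max_pages:
        _, _, url, tokens = heapq.heappop(frontier)
        text, links = web[url]
        level = classify_page(text)
        order.append((url, level))
        if level == 2:
            found.append(url)
        # Feedback: credit or penalise the features of the link that led here.
        for tok in tokens:
            weights[tok] = weights.get(tok, 0.0) + REWARD[level]
        # Score outgoing links with the weights learned so far.
        # Scores are fixed at push time; queued links are not re-scored.
        for anchor, target in links:
            if target in seen or target not in web:
                continue
            seen.add(target)
            toks = tokenize(anchor)
            heapq.heappush(frontier, (-link_score(toks, weights), counter, target, toks))
            counter += 1
    return order, found, weights


if __name__ == "__main__":
    visited, interfaces, learned = crawl("site/home", TOY_WEB, max_pages=5)
    print(visited)
    print(interfaces)
```

    In this sketch, links whose anchor tokens previously led to higher-level pages are fetched first, while links resembling earlier off-topic ones sink to the back of the frontier. Level 1 pages still earn a small positive reward, which is one simple way to keep links with delayed benefit in play.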

    Co-authors: Jianwei(1), Tian, Guowen(1), Li, Shijun(1)


    Publication date: 2008-01-01