What can pictures tell us about web pages? Improving document search using images

dc.contributor.author: Rodriguez-Vaamonde, Sergio
dc.contributor.author: Torresani, Lorenzo
dc.contributor.author: Fitzgibbon, Andrew W.
dc.contributor.institution: Tecnalia Research & Innovation
dc.date.accessioned: 2024-07-24T12:15:08Z
dc.date.available: 2024-07-24T12:15:08Z
dc.date.issued: 2015-06-01
dc.description: Publisher Copyright: © 2014 IEEE.
dc.description.abstract: Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. The candidate set is then reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines, with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks, where we show that exploiting visual content yields improvements in accuracy for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system. [en]
dc.description.status: Peer reviewed
dc.format.extent: 12
dc.identifier.citation: Rodriguez-Vaamonde, S, Torresani, L & Fitzgibbon, A W 2015, 'What can pictures tell us about web pages? Improving document search using images', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 6, 6945905, pp. 1274-1285. https://doi.org/10.1109/TPAMI.2014.2366761
dc.identifier.doi: 10.1109/TPAMI.2014.2366761
dc.identifier.issn: 0162-8828
dc.identifier.uri: https://hdl.handle.net/11556/4544
dc.identifier.url: http://www.scopus.com/inward/record.url?scp=84929206960&partnerID=8YFLogxK
dc.language.iso: eng
dc.relation.ispartof: IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.relation.projectID: Eusko Jaurlaritza, IE11-316
dc.relation.projectID: Microsoft Research
dc.relation.projectID: National Science Foundation, CNS-1205521-IIS-0952943
dc.rights: info:eu-repo/semantics/openAccess
dc.subject.keywords: document ranking
dc.subject.keywords: multimedia search
dc.subject.keywords: search engines
dc.subject.keywords: Web Pages
dc.subject.keywords: Software
dc.subject.keywords: Computer Vision and Pattern Recognition
dc.subject.keywords: Computational Theory and Mathematics
dc.subject.keywords: Artificial Intelligence
dc.subject.keywords: Applied Mathematics
dc.title: What can pictures tell us about web pages? Improving document search using images [en]
dc.type: journal article