Friday, February 28, 2014

Reading Note - Unit 6

*A note to our regular readers: we skipped Unit 5.


First up, we have Chapter 8 of IIR (Introduction to Information Retrieval)


  • Measuring the effectiveness of IR systems
    • We need a test collection
    • We need a set of test queries
    • We need a set of relevance judgments, as the book calls them
  • Test collections for this purpose
    • Cranfield collection
    • TREC
      • Put together by NIST
    • GOV2
      • A much larger web page collection, also used in evaluations run by NIST
      • Still more than 2 orders of magnitude smaller than what the big web search engines index
    • NTCIR
      • Focuses on East Asian languages
    • CLEF
      • European languages
    • Reuters
      • Newswires
    • 20 Newsgroups
  • Evaluating unranked retrieval results
    • Precision
      • fraction of documents retrieved that are relevant
    • Recall
      • fraction of relevant documents that are retrieved
    • F-measure
      • A single measure that trades off precision against recall (their weighted harmonic mean)
  • Evaluating ranked retrieval results
    • Precision-recall curve
      • Why not just use F-measure?
    • Many other ways to evaluate results
  • Developing reliable and informative test collections
    • Pool the top k documents returned by several different systems and have experts judge only that pool
  • User utility & the use of document relevance
    • Satisfaction of the users is very important
      • Maybe more so than whether an expert judges something relevant
  • Results snippets
    • Just like Google, we should give small snippets of the returned text for each ranked document
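
To make the precision, recall, and F-measure definitions above concrete, here is a minimal Python sketch. The function names, document IDs, and relevance judgments are all made up for illustration; nothing here comes from the book's own code. The second function lists the (recall, precision) pairs at each rank, which are the raw points behind a precision-recall curve. It also hints at why F-measure alone is not enough for ranked results: F treats the result set as unordered.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Set-based precision, recall and F-measure for an unranked result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    # Weighted harmonic mean; beta = 1 weights precision and recall equally.
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def precision_recall_points(ranking, relevant):
    """(recall, precision) measured after each position of a ranked list."""
    relevant = set(relevant)
    if not relevant:
        return []
    points, hits = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points


# Hypothetical system output and hypothetical relevance judgments.
ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d5", "d8"}

p, r, f1 = precision_recall_f(ranking, relevant)
points = precision_recall_points(ranking, relevant)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(points)
```

Swapping "d7" and "d1" in the ranking leaves precision, recall, and F1 unchanged but changes the per-rank points, which is the information the curve captures.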

Cumulated Gain-Based Evaluation of IR Techniques

    This is a paper from 2002 that proposes several measures for evaluating IR systems or techniques.  It talks about recall and precision like Ch. 8 but attempts to go further by using graded rather than binary relevance.  The first measure, cumulated gain, adds up the relevance scores of the documents down the ranked result list. The second, discounted cumulated gain, discounts "late-retrieved" documents, so a relevant document counts for less the further down the list it appears. The third compares a technique's performance against that of an ideal ranking.  They used the TREC-7 data set. This paper would seem to be the basis for our ability to really test different IR systems using different established methods.
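
Here is a rough Python sketch of the three measures as I understand them. The function names and the gain vector (graded relevance scores from 0 to 3, in rank order) are illustrative. The log base b = 2 is just one possible choice of discount. Building the ideal ranking by sorting the retrieved gains is a simplifying assumption; a fuller version would build it from all judged relevant documents for the query.

```python
import math


def cumulated_gain(gains):
    """CG: running sum of graded relevance scores down the ranking."""
    cg, total = [], 0.0
    for g in gains:
        total += g
        cg.append(total)
    return cg


def discounted_cumulated_gain(gains, b=2):
    """DCG: like CG, but the gain at rank i >= b is divided by log_b(i)."""
    dcg, total = [], 0.0
    for rank, g in enumerate(gains, start=1):
        total += g if rank < b else g / math.log(rank, b)
        dcg.append(total)
    return dcg


def normalized_dcg(gains, ideal_gains, b=2):
    """nDCG: DCG divided position by position by the DCG of an ideal ranking."""
    actual = discounted_cumulated_gain(gains, b)
    ideal = discounted_cumulated_gain(ideal_gains, b)
    return [a / i if i else 0.0 for a, i in zip(actual, ideal)]


# Illustrative graded relevance scores of the top 10 results, in rank order.
gains = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
# Assumption: the ideal ranking is the same documents sorted by gain.
ideal_gains = sorted(gains, reverse=True)

cg = cumulated_gain(gains)
dcg = discounted_cumulated_gain(gains)
ndcg = normalized_dcg(gains, ideal_gains)
print(cg)
print([round(x, 2) for x in dcg])
print([round(x, 2) for x in ndcg])
```

A larger b makes the discount gentler, so late-retrieved documents keep more of their value.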

What's the value of TREC

