First up we have Chapter 8 of IIR (Introduction to Information Retrieval)
- Measuring the effectiveness of IR systems
  - We need a test collection
  - We need a set of test queries
  - We need a set of relevance judgments, as the book calls them
- Test collections for this purpose
  - Cranfield collection
  - TREC
    - Put together by NIST
  - GOV2
    - A bigger collection than the classic TREC ones, also done by NIST
    - Still two orders of magnitude smaller than the collections indexed by major web search engines
  - NTCIR
    - Focuses on East Asian languages
  - CLEF
    - European languages
  - Reuters
    - Newswire stories
  - 20 Newsgroups
- Evaluating unranked retrieval results
  - Precision
    - The fraction of retrieved documents that are relevant
  - Recall
    - The fraction of relevant documents that are retrieved
  - F-measure
    - A single measure that combines both (the weighted harmonic mean of precision and recall)
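These three measures are easy to compute once you have the retrieved and relevant sets. A minimal sketch, using made-up document IDs (the sets here are hypothetical, not from the book):

```python
# Toy illustration of precision, recall, and the balanced F-measure (F1).
# The retrieved and relevant document-ID sets below are invented for the example.
retrieved = {1, 2, 3, 4, 5, 6, 7, 8}
relevant = {2, 4, 6, 9, 10}

tp = len(retrieved & relevant)                      # relevant docs we actually retrieved
precision = tp / len(retrieved)                     # 3 / 8
recall = tp / len(relevant)                         # 3 / 5
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")  # prints P=0.375 R=0.600 F1=0.462
```

The harmonic mean punishes imbalance: a system with perfect recall but near-zero precision still gets a near-zero F1.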
- Evaluating ranked retrieval results
  - Precision-recall curve
    - Why not just use the F-measure?
  - Many other ways to evaluate results (e.g. precision at k, mean average precision)
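For ranked results, you trace precision and recall as you walk down the ranking. A small sketch, assuming a hypothetical binary relevance vector in rank order and four relevant documents in the collection:

```python
# Build (recall, precision@k) points from a ranked result list, then compute
# interpolated precision. The relevance vector below is invented for the example.
ranking = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = relevant, in rank order
total_relevant = 4                   # relevant docs in the whole collection

points = []
hits = 0
for k, rel in enumerate(ranking, start=1):
    hits += rel
    points.append((hits / total_relevant, hits / k))  # (recall, precision@k)

def interpolated_precision(r):
    """Interpolated precision at recall r: the max precision at any recall >= r."""
    return max(p for rec, p in points if rec >= r)

print(points)
print(interpolated_precision(0.5))  # prints 0.75
```

Interpolation is what smooths the sawtooth shape of the raw curve into the familiar monotonically decreasing precision-recall plot.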
- Developing reliable and informative test collections
  - Pool the top k documents returned by many systems and have the pooled documents judged by experts
- User utility and the use of document relevance
  - User satisfaction is very important
    - Maybe more so than whether an expert judges something relevant
- Result snippets
  - Just like Google, we should show a short snippet of the text for each ranked document
Cumulated Gain-Based Evaluation of IR Techniques
This is a 2002 paper by Järvelin and Kekäläinen that looks at several techniques for evaluating IR systems. It covers recall and precision like Ch. 8 but goes further by using graded relevance judgments. The first measure, cumulated gain (CG), sums the relevance scores of the documents down the ranked result list. The second, discounted cumulated gain (DCG), additionally discounts "late-retrieved" documents by dividing each gain by the log of its rank. The third normalizes these curves against the ideal ranking, so the performance of different techniques can be compared directly. Their case study used the TREC-7 data set. This paper would seem to be the basis for our ability to really test different IR systems using established methods.
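The three measures from the paper are short enough to sketch directly. A minimal version, assuming a hypothetical graded relevance vector (0–3) in rank order and using log base 2 for the discount, as in the paper's examples:

```python
from math import log2

# Invented graded relevance scores (0-3) for a ranked result list.
gains = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]

def cg(g):
    """Cumulated gain: running sum of the relevance scores down the ranking."""
    out, total = [], 0
    for x in g:
        total += x
        out.append(total)
    return out

def dcg(g, b=2):
    """Discounted cumulated gain: gains at ranks i >= b are divided by log_b(i),
    so late-retrieved documents contribute less."""
    out = []
    for i, x in enumerate(g, start=1):
        d = x if i < b else x / (log2(i) / log2(b))
        out.append((out[-1] if out else 0) + d)
    return out

def ndcg(g, b=2):
    """Normalized DCG: divide by the DCG of the ideal (descending) ordering.
    (The paper builds the ideal vector from all judged documents, not just
    the retrieved ones; this sketch simplifies.)"""
    ideal = sorted(g, reverse=True)
    return [a / i for a, i in zip(dcg(g, b), dcg(ideal, b))]

print(cg(gains)[:3])   # prints [3, 5, 8]
print(ndcg(gains)[-1])
```

A perfect ranking gives nDCG of 1.0 at every rank, which is what makes the normalized curves comparable across queries with different numbers of relevant documents.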
What's the value of TREC?