Tom's INFSCI 2140 Reading & Muddiest Notes: Reading Notes

*A note to our regular readers: Oard & Diekema's paper "Cross-Language Information Retrieval" could not be retrieved anywhere.

IES Chapter 14

This chapter looks at ways of dealing with massive amounts of data, at the scale a search engine like Google has to handle.

Parallel Query Processing - using index partitioning & replication

Document Partitioning

Each server holds the index for its own subset of the documents; a query is sent to every server and their results are merged
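A minimal single-process sketch of document partitioning, assuming a toy scoring function (summed term frequency) and round-robin assignment of documents to shards. `Shard`, `partition_by_document` and `search_all` are hypothetical names for illustration, not code from the book. Each shard only knows its own documents, so the query has to be broadcast to all of them and the local top-k lists merged.

```python
from collections import defaultdict
import heapq

def build_index(docs):
    """Build an in-memory inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

class Shard:
    """Hypothetical server holding the index for its own subset of documents."""
    def __init__(self, docs):
        self.index = build_index(docs)

    def search(self, query, k):
        # Toy scoring (an assumption): sum of query-term frequencies per document.
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id, tf in self.index.get(term, {}).items():
                scores[doc_id] += tf
        return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

def partition_by_document(docs, n_shards):
    """Assign each document to exactly one shard, round-robin by position."""
    parts = [dict() for _ in range(n_shards)]
    for i, (doc_id, text) in enumerate(docs.items()):
        parts[i % n_shards][doc_id] = text
    return [Shard(p) for p in parts]

def search_all(shards, query, k):
    """Broadcast the query to every shard, then merge the local top-k lists."""
    candidates = []
    for shard in shards:
        candidates.extend(shard.search(query, k))
    return heapq.nlargest(k, candidates, key=lambda item: item[1])
```

For example, with `docs = {"d1": "map reduce map", "d2": "reduce shards", "d3": "map shards shards", "d4": "query"}` and two shards, `search_all(shards, "map", 2)` returns `[("d1", 2), ("d3", 1)]`. Asking each shard for its own top k is enough, because any document in the global top k must also be in its shard's local top k.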

Term Partitioning

Each server holds in memory the posting lists for a subset of the terms, so a query only involves the servers that own its terms
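A minimal sketch of term partitioning, assuming terms are assigned to servers by a stable hash (`zlib.crc32`) and that queries are simple Boolean AND queries. Both choices, and the function names, are hypothetical illustrations rather than the book's scheme. Each posting list lives on exactly one server, so a query only has to contact the servers that own its terms.

```python
from collections import defaultdict
import zlib

def owner(term, n_servers):
    """Hypothetical rule: a stable hash decides which server stores a term."""
    return zlib.crc32(term.encode("utf-8")) % n_servers

def partition_by_term(docs, n_servers):
    """Each server holds complete posting lists, but only for its own terms."""
    servers = [defaultdict(set) for _ in range(n_servers)]
    for doc_id, text in docs.items():
        for term in text.lower().split():
            servers[owner(term, n_servers)][term].add(doc_id)
    return servers

def conjunctive_query(servers, query):
    """Contact only the servers owning the query terms and intersect postings."""
    result = None
    for term in query.lower().split():
        postings = servers[owner(term, len(servers))].get(term, set())
        result = postings if result is None else result & postings
    return sorted(result or [])
```

The trade-off against document partitioning shows up here: fewer servers are touched per query, but a server that owns a very common term can become a hot spot.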

Hybrid Schemes

Use term and document partitioning together,
or use document partitioning together with replication

Redundancy & Fault Tolerance

We must assume that as the number of queries and machines grows, faults will occur
We can handle this with replication and partial replication
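A minimal sketch of how replication can mask a machine failure, assuming each index partition is copied onto several replicas and that a dead replica shows up as a `ConnectionError`. `Replica` and `ReplicatedPartition` are hypothetical classes for illustration. Queries rotate round-robin across the replicas for load balancing, and when one replica has failed the next one is tried.

```python
import itertools

class Replica:
    """Hypothetical search server for one partition; it may be up or down."""
    def __init__(self, name, up=True):
        self.name = name
        self.up = up

    def search(self, query):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} answered {query!r}"

class ReplicatedPartition:
    """Hypothetical wrapper: one index partition stored on several replicas."""
    def __init__(self, replicas):
        self.replicas = replicas
        self._next = itertools.cycle(range(len(replicas)))

    def search(self, query):
        # Start at the next replica in round-robin order (load balancing);
        # if it has failed, fall back to the remaining replicas in turn.
        start = next(self._next)
        for offset in range(len(self.replicas)):
            replica = self.replicas[(start + offset) % len(self.replicas)]
            try:
                return replica.search(query)
            except ConnectionError:
                continue
        raise RuntimeError("all replicas of this partition have failed")
```

With replicas `a`, `b` (down) and `c`, the first query is answered by `a` and the second falls through from `b` to `c`. Partial replication would keep copies of only part of a partition, which this sketch does not model.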

MapReduce

The Basic Framework

Highly parallelizable: many map tasks run at the same time on different input shards, and many reduce tasks then run at the same time on different groups of keys.
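A minimal single-machine sketch of the MapReduce model using word count as the example, assuming a thread pool stands in for a cluster of machines. `map_reduce`, `map_fn` and `reduce_fn` are hypothetical names, not a real framework's API. The shuffle step acts as a barrier: the map shards are processed independently, and so are the key groups in the reduce phase, but a key can only be reduced once all map output for it has been collected.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_fn(shard):
    """Map: emit (word, 1) for every word in one input shard."""
    return [(word, 1) for word in shard.lower().split()]

def reduce_fn(key, values):
    """Reduce: sum all counts emitted for one key."""
    return key, sum(values)

def map_reduce(shards, map_fn, reduce_fn, workers=4):
    # Threads are an assumption standing in for separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: every shard is processed independently, hence in parallel.
        mapped = list(pool.map(map_fn, shards))
        # Shuffle: group all intermediate values by key (barrier between phases).
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)
        # Reduce phase: each key group is independent, so it is parallel too.
        results = pool.map(lambda kv: reduce_fn(*kv), sorted(groups.items()))
        return dict(results)
```

For example, `map_reduce(["a b a", "b c"], map_fn, reduce_fn)` gives `{"a": 2, "b": 2, "c": 1}`.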

Combiners

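A reduce-like function applied locally to the output of a single map shard, before the shuffle, rather than to a reduce shard; this shrinks the data sent to the reducers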
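A minimal sketch of a combiner, again using word count as a hypothetical example. Because summing counts is associative and commutative, the same summation can be run on each map shard's output before the shuffle. The final counts are identical, but fewer intermediate pairs reach the reducers. The function names are illustrative only.

```python
from collections import Counter, defaultdict

def map_shard(shard):
    """Map: emit (word, 1) for every word in one input shard."""
    return [(word, 1) for word in shard.lower().split()]

def combine(pairs):
    """Combiner: pre-sum counts on the map side, one pair per distinct word."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def word_count(shards, use_combiner=True):
    """Return (final counts, number of intermediate pairs sent to reducers)."""
    sent = []
    for shard in shards:
        pairs = map_shard(shard)
        sent.extend(combine(pairs) if use_combiner else pairs)
    totals = defaultdict(int)
    for word, count in sent:  # reduce side: the same summation as the combiner
        totals[word] += count
    return dict(totals), len(sent)
```

On the shards `["to be or not to be", "be quick"]`, both variants produce the same counts, but the combiner cuts the intermediate pairs from 8 to 6.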

Secondary Keys

A MapReduce feature for values that share the same (primary) key: a secondary key controls the order in which those values reach the reduce function
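A minimal sketch of the secondary-key idea applied to building posting lists, assuming the map emits a composite key `(term, doc_id)`. The data is sorted on the full key but grouped only on the term, so each term's postings arrive in document-ID order. This composite-key simulation and its function names are a hypothetical illustration, not necessarily how the book implements it.

```python
from itertools import groupby
from operator import itemgetter

def map_doc(doc_id, text):
    """Map: emit ((term, doc_id), tf); doc_id plays the role of secondary key."""
    counts = {}
    for term in text.lower().split():
        counts[term] = counts.get(term, 0) + 1
    return [((term, doc_id), tf) for term, tf in counts.items()]

def build_postings(docs):
    intermediate = []
    for doc_id, text in docs.items():
        intermediate.extend(map_doc(doc_id, text))
    # Sort on the full (term, doc_id) key ...
    intermediate.sort(key=itemgetter(0))
    # ... but group on the primary key (the term) only, so each reduce call
    # sees that term's values already ordered by doc_id.
    postings = {}
    for term, group in groupby(intermediate, key=lambda kv: kv[0][0]):
        postings[term] = [(doc_id, tf) for (_, doc_id), tf in group]
    return postings
```

Even if documents are mapped out of order, for example `{3: "b a", 1: "a a", 2: "b"}`, the result is `{"a": [(1, 2), (3, 1)], "b": [(2, 1), (3, 1)]}`.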

Machine Failures

The map side deals with failures better than the reduce side

He & Wang - Cross-Language Information Retrieval

This excerpt from a book looks at the challenge of taking a query in one language and returning relevant information written in another. Currently no search engine supports this, not even Google. First, the system must decide how it will translate the given query. All the usual techniques, such as tokenization, stemming, phrase identification, stop-word removal, and n-grams, can be used here. Then the application needs translation knowledge, which can come from bilingual dictionaries or corpora. The system must also be able to deal with acronyms and proper nouns. The chapter then goes into how to use this knowledge, through term weighting, to find the documents that best suit the user. Finally, the system needs some method of evaluation. There are a few evaluation frameworks available, such as the TREC CLIR track.
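A minimal sketch of dictionary-based query translation with term weighting, assuming a tiny hypothetical English-to-French dictionary. Each source term's weight of 1 is split evenly across its candidate translations, and terms missing from the dictionary (often proper nouns or acronyms) are passed through unchanged. This even-split rule is only one simple choice and is not necessarily the weighting the chapter recommends.

```python
def translate_query(query_terms, bilingual_dict):
    """Return {target_term: weight} for a dictionary-based query translation.

    Assumption: each source term carries weight 1, split evenly across its
    translations; untranslatable terms (e.g. proper nouns, acronyms) are kept
    as-is in case they appear unchanged in target-language documents.
    """
    weights = {}
    for term in query_terms:
        translations = bilingual_dict.get(term.lower()) or [term]
        share = 1.0 / len(translations)
        for t in translations:
            weights[t] = weights.get(t, 0.0) + share
    return weights

def score(doc_terms, weighted_query):
    """Score a target-language document as the summed weight of matched terms."""
    present = set(doc_terms)
    return sum(w for t, w in weighted_query.items() if t in present)

# Hypothetical toy dictionary for illustration only.
EN_FR = {"bank": ["banque", "rive"], "river": ["fleuve"]}
```

For the query `["river", "bank", "NASA"]`, the ambiguous "bank" contributes 0.5 to each of its two translations, so it cannot outweigh the unambiguous "river", and "NASA" survives untranslated.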


Thursday, March 27, 2014

Reading Notes - Unit 11
