Paper Review #2 - Toward creating a fairer ranking in search engine results

by. Jongwon Lee | 71 Views (53 Uniq Views) | about 2 years ago

#PaperReview #InformationRetrieval #MachineLearning

Paper Review of Toward creating a fairer ranking in search engine results (Ruoyuan Gao and Chirag Shah, 2020)

1. Background
Gao et al. argue that the presentation of search engine results affects users’ judgment, selection, and belief. Also, fairly ranked search results should include exposure of under-represented groups or items. Following these ideas diversifies the topical coverage and improves user awareness of different opinions. However, they state that reality is often far from this ideal. Moreover, according to Gao et al., a ranking algorithm should not sacrifice relevance to improve fairness, but that may not always be the case. Therefore, the trade-off between relevance and fairness is one problem that needs to be resolved regarding fairness in IR.

Gao et al. bring up a scenario where a user types in the query “coffee health” on Google Search. Ideally, Google should present both the benefits and harms of coffee to cover the topic "health". However, all of the featured snippets are about the benefits of coffee. This will build an impression on the user that coffee is beneficial to health at the first glance. All the snippets from articles on the first page are suggesting on coffee’s benefits. If the user only looks at the first page results, the user may reach the conclusion that coffee is good for health. However, some articles on the second page include the negative side of coffee.

From the perspective of diversity and novelty when defining fairness, the benefits and harms of coffee are two aspects of the topic. Search results containing both sides can be considered topical diversity. A diversified set of results should contain as many aspects as possible. Thus, the first page results should ideally offer both sides of coffee. Moreover, this paper argues that top results should contain as few redundancies as possible so that the user does not have to investigate a vast amount of items to find the information they needed or explore various aspects related to the query. This idea is related to improving novelty.

2. Research Questions

How do we quantitatively measure the degree of topical bias?
How do various fairness ranking methodologies perform on search engine results regarding relevance?
What is the impact of the amount of diversity fairness introduced on relevance, diversity, and novelty?
What are the trade-offs between diversity, novelty, and relevance with respect to different fairness strategies?

3. Data and methods
Dataset

Google Search Dataset: Gao et al. crawled data from Google Search to understand bias and fairness in search engines. They took 100 queries from Google Trends and crawled available search results. The dataset resulted in 7,410 documents totaling 7.65M words, with an average of 74 documents per query, and an average vocabulary size of 8617 per query.

Methods

Fairness Re-ranking Strategies

Top-top: A naive approach to simply pick the top-ranked results from each group, then rank the documents in the same order of their original rank combined.
Page-wise: Picking just one document from each cluster on each page, until the number of documents from each cluster satisfies the fairness constraint F.
Fair-random: Instead of always picking from the top, we pick uniformly at random from each cluster, then order the picked documents according to their original rank.

Epsilon Greedy - balances exploration and exploitation by choosing between exploration and exploitation randomly

epsilon-greedy: a simple strategy that is commonly used in multi-armed bandit problems: with probability 1, select the lever that historically yields the highest reward
Naive epsilon-greedy: does not consider group assignments, the naive epsilon-greedy algorithm simply explores the entire document collection with probability epsilon
Fair epsilon-greedy: considers group assignments, the highest rewarding cluster at each moment is the one that pushes the results to satisfy fairness constraints in the current state.

4. Findings
Gao et al. define the fairness of IR, explores cases of bias that occur in IR systems, and propose metrics to measure the fairness of IR. This paper focuses on the top-k diversity fairness ranking regarding statistical parity fairness and disparate impact fairness. They explored the topical diversity bias presented in IR. Moreover, fairness re-ranking strategies were introduced to study the trade-off between fairness ranking and the relevance score.

Experiments in this paper show that fairness can greatly benefit relevance and diversity. Algorithms such as top-top can satisfy the fairness requirement and also maintain good relevance. Investigation of the dataset of 100 queries and top 100 results per query from Google demonstrates how topical diversity bias is present in the top web search results. This paper compares the performance of different re-ranking strategies on Google search data and on TREC data. Gao et al. discovered that depending on the constraint of fairness, there is no one-for-all strategy that suits every situation. The performance of a strategy and the trade-offs between different factors may differ in various scenarios and the trait of the dataset (topics, ranking algorithms, quality of relevance). This paper proposed entropy-based metrics to measure the degree of bias under different fairness constraints and queries and effectively evaluate different fairness ranking algorithms. Gao et al. introduce two perspectives related to fairness: diversity and novelty. Diversity focuses on the coverage of multiple subtopics regarding a search query. Novelty reduces the redundant information in each subsequent result. Relevance is a perspective to measure the utility of the IR. Fair search engines should be diverse, novel, and relevant. Moreover, their studies reveal that fairness ranking strategies that aim to increase diversity and novelty can still yield good fairness in top search results without loss of relevance. Authors show that diversity and relevance are highly correlated with statistical parity fairness. Also, the lower the diversity bias, the higher the relevance, diversity, and novelty. Experiments in this paper reveal that the fair epsilon-greedy strategy enhances fairness and diversity in search results without reducing relevance.