Wikipedia Question Generation and Question Answering Pipeline

-- Our QG pipeline takes a Wikipedia article and a specified number of questions as input. We iterate through context chunks, feeding each chunk into a T5 model conditioned on question type.
-- Our QA pipeline first passes each question through a rule-based model that classifies it as either a wh- or a boolean question. For yes/no questions, we use a T5-base model fine-tuned on BoolQ to answer abstractively. For wh-questions, we use an XLM-RoBERTa model fine-tuned on the SQuAD dataset. We set a confidence-threshold hyperparameter: if the output of the extractive model is ambiguous, we instead fall back to an abstractive T5-large model to produce the answer.
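The routing logic above can be sketched in a few lines. This is a minimal illustration, not our actual implementation: the rule-based classifier is reduced to an auxiliary-verb check, and the three models are passed in as plain callables (`boolq_model`, `extractive_model`, `abstractive_model` are placeholder names).

```python
# Hypothetical sketch of the QA routing described above.
# A boolean (yes/no) question typically begins with an auxiliary verb.
BOOLEAN_STARTERS = ("is", "are", "was", "were", "do", "does", "did",
                    "can", "could", "will", "would", "has", "have", "had")

def classify_question(question: str) -> str:
    """Rule-based type check: 'boolean' vs. 'wh'."""
    first_word = question.strip().lower().split()[0]
    return "boolean" if first_word in BOOLEAN_STARTERS else "wh"

def answer(question, context, extractive_model, abstractive_model,
           boolq_model, confidence_threshold=0.5):
    """Route a question to the right model, with a confidence fallback.

    extractive_model(question, context) -> (answer_span, score)
    abstractive_model(question, context) -> answer string
    boolq_model(question, context) -> "yes" / "no"
    """
    if classify_question(question) == "boolean":
        return boolq_model(question, context)           # T5-base tuned on BoolQ
    span, score = extractive_model(question, context)   # XLM-RoBERTa tuned on SQuAD
    if score >= confidence_threshold:
        return span
    return abstractive_model(question, context)         # fallback: abstractive T5-large
```

In the real pipeline each callable would wrap a Hugging Face model; keeping them as parameters makes the routing testable in isolation.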

Crafting a Robust Wikipedia Question Generation and Answering System Using Transformer-based Techniques

An NLP Team Project


Our project focuses on generating and answering Wh and Yes/No questions based on Wikipedia articles in real-time.

Our pipeline for question generation (QG) and answering (QA) leverages state-of-the-art transformer-based models. We identified challenges such as generating near-duplicate questions and difficulty answering questions that require long context. Our system incorporates an article pre-processing technique and employs a combination of extractive and abstractive methods during the post-processing stage. In evaluation, our proposed system generates and answers both "wh-" and boolean questions over long Wikipedia articles. Our approach demonstrates a robust and versatile architecture for question generation and answering, effectively harnessing the capabilities of advanced transformer-based models.
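The article pre-processing step splits a long article into chunks that fit a transformer's input limit. A minimal sketch, assuming word-level splitting with a fixed overlap between consecutive chunks (a real system would count tokens with the model's own tokenizer, and the `max_tokens`/`overlap` values here are illustrative):

```python
def chunk_article(text: str, max_tokens: int = 512, overlap: int = 64):
    """Split a long article into overlapping word-level chunks.

    Overlap keeps sentences near a chunk boundary visible to both
    neighboring chunks, so questions/answers spanning the boundary
    are not lost.
    """
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the final chunk already reaches the end of the article
    return chunks
```

Each chunk can then be fed to the QG model independently, and the generated questions pooled before post-processing.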

Full Report: link

Comparison of our Question Generation and Baseline Generation:
The baseline model produced several duplicated questions on an article of moderate length and generated no boolean questions. In contrast, our QG pipeline generated a greater variety of questions, both in lexical form and in semantic meaning. Our model also performs better on longer articles and contexts.
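One simple way to suppress the duplicated questions observed in the baseline is a lexical-overlap filter during post-processing. The sketch below uses Jaccard similarity over word sets with a hypothetical threshold of 0.8; a stronger filter would compare sentence embeddings to also catch paraphrases that differ lexically.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two questions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def dedupe_questions(questions, threshold=0.8):
    """Keep a question only if it is not too similar to one already kept."""
    kept = []
    for q in questions:
        if all(jaccard(q, k) < threshold for k in kept):
            kept.append(q)
    return kept
```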