AIHUB.ML - Competition

VLSP2021 - Vietnamese Machine Reading Comprehension

Organized by vlsp-mrc-2021-organisers

Private Test: Oct. 24, 2021, 5 p.m. UTC

Post Evaluation: Oct. 15, 2021, 5 p.m. UTC

Competition Ends: Oct. 27, 2021, 4:59 p.m. UTC

Task Description

Machine Reading Comprehension (MRC) has recently emerged as an area of computational linguistics (CL) in which automatic systems are developed to find correct answers to questions posed in human language, given documents that contain the answers. The Vietnamese Machine Reading Comprehension task is extraction-based machine reading comprehension on Vietnamese Wikipedia texts. Following SQuAD [1, 2], we developed the Vietnamese Question Answering Dataset (UIT-ViQuAD), a reading comprehension dataset consisting of questions posed by crowd-workers on a set of Vietnamese Wikipedia articles. The answer to each question is a span of text from the corresponding reading passage, or the question may be unanswerable.

UIT-ViQuAD 2.0 combines the 23K questions in UIT-ViQuAD 1.0 [3] with over 12K unanswerable questions written adversarially by crowd-workers to look similar to answerable ones. To do well on UIT-ViQuAD 2.0, MRC systems must not only answer questions when possible but also determine when no answer is supported by the context and abstain from answering. In this task, participating teams use UIT-ViQuAD 2.0 to evaluate machine reading comprehension models.

UIT-ViQuAD 1.0, the previous version of the dataset [3], contains 23K+ question-answer pairs on 170+ articles.

IMPORTANT: Before submitting to the system, you must name your submission file results.json and compress it as a zip file named results.zip

NOTE: In the Post Evaluation phase, please click the "Make your submission public" button (the red button shown after your submission has finished) to send your submission result to the public leaderboard

Publication: Please cite this paper if you use this dataset for research purposes:

Kiet Van Nguyen, Son Quoc Tran, Luan Thanh Nguyen, Tin Van Huynh, Son T. Luu, and Ngan Luu-Thuy Nguyen. 2021. VLSP 2021 - ViMRC Challenge: Vietnamese machine reading comprehension. In Proceedings of the 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021)

Link to the publication paper: https://arxiv.org/abs/2203.11400

Evaluation Metrics

Following the evaluation metrics on SQuAD2.0 [2], we use EM and F1-score as evaluation metrics for Vietnamese machine reading comprehension:

F1-score: The F1-score is a popular metric in natural language processing and is also used in machine reading comprehension. It is computed over the individual tokens in the predicted answer against those in the gold standard answers, based on the number of tokens matched between the two.

Precision=(the number of matched tokens)/(the total number of tokens in the predicted answer)

Recall=(the number of matched tokens)/(the total number of tokens in the gold standard answer)

F1-score=(2*Precision*Recall)/(Precision+Recall)
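
The three formulas above can be sketched in Python as follows. This is only an illustration, not the task's evaluation script: the function name token_f1 is hypothetical, tokens are assumed to be whitespace-separated, and the official script may apply additional normalization (for example lowercasing or punctuation stripping) and take the maximum over several gold answers.

```python
from collections import Counter


def token_f1(predicted: str, gold: str) -> float:
    """Token-level F1 between one predicted answer and one gold answer.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    pred_tokens = predicted.split()
    gold_tokens = gold.split()

    # If either side is empty (e.g. an unanswerable question),
    # the score is 1 only when both sides are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Matched tokens: multiset intersection of the two token lists.
    matched = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if matched == 0:
        return 0.0

    precision = matched / len(pred_tokens)
    recall = matched / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For instance, a prediction of "Paris là thủ đô" against the gold answer "Paris là thủ đô Cộng hoà Pháp" matches 4 tokens, giving Precision = 4/4, Recall = 4/7, and F1 = 8/11.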

Exact Match (EM): For each question-answer pair, if the characters of the MRC system's predicted answer exactly match the characters of (one of) the gold standard answer(s), EM = 1, otherwise EM = 0. EM is a stringent all-or-nothing metric, with a score of 0 for being off by a single character. When evaluating an unanswerable question, if the system predicts any text span as an answer, it automatically receives a score of zero for that question.
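
A minimal sketch of the EM rule, under stated assumptions: the function name exact_match is hypothetical, an unanswerable question is represented by an empty list of gold answers, and abstaining is represented by an empty string (as in the submission example on this page). The official evaluation script may normalize answers before comparing them.

```python
from typing import Sequence


def exact_match(predicted: str, gold_answers: Sequence[str]) -> int:
    """Return 1 if the prediction exactly matches a gold answer, else 0.

    Assumption: an empty gold_answers sequence marks an unanswerable
    question, and an empty prediction means the system abstains.
    """
    if not gold_answers:
        # Any predicted span on an unanswerable question scores zero.
        return int(predicted == "")
    return int(any(predicted == gold for gold in gold_answers))
```

Under these assumptions, exact_match("bảy.", ["bảy"]) is 0 because of the extra character, while exact_match("", []) is 1 because the system correctly abstains.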

The final ranking is computed on the test set according to F1-score. Results are rounded to 3 decimal places. If two teams have the same F1-score, EM is used as the secondary metric to break the tie.
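
The ranking rule can be illustrated with a short sketch. The function name rank_teams, the dictionary keys, and the team names and scores below are all hypothetical, and the number of decimal places (ndigits) is an assumption exposed as a parameter; the actual leaderboard logic is not shown on this page.

```python
def rank_teams(scores, ndigits=3):
    """Sort teams by rounded F1 (descending), breaking ties by rounded EM.

    scores: list of dicts with hypothetical keys "team", "f1", "em".
    ndigits: assumed rounding precision.
    """
    return sorted(
        scores,
        key=lambda s: (-round(s["f1"], ndigits), -round(s["em"], ndigits)),
    )


# Made-up scores for illustration only.
teams = [
    {"team": "A", "f1": 77.2411, "em": 66.1},
    {"team": "B", "f1": 77.2413, "em": 67.0},
    {"team": "C", "f1": 80.0, "em": 60.0},
]
ranking = [s["team"] for s in rank_teams(teams)]
```

Here teams A and B tie on F1 after rounding, so B ranks above A on EM, and C ranks first on F1.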

The task's evaluation script: https://drive.google.com/file/d/1vn6Aed4nacSD932YezQgvWNIOx_1PCb4/view?usp=sharing

Dataset Information

We provide UIT-ViQuAD2.0 consisting of over 35K questions to participating teams. The dataset is stored in .json format. Here are a few question examples extracted from the dataset.

Context: Khác với nhiều ngôn ngữ Ấn-Âu khác, tiếng Anh đã gần như loại bỏ hệ thống biến tố dựa trên cách để thay bằng cấu trúc phân tích. Đại từ nhân xưng duy trì hệ thống cách hoàn chỉnh hơn những lớp từ khác. Tiếng Anh có bảy lớp từ chính: động từ, danh từ, tính từ, trạng từ, hạn định từ (tức mạo từ), giới từ, và liên từ. Có thể tách đại từ khỏi danh từ, và thêm vào thán từ.

question : Tiếng Anh có bao nhiêu loại từ?

is_impossible : False. // There exists an answer to the question.

answer : bảy.

-----------------

question : Ngôn ngữ Ấn-Âu có bao nhiêu loại từ?

is_impossible : True. // There is no correct answer extracted from the Context.

plausible_answer : bảy. // A plausible but incorrect answer extracted from the Context, of the same type as the answer the question asks for.

Note: All data are transferred to participating teams via email.

Terms:

All teams must provide the pre-trained embeddings and pre-trained language models they use in this contest before Oct 10, 2021, and must not use any external resources related to machine reading comprehension or question answering for model training, other than the data provided by the organizers. If a team uses pre-trained embeddings or pre-trained language models that are not on the list provided by the participating teams, or uses external resources related to machine reading comprehension or question answering, its final result will not be accepted.
A team's name cannot be "BASELINE", "Baseline", or "baseline", because such a name causes confusion between participants' models and our baseline model.
In the public test phase, the system allows 10 submissions per day. In the private test phase, only 1 submission per day is allowed.
The top 3 teams are required to submit a technical paper to VLSP 2021 to have their achievement acknowledged. If a top team does not submit its paper, the following teams can submit and take its place. The top 3 teams may be required to provide source code so that the final results can be verified.

Submission guidelines:

The submission file is in JSON format and must be named results.json.

The JSON content is structured as:

{
"<id_of_question>": "answers text",
...
}

Here is an example of JSON format for the submission file:

{
"uit_034_35": "Paris là kinh đô ánh sáng",
"uit_035_57": "",
"uit_037_12": "Paris là thủ đô Cộng hoà Pháp",
...
}

Before submitting to the system, the submission file must be compressed as a zip file named results.zip
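
The packaging steps can be sketched with the Python standard library. This is an illustration under assumptions: the function name write_submission and the out_dir parameter are hypothetical, and placing results.json at the root of the archive is an assumption about what the platform expects.

```python
import json
import zipfile
from pathlib import Path


def write_submission(predictions, out_dir="."):
    """Write predictions to results.json and compress it into results.zip.

    predictions: dict mapping question id to answer text
                 (empty string for a question judged unanswerable).
    """
    out = Path(out_dir)
    json_path = out / "results.json"
    zip_path = out / "results.zip"

    # ensure_ascii=False keeps Vietnamese characters readable in the file.
    json_path.write_text(
        json.dumps(predictions, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )

    # Assumption: results.json sits at the top level of the archive.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(json_path, arcname="results.json")
    return zip_path
```

Calling write_submission({"uit_034_35": "Paris là kinh đô ánh sáng", "uit_035_57": ""}) would produce both files in the current directory.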

[1] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. "SQuAD: 100,000+ Questions for Machine Comprehension of Text." Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016.

[2] Pranav Rajpurkar, Robin Jia, and Percy Liang. "Know What You Don’t Know: Unanswerable Questions for SQuAD." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.

[3] Kiet Van Nguyen, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen. "A Vietnamese Dataset for Evaluating Machine Reading Comprehension." Proceedings of the 28th International Conference on Computational Linguistics. 2020.

Trial

Start: Sept. 30, 2021, 5 p.m.

Public Test

Start: Oct. 4, 2021, 5 p.m.

Description: Please name your team

Private Test

Start: Oct. 24, 2021, 5 p.m.

Post Evaluation

Start: Oct. 15, 2021, 5 p.m.

Description: Please click the "Make your submission public" button (the red button shown after your submission has finished) to send your submission result to the public leaderboard

Competition Ends

Oct. 27, 2021, 4:59 p.m.
