Advanced Q&A Features with DistilBERT


Question Answering (Q&A) is one of the signature practical applications of natural language processing. In a previous post, you saw how to use DistilBERT for question answering by building a pipeline with the transformers library. In this post, you will dive deeper into the technical details and see how you can adapt the question-answering process for your own purposes. Specifically, you will learn:

  • How to use the model to answer questions from a context
  • How to interpret the model’s output
  • How to build your own question-answering algorithm by leveraging the model’s output

Let’s get started.

Photo by Marcin Nowak. Some rights reserved.

Overview

This post is divided into three parts; they are:

  • Using DistilBERT Model for Question Answering
  • Evaluating the Answer
  • Other Techniques for Improving the Q&A Capability

Using DistilBERT Model for Question Answering

BERT (Bidirectional Encoder Representations from Transformers) was trained to be a general-purpose language model that can understand text. DistilBERT is a distilled version, meaning it is architecturally similar but smaller than BERT. It is 40% smaller in size and runs 60% faster, while its language understanding capabilities are 97% of those of BERT. Therefore, it is a good model for production use to get higher throughput.

A pre-trained DistilBERT model is available in the Hugging Face model hub, and it needs to be used with a matching tokenizer. In the transformers library, the DistilBERT tokenizer and Q&A model are DistilBertTokenizer and DistilBertForQuestionAnswering, respectively. You can load the pre-trained model and use it by following the code below:
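Below is a minimal sketch of what that code might look like. The checkpoint name distilbert-base-uncased-distilled-squad and the example question and context are assumptions for illustration; any DistilBERT checkpoint fine-tuned for question answering works the same way.

import torch
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering

# checkpoint name is an assumption; any SQuAD-finetuned DistilBERT checkpoint works
model_name = "distilbert-base-uncased-distilled-squad"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForQuestionAnswering.from_pretrained(model_name)

# placeholder question and context for illustration
question = "What is machine learning?"
context = ("Machine learning is a field of artificial intelligence that uses "
           "statistical techniques to give computer systems the ability to "
           "learn from data without being explicitly programmed.")

# convert the strings into integer tokens that the model understands
inputs = tokenizer(question, context, return_tensors="pt")

# run the model; it returns logits for the start and end positions of the answer
with torch.no_grad():
    outputs = model(**inputs)

# take the most probable start and end positions
start_idx = int(outputs.start_logits.argmax())
end_idx = int(outputs.end_logits.argmax())

# convert the selected token span back to text
answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx + 1])
print(answer)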

You created the tokenizer and the model using the from_pretrained() method, which downloads the model from the model hub and creates the objects. You defined the question and the context as Python strings. But since the model, as a neural network, accepts only numerical tensors, you need the tokenizer to convert the strings into integer “tokens” that the model can understand.

You pass both the question and the context to the tokenizer. This is only one of the many ways the tokenizer can be used, but it is what the Q&A model expects. The tokenizer encodes the two strings as a single sequence of the form

[CLS] question [SEP] context [SEP]

where [CLS] and [SEP] are special tokens that mark the beginning of the sequence and the boundaries of the subsequences.
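Continuing the sketch above (reusing the tokenizer and inputs objects from it), you can decode the encoded sequence to see these special tokens in place:

# decode the full encoded sequence to inspect how the inputs were framed
print(tokenizer.decode(inputs["input_ids"][0]))
# prints something like: [CLS] what is machine learning? [SEP] machine learning is a field ... [SEP]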

This output is then passed on to the model, which returns an output object with attributes start_logits and end_logits. These are logits (unnormalized log-probabilities) for where the answer starts and ends in the context. Hence, we can extract that subsequence from the sequence of input tokens, convert it back to text, and report it as the answer.

Evaluating the Answer

Recall that the model produces logits (i.e., unnormalized log probabilities) for the start and end positions of the answer in the context. The way we extract the answer in the example above is simply to take the most probable start and end positions using the argmax() function. Note that you should not interpret the floating-point values produced by the model as probabilities. To get probabilities, you need to convert the logits using the softmax function.
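As a small illustration, reusing the outputs object from the earlier sketch, the conversion might look like this:

import torch.nn.functional as F

# softmax over the token positions turns the logits into probabilities
start_probs = F.softmax(outputs.start_logits, dim=-1)
end_probs = F.softmax(outputs.end_logits, dim=-1)
print(float(start_probs.max()), float(end_probs.max()))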

An ideal model would produce a one-hot distribution, i.e., one position has probability 1 and all the rest have probability 0. In practice, this is not the case. Instead, the model produces a distribution with a sharp contrast between one element and the rest when it is confident about the answer, and an almost uniform distribution when it is not.

Therefore, we can further interpret the logit output as the model’s confidence score for the answer. Below is an example of why this is useful:
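Here is a sketch of how this could be implemented; the example question and contexts are placeholders, and the get_answer() helper follows the description in the next paragraph:

import torch
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering

model_name = "distilbert-base-uncased-distilled-squad"  # assumed checkpoint
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForQuestionAnswering.from_pretrained(model_name)

def get_answer(question, context):
    """Return the extracted answer and a confidence score for one context."""
    inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    start_idx = int(outputs.start_logits.argmax())
    end_idx = int(outputs.end_logits.argmax())
    # confidence score: sum of the best start and end logits
    score = float(outputs.start_logits[0, start_idx] + outputs.end_logits[0, end_idx])
    answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx + 1])
    return answer, score

# placeholder question and contexts for illustration
question = "What is the capital of France?"
contexts = [
    "France is a country in Western Europe. Its capital is Paris.",
    "Germany is a country in Central Europe. Its capital is Berlin.",
    "The Eiffel Tower is a landmark located in Paris, France.",
]

# answer the question against each context, then keep the most confident one
results = [get_answer(question, ctx) for ctx in contexts]
best_answer, best_score = max(results, key=lambda pair: pair[1])
print(best_answer, best_score)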

In the example above, instead of one question and one context, we provided a list of contexts and a single question. Each context is used to get an answer and the confidence score. This is implemented by the function get_answer(). The score is a simple sum of the model’s predicted value for the start and end positions, assuming that the model will produce a greater value for a more confident answer. Finally, you find the one with the highest confidence score and report that as the answer.

This approach allows us to search for answers across multiple documents and return the most confident answer. However, it’s worth noting that this is a simplified approach. In a production system, you might want to use more sophisticated methods for ranking answers, such as considering the length of the answer and the position in the document or using a separate ranking model.

Other Techniques for Improving the Q&A Capability

You can easily extend the code above for a more sophisticated Q&A system, such as one that supports caching the result or processing multiple questions in a batch.
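For instance, a simple result cache could be added with functools.lru_cache, assuming a get_answer(question, context) helper like the one sketched above:

from functools import lru_cache

@lru_cache(maxsize=256)
def get_answer_cached(question, context):
    # question and context are strings, hence hashable, so repeated
    # (question, context) pairs are served from the cache
    return get_answer(question, context)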

One limitation of the model used in the example above is that you cannot feed a very long context to the model. This model has a maximum sequence length of 512 tokens. If your context is longer than this, you’ll need to split it into smaller chunks.

You can create chunks naively by splitting the context at every 512 tokens, but you risk breaking the answer in the middle of a sentence. Another approach is to use a sliding window. Below is an implementation:
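Here is a sketch of such an implementation; the function name get_answer_sliding_window() and the scoring follow the description in the paragraphs that come next, while the checkpoint name is an assumption:

import torch
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering

model_name = "distilbert-base-uncased-distilled-squad"  # assumed checkpoint
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForQuestionAnswering.from_pretrained(model_name)

def get_answer_sliding_window(question, context, max_len=512, stride=128):
    """Answer a question from a long context by sliding a window over it."""
    question_ids = tokenizer.encode(question, add_special_tokens=False)
    context_ids = tokenizer.encode(context, add_special_tokens=False)
    # room left for context tokens after the question and the 3 special tokens
    window = max_len - len(question_ids) - 3
    best_answer, best_score = "", float("-inf")
    for start in range(0, len(context_ids), stride):
        chunk = context_ids[start:start + window]
        # build [CLS] question [SEP] chunk [SEP] manually from token IDs
        input_ids = ([tokenizer.cls_token_id] + question_ids +
                     [tokenizer.sep_token_id] + chunk + [tokenizer.sep_token_id])
        with torch.no_grad():
            outputs = model(input_ids=torch.tensor([input_ids]))
        s = int(outputs.start_logits.argmax())
        e = int(outputs.end_logits.argmax())
        score = float(outputs.start_logits[0, s] + outputs.end_logits[0, e])
        if score > best_score:
            # a production version would also mask out the question positions
            best_score = score
            best_answer = tokenizer.decode(input_ids[s:e + 1])
        if start + window >= len(context_ids):
            break  # this window already reached the end of the context
    return best_answer, best_score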

This code implements the function get_answer_sliding_window(), which splits the context into shorter pieces if it is too long. Each piece is sized so that the question plus the context chunk stays within the model’s maximum number of tokens.

The split is done with a sliding window, and each subsequent window advances by stride tokens, which defaults to 128. In other words, each new window discards 128 tokens from the left and adds 128 tokens from the right. Given the total length of 512, there is significant overlap between windows, so the answer should appear unfragmented in at least one of them.

An answer is found by running the model on each window’s context, and the best answer is reported based on the confidence score. This way, the context can be of arbitrary length, thanks to the tokenizer being available as an independent object that you can use to encode and decode text.

Another way to improve the Q&A capability is to use not one but multiple models. This is the idea of ensemble methods. One simple approach is to run the question through two different models and pick the answer with the higher score. An example is shown below, which uses the original BERT model as the second model:
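A sketch of this ensemble follows. The DistilBERT and BERT checkpoint names are assumptions (any SQuAD-finetuned checkpoints will do), and the question and context are placeholders:

import torch
from transformers import (BertTokenizer, BertForQuestionAnswering,
                          DistilBertTokenizer, DistilBertForQuestionAnswering)

# assumed checkpoints fine-tuned on SQuAD
distil_name = "distilbert-base-uncased-distilled-squad"
bert_name = "bert-large-uncased-whole-word-masking-finetuned-squad"

distil_tokenizer = DistilBertTokenizer.from_pretrained(distil_name)
distil_model = DistilBertForQuestionAnswering.from_pretrained(distil_name)
bert_tokenizer = BertTokenizer.from_pretrained(bert_name)
bert_model = BertForQuestionAnswering.from_pretrained(bert_name)

def _extract(tokenizer, model, question, context):
    """Shared answer-extraction logic for any BERT-style Q&A model."""
    inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    s = int(outputs.start_logits.argmax())
    e = int(outputs.end_logits.argmax())
    score = float(outputs.start_logits[0, s] + outputs.end_logits[0, e])
    return tokenizer.decode(inputs["input_ids"][0][s:e + 1]), score

def get_distilbert_answer(question, context):
    return _extract(distil_tokenizer, distil_model, question, context)

def get_bert_answer(question, context):
    return _extract(bert_tokenizer, bert_model, question, context)

# placeholder question and context for illustration
question = "What is the capital of France?"
context = "France is a country in Western Europe. Its capital is Paris."

# run both models and keep the answer with the higher confidence score
candidates = [get_distilbert_answer(question, context),
              get_bert_answer(question, context)]
answer, score = max(candidates, key=lambda pair: pair[1])
print(answer, score)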

This code instantiates two tokenizers and two models: one pair for DistilBERT and the other for BERT. The functions get_distilbert_answer() and get_bert_answer() obtain the answer from the respective model. Both functions are invoked on the provided question and context, and the final answer is the one with the higher confidence score.

Ensemble methods can improve accuracy by leveraging the strengths of different models and mitigating their individual weaknesses. The above is just one way to combine models; there are others. For example, with more models you can use voting to choose the most frequently occurring answer, as sketched below. You can also assign weights to each model and take a weighted average of their outputs, from which the answer is derived. A more sophisticated approach is stacking, where you train a meta-model to combine the predictions of the base models. This generalizes the weighted-average approach but increases the computational complexity.
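As a sketch of the voting idea, assuming several answer helpers like get_distilbert_answer() and get_bert_answer() above (a third, hypothetical helper would be needed for voting to be meaningful):

from collections import Counter

def vote_answer(question, context, answer_fns):
    """Pick the most frequently returned answer among several models."""
    # lightly normalize each answer string before counting
    answers = [fn(question, context)[0].strip().lower() for fn in answer_fns]
    return Counter(answers).most_common(1)[0][0]

# e.g. vote_answer(question, context, [get_distilbert_answer, get_bert_answer, ...])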

Further Readings

Below are some further readings that you may find useful:

Summary

In this post, you have explored how to use DistilBERT for advanced question-answering tasks. In particular, you have learned:

  • How to use DistilBERT’s tokenizer and the Q&A model directly
  • How to interpret the Q&A model’s output and extract the answer
  • How you can make use of the model’s raw output to build a more sophisticated Q&A system


