
How to Do Named Entity Recognition (NER) with a BERT Model


Named Entity Recognition (NER) is one of the fundamental building blocks of natural language understanding. When humans read text, we naturally identify and categorize named entities based on context and world knowledge. For instance, in the sentence “Microsoft’s CEO Satya Nadella spoke at a conference in Seattle,” we effortlessly recognize the organizational, personal, and geographical references. However, teaching machines to replicate this seemingly intuitive human capability presents several challenges. Fortunately, this problem can be addressed effectively using a pretrained machine learning model.

In this post, you will learn how to solve the NER problem with a BERT model using just a few lines of Python code.

Let’s get started.

Picture by Jon Tyson. Some rights reserved.

Overview

This post is in six parts; they are:

  • The Complexity of NER Systems
  • The Evolution of NER Technology
  • BERT’s Revolutionary Approach to NER
  • Using DistilBERT with Hugging Face’s Pipeline
  • Using DistilBERT Explicitly with AutoModelForTokenClassification
  • Best Practices for NER Implementation

The Complexity of NER Systems

The challenge of Named Entity Recognition extends far beyond simple pattern matching or dictionary lookups. Several key factors contribute to its complexity.

One of the most significant challenges is context dependency—understanding how words change meaning based on surrounding text. The same word can represent different entity types depending on its context. Consider these examples:

  • “Apple announced new products.” (Apple is an organization.)
  • “I ate an apple for lunch.” (Apple is a common noun, not a named entity.)
  • “Apple Street is closed.” (Apple is a location.)

Named entities often consist of multiple words, making boundary detection another challenge. Entity names can be complex, such as:

  • Corporate entities: “Bank of America Corporation”
  • Product names: “iPhone 14 Pro Max”
  • Person names: “Martin Luther King Jr.”

Additionally, language is dynamic and continuously evolving. Instead of memorizing what qualifies as an entity, models must deduce it from context. Language evolution introduces new entities, such as emerging companies, new products, and newly coined terms.

Now, let’s explore how state-of-the-art NER models address these challenges.

The Evolution of NER Technology

The evolution of NER technology reflects the broader advancement of natural language processing. Early approaches relied on rule-based systems and pattern matching—defining grammatical patterns, identifying capitalization, and using contextual markers (e.g., “the” before a proper noun). However, these rules were often numerous, inconsistent, and difficult to scale.

To improve accuracy, researchers introduced statistical approaches, leveraging probability-based models such as Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs) to identify named entities.

With the rise of deep learning, neural networks became the preferred method for NER. Initially, bidirectional LSTM networks showed promise. However, the introduction of attention mechanisms and transformer-based models proved to be even more effective.

BERT’s Revolutionary Approach to NER

BERT (Bidirectional Encoder Representations from Transformers) has fundamentally transformed NER with several key innovations:

Contextual Understanding

Unlike traditional models that process text in one direction, BERT’s bidirectional nature allows it to consider both preceding and following text. This enables it to capture long-range dependencies, understand subtle contextual nuances, and handle ambiguous cases more effectively.

Tokenization and Subword Units

While not exclusive to BERT, its subword tokenization strategy allows it to handle unknown words while preserving morphological information. This reduces vocabulary size and makes the model adaptable across different languages and domains.

The IOB Tagging Mechanism

NER results can be represented in various ways, but BERT uses the Inside-Outside-Beginning (IOB) tagging scheme:

  • B marks the beginning of an entity.
  • I indicates the continuation of an entity.
  • O signifies non-entities.

This method enables BERT to handle multi-word entities, nested entities, and overlapping entities effectively.
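
For example, the multi-word entity “Bank of America Corporation” mentioned earlier would be tagged like this, with any surrounding non-entity words tagged O:

Bank         B-ORG
of           I-ORG
America      I-ORG
Corporation  I-ORG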

Using DistilBERT with Hugging Face’s Pipeline

The easiest way to perform NER is by using Hugging Face’s pipeline API, which abstracts away much of the complexity while still delivering powerful results. Here’s an example:
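
A minimal sketch, using the example sentence from the introduction and the pre-trained model described in the rest of this section, might read:

from transformers import pipeline

# Create a ready-to-use NER pipeline backed by a model fine-tuned for NER
ner_pipeline = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",
)

text = "Microsoft's CEO Satya Nadella spoke at a conference in Seattle."
results = ner_pipeline(text)

# Each result is a dictionary describing one detected entity
for entity in results:
    print(f"{entity['word']}: {entity['entity_group']} (score: {entity['score']:.3f})")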

Let’s break this code down in detail. The first step is to initialize the pipeline:
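
In the sketch above, that is this call:

ner_pipeline = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",
)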

The pipeline() function creates a ready-to-use NER pipeline. It is needed because BERT is a machine learning model: you need to preprocess the text before the model can consume it, and you need to convert the model’s output into a usable data structure. A pipeline connects these steps.

The argument "ner" specifies that you want Named Entity Recognition, and model="dbmdz/bert-large-cased-finetuned-conll03-english" loads a pre-trained model specifically fine-tuned for NER. The final argument aggregation_strategy="simple" tells the pipeline to merge subwords into complete words, which makes the output more readable.

The pipeline above returns a list of dictionaries, where each dictionary contains:

  • word: The detected entity text
  • entity_group: The type of entity (PER for person, ORG for organization, etc.)
  • score: Confidence score between 0 and 1
  • start and end: Character positions in the original text

This code will output something like:
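
With the example sentence above, the printed result is along these lines (the scores shown are illustrative):

Microsoft: ORG (score: 0.998)
Satya Nadella: PER (score: 0.999)
Seattle: LOC (score: 0.997)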

Using DistilBERT Explicitly with AutoModelForTokenClassification

For greater control over NER, you can bypass the pipeline API and interact directly with the model and tokenizer:
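
The sketch below shows the whole flow. It reuses the dbmdz/bert-large-cased-finetuned-conll03-english checkpoint from the previous section; any token classification checkpoint fine-tuned for NER, including a DistilBERT variant, can be substituted:

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Any NER-fine-tuned token classification checkpoint works here
model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Microsoft's CEO Satya Nadella spoke at a conference in Seattle."

# Tokenize the input, adding the [CLS] and [SEP] tokens BERT expects
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=True)

# Run the forward pass without gradient tracking, then pick the most
# likely label index for each token
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)

# Prepare the mapping from indices to labels, and recover the tokens
label_list = model.config.id2label
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predictions = predictions[0].tolist()

# Walk over the tokens, merging "##" subwords and grouping entity words
current_entity = []
current_label = None
for token, pred in zip(tokens, predictions):
    label = label_list[pred]
    if token in ("[CLS]", "[SEP]"):
        continue
    if token.startswith("##"):
        # Continuation of the previous word: glue the subword back on
        if current_entity:
            current_entity[-1] += token[2:]
    elif label == "O":
        # Not an entity: flush whatever entity was being collected
        if current_entity:
            print(f"{' '.join(current_entity)}: {current_label}")
        current_entity, current_label = [], None
    elif label.startswith("B-") or not current_entity:
        # Start of a new entity
        if current_entity:
            print(f"{' '.join(current_entity)}: {current_label}")
        current_entity, current_label = [token], label[2:]
    else:
        # Continuation ("I-") of the current entity
        current_entity.append(token)
if current_entity:
    print(f"{' '.join(current_entity)}: {current_label}")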

This implementation is longer. Let’s see how it works step by step. The first step is to load the model and tokenizer:
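
In the sketch above, these are the lines:

model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)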

AutoTokenizer automatically selects the appropriate tokenizer for our model based on the model card; a model should usually be paired with its own specific tokenizer. A tokenizer is an algorithm that transforms and splits the input text string. AutoModelForTokenClassification loads the model for token classification tasks. The model created includes both the architecture and the pretrained weights for NER.

Then let the tokenizer process the input text:
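
That is this line of the sketch:

inputs = tokenizer(text, return_tensors="pt", add_special_tokens=True)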

This converts text into tokens that the model can understand. A token is usually a word but can also be a subword: for example, “sub-” and “-word” may be recognized separately even though they appear as one word in the text. The output of a tokenizer is a sequence of integers, where each integer corresponds to a token in the tokenizer’s dictionary. The argument return_tensors="pt" returns the sequence as PyTorch tensors. The argument add_special_tokens=True adds [CLS] and [SEP] tokens to the beginning and the end of the output, as required by BERT.

Next is to run the model with the input tensor:
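
These are the corresponding lines of the sketch:

with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)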

The context torch.no_grad() disables gradient calculation for inference, which saves time and memory when running the model. Calling model(**inputs) runs the forward pass, and torch.argmax(outputs.logits, dim=2) transforms the output tensor into the most likely label index for each token. The result, predictions, is a tensor of integers.

To read the output, we need to convert the integer output into labels. But let’s prepare the data structure for the conversion:
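
In the sketch, that preparation is:

label_list = model.config.id2label
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predictions = predictions[0].tolist()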

The dictionary model.config.id2label is a mapping of prediction indices to actual entity labels. The function convert_ids_to_tokens converts integer token IDs back to readable text. Since you run the model on a single line of input text, only one output sequence is expected. We convert the predictions to a Python list for easier processing.

The reconstruction of the entity predictions uses a for-loop. In BERT’s tokenizer, a subword is prefixed with "##", so you can easily identify subwords and merge them back into whole words. The type of the current entity is determined from the prediction and presented as a label using the dictionary label_list. This helps present the result in a human-readable format.
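
Running the sketch on the example sentence prints the grouped entities, for example:

Microsoft: ORG
Satya Nadella: PER
Seattle: LOC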

Best Practices for NER Implementation

Doing NER is as simple as that. However, you are not required to use exactly the code above. In particular, you can switch between different models (along with their corresponding tokenizers). If you need the model to run fast, pick a DistilBERT model. If you need more accurate results, choose a larger BERT or RoBERTa model. You may also look for a domain-adapted model if your input requires domain knowledge.

Moreover, if you need to run NER over a lot of input, you can do it faster by processing the texts in batches. There are also other techniques to speed up the process, such as using a GPU for acceleration or caching the results for frequently accessed texts.

In a production system, some error-handling logic should be implemented as well, such as validating the input and handling edge cases like empty strings and special characters.

Here’s a complete example incorporating these best practices:
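
The sketch below combines batching, optional GPU use, and basic input validation. The helper names build_ner_pipeline and extract_entities are only for illustration:

import torch
from transformers import pipeline

def build_ner_pipeline(model_name="dbmdz/bert-large-cased-finetuned-conll03-english"):
    """Create an NER pipeline, running on a GPU when one is available."""
    device = 0 if torch.cuda.is_available() else -1
    return pipeline(
        "ner",
        model=model_name,
        aggregation_strategy="simple",
        device=device,
    )

def extract_entities(ner_pipeline, texts, batch_size=8):
    """Run NER on a list of texts with basic input validation."""
    if isinstance(texts, str):
        texts = [texts]
    results = [[] for _ in texts]
    # Only send non-empty strings to the model; empty inputs yield no entities
    valid = [(i, t) for i, t in enumerate(texts) if isinstance(t, str) and t.strip()]
    if not valid:
        return results
    indices, batch = zip(*valid)
    try:
        # The pipeline accepts a list of texts and batches them internally
        outputs = ner_pipeline(list(batch), batch_size=batch_size)
    except Exception as err:
        raise RuntimeError(f"NER inference failed: {err}") from err
    for i, entities in zip(indices, outputs):
        results[i] = entities
    return results

if __name__ == "__main__":
    ner = build_ner_pipeline()
    texts = [
        "Microsoft's CEO Satya Nadella spoke at a conference in Seattle.",
        "",  # an empty string is handled without calling the model
    ]
    for text, entities in zip(texts, extract_entities(ner, texts)):
        print(f"Text: {text!r}")
        for entity in entities:
            print(f"  {entity['word']}: {entity['entity_group']} (score: {entity['score']:.3f})")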

Summary

Named Entity Recognition with BERT models provides a powerful way to extract structured information from text. The Hugging Face Transformers library makes it easy to implement NER with state-of-the-art models, whether you need a simple pipeline approach or more detailed control over the process.

In this tutorial, you learned about NER with BERT. In particular, you learned how to:

  • Use the pipeline API for quick prototypes and simple applications
  • Use explicit model handling for more control and custom processing
  • Consider performance optimization for production applications
  • Handle edge cases and implement proper error handling

With these tools and techniques, you can build robust NER systems for various applications, from information extraction to document processing and more.

 


