Build an NER Model for Molecular Biology Terms

Post Details

Company

Predibase

Date Published

May 12, 2023

Author

Connor McCormick

Word Count

2,048

Language

English

Hacker News Points

-

Source URL

predibase.com/blog/build-a-named-entity-recognition-model-for-molecular-biology-terminology

Summary

This tutorial walks through building a Named Entity Recognition (NER) model for molecular biology text using Predibase, a low-code declarative machine learning platform. NER is an NLP task that identifies and categorizes entities such as proteins or genes within text, which makes it useful for organizing molecular biology data. The tutorial trains the model on the BioNLP/JNLPBA dataset, starting with data preparation and preprocessing into a format suitable for Predibase. Users then create a model repository and configure the model in Predibase's interface, which builds on the Ludwig framework to simplify the process. Initial training on 10% of the dataset achieves 87% accuracy; iterative adjustments to parameters such as the sample ratio and encoder settings raise this to 92%. The tutorial then shows three ways to operationalize the model: deploying it to a Predibase managed endpoint, exporting the model, or running batch inference with the Predictive Query Language (PQL). According to the post, Predibase substantially reduces the complexity and time required to develop NER models, making advanced tasks more accessible.
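As a rough illustration of the data preparation and configuration steps the summary describes, the Python sketch below flattens one IOB-tagged sentence into a row with a text column and an aligned tag column, recovers entity spans from the tags, and outlines a Ludwig-style sequence-tagging config. The column names (sentence, tags), the helper functions, the example sentence, and the specific encoder settings are assumptions made for illustration; they are not taken from the original post and may not match what the tutorial actually uses.

```python
from typing import Dict, List, Optional, Tuple


def to_row(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Flatten one tagged sentence into space-joined text and tag columns."""
    if len(tokens) != len(tags):
        raise ValueError("each token needs exactly one tag")
    return {"sentence": " ".join(tokens), "tags": " ".join(tags)}


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB tags (B-X, I-X, O) into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any span that is still open.
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the currently open entity.
            current.append(token)
        else:
            # "O" or an inconsistent I- tag: close the open span, skip the token.
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities


# Hypothetical example sentence using JNLPBA-style entity types.
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B", "."]
tags = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein", "O"]

row = to_row(tokens, tags)
entities = extract_entities(tokens, tags)

# Assumed Ludwig-style config: a text input with an unreduced encoder output
# feeding a per-token tagger decoder. Names and values are illustrative only.
config = {
    "input_features": [
        {
            "name": "sentence",
            "type": "text",
            "encoder": {"type": "bert", "reduce_output": None},
        }
    ],
    "output_features": [
        {"name": "tags", "type": "sequence", "decoder": {"type": "tagger"}}
    ],
    # Train on a 10% sample first, as in the initial run the summary mentions.
    "preprocessing": {"sample_ratio": 0.1},
}

print(row)
print(entities)
```

Keeping the tokens and tags as two space-joined columns of equal length is one simple way to keep each tag aligned with its token; raising sample_ratio toward 1.0 would correspond to the later iterations that train on more of the dataset.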