On March 16, 2023 at 11:00 EET, Eduard C. Drăguț (Temple University) will give a talk in the new Data Science Seminar.
Title: Continuous, Gradual Entity Mining from Web Data Streams
Abstract: Named Entity Recognition (NER) is a key component in many intelligent systems, such as knowledge graphs, question answering, information retrieval, and early prediction of emerging events. NER systems have been studied and developed for decades. Nevertheless, NER is a continuous, never-ending learning process because language and its usage evolve over time. For example, the emergence of social media with colloquial user content exposed the limitations of the previous state-of-the-art NER systems, which expected long documents written in formal language. In this talk, I present our work on entity mining from microblog streams, where we advocate for continuous, gradual entity mining with revisits. It needs to be continuous because the system stays with a topic for its duration in a social media stream. It is gradual because the system begins with easy instances, which can be labeled with high accuracy, and then gradually labels more challenging instances. The system revisits difficult instances that were encountered ahead of easy instances in a stream. If these three conditions are met, then (near) real-time NER can be achieved over microblogs. I will also introduce our work on recognizing entities that follow or closely resemble a regular expression (regex) pattern, its applications to other (unexpected) domains, and how we use it to seed our work on human-in-the-loop mining.
The talk will take place in person at FMI (Academiei 14), Hall 214 “Google”.