phucdev/README.md

Hi there, I'm Phuc 👋

I'm a master's student in Computer Science at Humboldt University of Berlin, currently awaiting the results of my thesis titled "Cross-lingual Transfer of Pre-Trained Language Models to Vietnamese".

For the past six years, I have been working as a student assistant in the Speech and Language Technology Group at the German Research Center for Artificial Intelligence (DFKI), focusing on Natural Language Processing (NLP), especially Information Extraction. During my time at DFKI, I wrote my bachelor's thesis on weak supervision for event extraction. Most recently, I have been working on Biomedical Relation Extraction with Large Language Models (LLMs) using the LangChain framework.


📘 My Master’s Thesis

I recently submitted my master's thesis titled "Cross-lingual Transfer of Pre-Trained Language Models to Vietnamese".
In this work, I explored cross-lingual transfer techniques for adapting pre-trained language models to Vietnamese, focusing on tokenizer replacement and efficient initialization strategies.

👉 Check out the code and experiments here


🎓 My Bachelor’s Thesis & Projects

For my bachelor's thesis titled "Investigating Weak Supervision for the Extraction of Mobility Relations and Events in German Text", I explored weak supervision techniques for event extraction. In particular, I worked with Snorkel, a framework for programmatically generating training data through labeling functions based on heuristics.
This work was conducted at the Speech and Language Technology Lab, where I contributed to ongoing research efforts and related open-source projects. These include:

  • 🔧 eventx: Implementation of joint classification of events and arguments
  • 🔍 wsee: Codebase developed as part of my thesis to investigate weak supervision for extracting mobility-related events and relations
  • 📊 MobIE Dataset: A dataset of mobility-related named entity and n-ary relation annotations
  • 📄 Published paper @ KONVENS 2021: Hennig, L., Truong, P. T., & Gabryszak, A. (2021). MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain. In Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021) (pp. 223–227). Düsseldorf, Germany.

Connect with me:

LinkedIn

Follow me on Hugging Face

🛠 Languages and Tools

Python, PyTorch, Hugging Face, LangChain, Java, Pandas, Docker, GitHub, ChatGPT



Pinned

  1. TL_Bio_RE (Public)

    Transfer Learning for Biomedical Relation Extraction Seminar. Applying BioBERT & SciBERT to Relation Extraction (protein-protein interaction).

    Python, 7 stars, 1 fork

  2. FBMTC (Public)

    Flair for Biomedical Text Classification

    Python