At the UMC Utrecht, large amounts of unstructured text are created in electronic healthcare record (EHR) systems every day. While there are efforts to improve how healthcare professionals structure data at the point of entry, there is also a need to make more efficient use of the unstructured data that is already in the EHR systems. Unstructured text contains a large amount of information that is not captured as structured data. Enabling the analysis of this text opens up new ways to do research on patient data, to make hospital processes more efficient and to improve patient care.
The Data Solutions & Research IT team of the UMC Utrecht is starting a multi-phase project to unlock the potential of this type of data. In the first phase we will build a data-processing pipeline that extracts data from our current systems and makes it accessible in a fast search application. The pipeline will include an entity linking component that detects medical concepts in text, such as names of diseases, symptoms and medications. This is challenging because unstructured text often contains acronyms, typos, spelling mistakes, negations and expressions of uncertainty. We will attempt to address this with natural language processing (NLP) methods that capture the context in which a concept appears and distinguish the different ways in which it can be used.
Research questions:
- How can NLP methods be used to extract Dutch medical terms from unstructured text in patient records?
- Can ambiguous medical terms be distinguished based on the context?
- Is it possible to identify whether a term is about a family member, is negated or is historical?
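To give a rough idea of what the last question involves, the sketch below checks for trigger words shortly before a detected concept, in the spirit of rule-based approaches such as NegEx and ConText. It is an illustration only and not the method chosen for this project: the trigger lists, the window size and the function name `detect_context` are all hypothetical, and real Dutch clinical text would need much larger, validated trigger sets.

```python
import re

# Hypothetical, deliberately tiny trigger lists for Dutch clinical text.
TRIGGERS = {
    "negated": ["geen", "niet", "zonder"],
    "historical": ["voorgeschiedenis", "in het verleden", "eerder"],
    "family": ["vader", "moeder", "broer", "zus", "familie"],
}

WINDOW = 5  # number of tokens before the concept that are inspected (assumed)


def detect_context(sentence: str, concept: str) -> dict:
    """Return which context flags apply to `concept` within `sentence`."""
    flags = {name: False for name in TRIGGERS}
    text = sentence.lower()
    position = text.find(concept.lower())
    if position == -1:
        return flags  # concept not present, nothing to flag

    # Keep only the last WINDOW tokens that precede the concept.
    preceding = re.findall(r"\w+", text[:position])[-WINDOW:]
    window_text = " " + " ".join(preceding) + " "

    # A flag is set when any of its trigger phrases occurs in the window.
    for name, phrases in TRIGGERS.items():
        flags[name] = any(f" {phrase} " in window_text for phrase in phrases)
    return flags


if __name__ == "__main__":
    print(detect_context("Patiënt heeft geen koorts.", "koorts"))
    print(detect_context("Moeder bekend met diabetes.", "diabetes"))
```

A fixed window of preceding tokens is the simplest possible scope rule. It misses triggers that follow the concept and ignores sentence structure, which is part of what makes the question worth investigating.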
Data Solutions & Research IT is a unit within the UMC Utrecht focussed on data, analytics and research support. On the data side it works with multiple agile teams on data integration, data-warehousing, data analytics and data management. On the research side, the agile teams work on research infrastructure, tooling and policy.
Requirements:
- Enrollment in a data science, artificial intelligence, medical informatics, bioinformatics, computer science or comparable MSc programme
- Experience with programming in Python and command line interfaces
- An interest in working with healthcare data
- Fluency in Dutch, since most of the data will be Dutch medical text
Nice to have:
- Experience with building data-processing pipelines and handling large data sets
- Experience with natural language processing (NLP)
- Experience with software development tools, such as Docker, Git and CI/CD
- Familiarity with databases and search engines, such as SQL databases and Elasticsearch
- Knowledge of medical terminologies and data standards
A standard internship fee of EUR 300 (before taxes) applies.
The salary for this 18 - 40 function is still to be determined. It is an internship.
In addition, we offer an annual benefit of 8.3%, holiday allowance, travel expenses and career opportunities. The terms of employment are in accordance with the Cao University Medical Centers (UMC).
If you have any questions about this vacancy, please contact Mrs. Sven Oosterhoff, Team Lead Data Solutions & Research IT, phone number: +31 88 755 55 55, e-mail address: firstname.lastname@example.org.
Acquisition in response to this job opening is not appreciated.