Resume parsing, also known as CV parsing, resume extraction, or CV extraction, allows for the automated storage and analysis of resume data. The resume is imported into parsing software and the information is extracted so that it can be sorted and searched.
Resume parsers analyze a resume, extract the desired information, and insert the information into a database with a unique entry for each candidate. Once the resume has been analyzed, a recruiter can search the database for keywords and phrases and get a list of relevant candidates. Many parsers support semantic search, which adds context to the search terms and tries to understand intent in order to make the results more reliable and comprehensive.
Machine learning is extremely important for resume parsing. Each block of information needs to be given a label and sorted into the correct category, whether that's education, work history, or contact information. Rule-based parsers use a predefined set of rules to parse the text. This method does not work for resumes because the parser needs to "understand the context in which words occur and the relationship between them." For example, if the word "Harvey" appears on a resume, it could be the name of an applicant, refer to the college Harvey Mudd, or reference the company Harvey & Company LLC. The abbreviation MD could mean "Medical Doctor" or "Maryland". A rule-based parser would require incredibly complex rules to account for all the ambiguity and would provide limited coverage.
This leads us to Machine Learning and specifically Natural Language Processing (NLP). NLP is a branch of Artificial Intelligence and it uses Machine Learning to understand content and context as well as make predictions. Many of the features of NLP are extremely important in resume parsing. Acronym normalization and tagging accounts for the different possible formats of acronyms and normalizes them. Lemmatization reduces words to their root using a language dictionary and Stemming removes “s”, “ing”, etc.