Word sense disambiguation

This project has two required parts: 1. a one-page report that explains the documentation and answers the questions below; 2. the Java documentation.

First, make sure you understand the following. What is the problem of Word Sense Disambiguation (WSD)? Why is it important from a language technology engineering point of view? How is it a classification problem? How do you use SVM learning and classification to solve classification problems, in general and in this WSD case? (No detailed understanding of the internal mechanisms of SVMs and SVM learning algorithms is required.) What is n-fold cross-validation? How do you compute the precision, recall and F1 scores, and what do they show?
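As a reminder of the definitions asked about above, here is a minimal Java sketch that computes precision, recall and F1 for the positive class (the target sense) from true positive (TP), false positive (FP) and false negative (FN) counts. The class name `PrfScores` is hypothetical and not part of the provided wsd package; returning 0 when a denominator is zero is one common convention, assumed here.

```java
import java.util.Locale;

/** Hypothetical helper: precision, recall and F1 for the positive (target-sense) class. */
final class PrfScores {
    private PrfScores() {}

    /** Precision = TP / (TP + FP): share of predicted positives that are correct. */
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall = TP / (TP + FN): share of actual positives that are found. */
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** F1 = harmonic mean of precision and recall. */
    static double f1(int tp, int fp, int fn) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // 40 correct positive predictions, 10 false alarms, 20 missed positives
        System.out.printf(Locale.ROOT, "P=%.3f R=%.3f F1=%.3f%n",
                precision(40, 10), recall(40, 20), f1(40, 10, 20));
        // prints P=0.800 R=0.667 F1=0.727
    }
}
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it is often reported alongside the two.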
The course literature, the links given here, and other net resources give plenty of useful information about the matters important to this assignment.
A Java implementation for WSD experiments provides the point of departure for the assignment [package wsd (tar), doc].
The sense keys are from WordNet and you can see how they are explained there. Classification applies to tokens belonging to a certain lemma, defined by the lemma and pos attributes, and is binary, predicting that a token has a certain sense (defined by a WordNet sense key) or that it doesn’t have that sense (has another sense).
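The binary setup described above can be sketched as follows: every token of the given lemma and pos becomes one example, labelled +1 if it carries the target sense key and -1 otherwise. The class `BinaryLabels` is hypothetical, and the sketch assumes that each token carries exactly one sense key.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: turn the sense keys of one lemma's tokens into binary SVM targets.
 * Assumes each token carries exactly one sense key.
 */
final class BinaryLabels {
    private BinaryLabels() {}

    /** +1 if the token has the target sense, -1 if it has any other sense of the same lemma and pos. */
    static List<Integer> label(List<String> tokenSenseKeys, String targetSenseKey) {
        List<Integer> labels = new ArrayList<>();
        for (String key : tokenSenseKeys) {
            labels.add(key.equals(targetSenseKey) ? 1 : -1);
        }
        return labels;
    }
}
```

All non-target senses collapse into the single negative class, so the classifier only learns "this sense versus the rest".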
You (and the Java implementation) will use the following resources (placed as specified in the FileLocations class fields, which you're expected to modify according to your own preferences):
 Data: The largest publicly available sense-tagged corpus: semcor3.0. (Download semcor3.0.) Put the semcor3.0 contents in the FileLocations.semCorLoc directory. (This is already done for our own linux system.)
 We’ll use Thorsten Joachims’ SVMlight implementation of SVM learning (svm_learn) and classification (svm_classify). (Other SVM implementations can be used, of course.) Put the two programmes in the FileLocations.progLoc directory. (This is already done for our own linux system.)
 We'll use the classification tasks in the following text file for our experiments: senseDistinctions.txt. It lists those senses which have at least 100 positive instances and which apply to between 40 and 60 percent of all the instances of that lemma. The 32 entries look like "add VB 2:30:00::", i.e. lemma, pos tag and sense key separated by blanks. Put senseDistinctions.txt in the FileLocations.wsdData directory.
 The FileLocations.wsdData directory should have the subdirectories examples, model, and predictions, in which input and output from SVMlight will reside.
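The files in the examples directory must follow SVMlight's input format: one example per line, written as `<target> <feature>:<value> ... # <info>`, where the target is +1 or -1 for binary classification and feature numbers are positive integers in increasing order. The following sketch encodes a set of binary string features as such a line. The class `SvmLightEncoder` and the feature names are hypothetical; the wsd package may build its example files differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch: encode binary string features as one SVMlight example line. */
final class SvmLightEncoder {
    private final Map<String, Integer> featureIds = new HashMap<>();

    /** Maps a feature string to a stable index; SVMlight feature numbers start at 1. */
    int idOf(String feature) {
        return featureIds.computeIfAbsent(feature, f -> featureIds.size() + 1);
    }

    /** Builds "<target> <id>:1 <id>:1 ... # info" with feature ids in increasing order. */
    String encode(int target, Set<String> features, String info) {
        SortedSet<Integer> ids = new TreeSet<>();
        for (String f : features) {
            ids.add(idOf(f));
        }
        StringBuilder sb = new StringBuilder(target > 0 ? "+1" : "-1");
        for (int id : ids) {
            sb.append(' ').append(id).append(":1"); // binary feature: present = 1
        }
        if (info != null && !info.isEmpty()) {
            sb.append(" # ").append(info); // SVMlight ignores everything after '#'
        }
        return sb.toString();
    }
}
```

The same encoder instance must be shared by the training and validation files of one experiment, so that a given feature string always maps to the same number.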
The code is compiled and executed in this way on our Linux system when you stand in the src directory (but you might want to use some other environment):

To describe the experiment briefly, it performs WSD SVM training and classification on each of the sense distinctions in "senseDistinctions.txt". First, the example data are extracted from the SemCor corpus, i.e. positive and negative instances are located and their features extracted. After that, training and validation sets are created for a 10-fold cross-validation. These sets are used for SVM training and classification.
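The 10-fold setup can be sketched as follows: shuffle the instances, deal them into ten folds of near-equal size, and in round k train on the nine other folds and validate on fold k. This is a generic sketch with the hypothetical class `CrossValidation`, not the wsd package's own code; it does not stratify folds by label.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of random n-fold partitioning; not the wsd package's own code. */
final class CrossValidation {
    private CrossValidation() {}

    /** Shuffles the instances and deals them round-robin into n folds of near-equal size. */
    static <T> List<List<T>> folds(List<T> instances, int n, Random rng) {
        if (n < 2) {
            throw new IllegalArgumentException("need at least 2 folds");
        }
        List<T> shuffled = new ArrayList<>(instances);
        Collections.shuffle(shuffled, rng);
        List<List<T>> folds = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            folds.get(i % n).add(shuffled.get(i));
        }
        return folds;
    }

    /** Training set for round k: every fold except fold k, which is held out for validation. */
    static <T> List<T> trainingSet(List<List<T>> folds, int k) {
        List<T> train = new ArrayList<>();
        for (int i = 0; i < folds.size(); i++) {
            if (i != k) {
                train.addAll(folds.get(i));
            }
        }
        return train;
    }
}
```

Passing a `Random` with a fixed seed makes a run reproducible; with an unseeded generator each run partitions the data differently, which is one source of run-to-run variation in the scores.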
The Main.main method runs the experiment and reports precision, recall and F1 score for each sense distinction cross-validation experiment, as well as averages of these scores. An example of such output is available (output).
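Averaging per-distinction scores (macro-averaging) is only one way to summarize 32 experiments; another is to pool the token-level outcomes of all distinctions and compute the scores once (micro-averaging). The sketch below contrasts the two for F1. The class `Averaging` is hypothetical, and "macro F1" is taken here as the mean of per-distinction F1 scores (some authors instead take the F1 of the averaged precision and recall). How the existing implementation combines folds within one distinction is not assumed.

```java
import java.util.List;

/** Hypothetical sketch contrasting two ways of summarizing many binary WSD experiments. */
final class Averaging {
    private Averaging() {}

    /** Confusion counts for the positive (target-sense) class of one sense distinction. */
    static final class Counts {
        final int tp, fp, fn;

        Counts(int tp, int fp, int fn) { this.tp = tp; this.fp = fp; this.fn = fn; }

        double precision() { return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp); }

        double recall() { return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn); }

        double f1() {
            double p = precision(), r = recall();
            return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
        }
    }

    /** Macro: F1 per distinction, then the unweighted mean (every distinction counts equally). */
    static double macroF1(List<Counts> perDistinction) {
        if (perDistinction.isEmpty()) return 0.0;
        double sum = 0.0;
        for (Counts c : perDistinction) sum += c.f1();
        return sum / perDistinction.size();
    }

    /** Micro: pool token-level counts first, then compute F1 once (every token counts equally). */
    static double microF1(List<Counts> perDistinction) {
        int tp = 0, fp = 0, fn = 0;
        for (Counts c : perDistinction) { tp += c.tp; fp += c.fp; fn += c.fn; }
        return new Counts(tp, fp, fn).f1();
    }
}
```

With one large, easy distinction and one small, hard one, the macro average treats both equally, while the micro average is dominated by the large one; which is more informative depends on whether you care about lemma types or about running text.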
The Java code that is provided gives us the following:
 For each sense distinction, the ten-fold cross-validation setup is based on all available instances (i.e. the training set sizes vary between distinctions). It computes precision, recall and F1 score for each sense distinction cross-validation experiment.
 An extremely parsimonious feature extraction class, FeatureExtractorLetterLeft, is provided. It only extracts the letter immediately to the left as a feature for a token. FeatureExtractorPosLetterLeft additionally extracts the pos tag of the token immediately to the left. (Of course, a lot of useful information escapes these schemes.) FeatureExtractorLetterLeft gives this result. As the result depends on a random partitioning of the data into ten folds, each run gives slightly different results.
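A natural generalization of FeatureExtractorPosLetterLeft is a collocational extractor that records the pos tag at several relative positions around the target token. The sketch below is hypothetical: its class name, method signature and the `<S>` boundary marker are assumptions and do not reflect the wsd package's actual feature extractor interface.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical collocational feature extractor: the pos tag at each chosen relative
 * position around the target token. Not the wsd package's own interface.
 */
final class PosWindowFeatures {
    private final int[] offsets; // relative positions, e.g. -2, -1, 1, 2

    PosWindowFeatures(int... offsets) { this.offsets = offsets.clone(); }

    /** Returns features like "pos@-1=DT"; positions outside the sentence get the marker "<S>". */
    List<String> extract(List<String> sentencePosTags, int targetIndex) {
        List<String> features = new ArrayList<>();
        for (int off : offsets) {
            int i = targetIndex + off;
            String tag = (i >= 0 && i < sentencePosTags.size()) ? sentencePosTags.get(i) : "<S>";
            features.add("pos@" + off + "=" + tag);
        }
        return features;
    }
}
```

Because the offset is part of the feature name, "DT at -1" and "DT at +1" are different features; that positional information is what distinguishes collocational features from bag-of-words features.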
The assignment consists in doing the following:
 Evaluation may also be based on training sets of equal size. Overall evaluation metrics may also be based on the totality of token-level outcomes. Contrast these evaluation choices with the ones made in the existing implementation! Which evaluation metrics are most interesting, and in what context? Furthermore, modify the implementation so that evaluation metrics can also be computed in these ways.
 Try to find and implement the best pos-tag-based collocational (pos at a certain relative position) feature extraction scheme.
 Try to find and implement a better collocational (something at a certain relative position) feature extraction scheme.
 In relation to these extraction schemes, are there any sense distinctions that are unusually easy or difficult to predict? Is it possible to explain this with the help of a linguistics-based analysis of what is reasonable to expect from the behaviour of the lemma?
 Try to find and implement the best bag-of-lemmas and bag-of-word-forms feature extraction scheme and compare the two.
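The bag-of-lemmas and bag-of-word-forms schemes differ only in which token attribute goes into the bag, so one extractor can serve both. In this hypothetical sketch (the class name, the symmetric window and the lower-casing are assumptions) the caller passes either the lemmas or the word forms of the sentence; position within the window is deliberately discarded.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical bag-of-words extractor over a symmetric window, ignoring relative position. */
final class BagOfWords {
    private BagOfWords() {}

    /**
     * Collects the items (lemmas or word forms, whichever list is passed in) within
     * `window` tokens of the target, excluding the target itself.
     */
    static Set<String> extract(List<String> items, int targetIndex, int window, String prefix) {
        Set<String> bag = new TreeSet<>();
        int from = Math.max(0, targetIndex - window);
        int to = Math.min(items.size() - 1, targetIndex + window);
        for (int i = from; i <= to; i++) {
            if (i != targetIndex) {
                bag.add(prefix + items.get(i).toLowerCase(Locale.ROOT));
            }
        }
        return bag;
    }
}
```

Lemmas merge inflected forms such as "cooks" and "cook" into one feature, which reduces sparsity, while word forms keep number and tense information that may or may not help a given distinction; comparing the two empirically is the point of the task.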
Grading
In order to pass this assignment, you will need to: submit solutions to all points above, document them, and support your conclusions in a way that shows that you understand the empirical methods being used.