## You can also run the notebook in [COLAB](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/tutorials/02_deeppavlov_ner.ipynb).

In [None]:
!pip3 install deeppavlov

# Recognize named entities on news data with CNN

In this tutorial, you will use a convolutional neural network to solve the Named Entity Recognition (NER) problem. NER is a common task in natural language processing systems: it extracts entities such as persons, organizations, and locations from text. You will experiment with recognizing named entities in news articles from the widely used CoNLL-2003 dataset.

For example, suppose we want to extract the names of persons and organizations from text. For the input text:

    Ian Goodfellow works for Google Brain

a NER model needs to provide the following sequence of tags:

    B-PER I-PER    O     O   B-ORG  I-ORG

Here the *B-* and *I-* prefixes mark the beginning and the inside of an entity, while *O* stands for outside, i.e. no entity. This prefix scheme is called *BIO markup*. It makes it possible to separate consecutive entities of the same type: two adjacent locations are tagged `B-LOC B-LOC` instead of being merged into one.
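To make the BIO scheme concrete, here is a minimal Python sketch that groups a tag sequence back into entities. The function `bio_to_entities` is hypothetical (it is not part of DeepPavlov), and treating an `I-` tag that does not continue an open entity of the same type as the start of a new entity is one common convention, assumed here.

```python
def bio_to_entities(tokens, tags):
    """Collect (entity_type, entity_text) pairs from a BIO-tagged sentence."""
    spans = []      # (entity_type, start, end) with an exclusive end
    current = None  # [entity_type, start] of the entity being built, if any
    for i, tag in enumerate(tags):
        starts_new = tag.startswith('B-') or (
            tag.startswith('I-') and (current is None or current[0] != tag[2:]))
        if starts_new:
            # Close the open entity (if any) and open a new one
            if current is not None:
                spans.append((current[0], current[1], i))
            current = [tag[2:], i]
        elif tag.startswith('I-'):
            continue  # the tag continues the open entity
        else:
            # 'O' closes any open entity
            if current is not None:
                spans.append((current[0], current[1], i))
            current = None
    if current is not None:
        spans.append((current[0], current[1], len(tags)))
    return [(etype, ' '.join(tokens[start:end])) for etype, start, end in spans]


tokens = 'Ian Goodfellow works for Google Brain'.split()
tags = ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG']
print(bio_to_entities(tokens, tags))  # [('PER', 'Ian Goodfellow'), ('ORG', 'Google Brain')]
```

With two `B-LOC` tags in a row, for example `bio_to_entities(['Germany', 'France'], ['B-LOC', 'B-LOC'])`, the sketch returns two separate locations, which is exactly what the *B-* prefix is for.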

The solution will be based on neural networks, specifically on Convolutional Neural Networks (CNNs).

### Data

The following cell downloads all data required for this assignment into the `data/` folder. The library's `download_decompress` utility downloads the archive and extracts it.

In [1]:
import deeppavlov
from deeppavlov.core.data.utils import download_decompress
download_decompress('http://files.deeppavlov.ai/deeppavlov_data/conll2003_v2.tar.gz', 'data/')


### Load the CoNLL-2003 Named Entity Recognition corpus

We will work with a corpus of news articles annotated with named entity (NE) tags. A typical NER data file contains lines with a token (a word or punctuation symbol) and its tag, separated by whitespace. In many cases, additional information such as POS tags is included between the token and the tag. Different documents are separated by lines **starting** with the **-DOCSTART-** token, and different sentences are separated by an empty line. Example:

    -DOCSTART- -X- -X- O

    EU NNP B-NP B-ORG
    rejects VBZ B-VP O
    German JJ B-NP B-MISC
    call NN I-NP O
    to TO B-VP O
    boycott VB I-VP O
    British JJ B-NP B-MISC
    lamb NN I-NP O
    . . O O

    Peter NNP B-NP B-PER
    Blackburn NNP I-NP I-PER

In this tutorial we will focus only on tokens and tags (first and last elements of the line) and drop POS information located in between.

We start by using the *Conll2003DatasetReader* class, which provides functionality for reading the dataset. It returns a dictionary with the fields *train*, *test*, and *valid*. Each field stores a list of samples, and each sample is a tuple of tokens and tags, both of which are lists. The following example depicts the structure returned by the *read* method:

    {'train': [(['Mr.', 'Dwag', 'are', 'derping', 'around'], ['B-PER', 'I-PER', 'O', 'O', 'O']), ....],
     'valid': [...],
     'test': [...]}

There are three separate parts of the dataset:
 - *train* data for training the model;
 - *validation* data for evaluation and hyperparameters tuning;
 - *test* data for final evaluation of the model.
 

Each of these parts is stored in a separate txt file.
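For intuition, the parsing rules described above (skip `-DOCSTART-` lines, split sentences on empty lines, keep only the first and last columns) fit in a few lines of Python. The `read_conll` function below is a hypothetical illustration, not the implementation of the library's reader.

```python
def read_conll(lines):
    """Parse CoNLL-2003-style lines into a list of (tokens, tags) samples (hypothetical helper)."""
    samples, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('-DOCSTART-'):
            # Empty lines and document boundaries both close the current sentence
            if tokens:
                samples.append((tokens, tags))
                tokens, tags = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])  # first column: the token
        tags.append(columns[-1])   # last column: the NER tag; POS and chunk columns are dropped
    if tokens:  # the file may not end with an empty line
        samples.append((tokens, tags))
    return samples


example = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
to TO B-VP O
boycott VB I-VP O
British JJ B-NP B-MISC
lamb NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""
for sample_tokens, sample_tags in read_conll(example.splitlines()):
    print(sample_tokens, sample_tags)
```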

We will use [Conll2003DatasetReader](https://github.com/deepmipt/DeepPavlov/blob/master/deeppavlov/dataset_readers/conll2003_reader.py) from the library to read the data from text files to the format described above.

In [2]:
from deeppavlov.dataset_readers.conll2003_reader import Conll2003DatasetReader
dataset = Conll2003DatasetReader().read('data/')

You should always understand what kind of data you are dealing with. For this purpose, you can print a few samples by running the following cell:

In [3]:
for sample in dataset['train'][:4]:
    for token, tag in zip(*sample):
        print('%s\t%s' % (token, tag))
    print()

EU	B-ORG
rejects	O
German	B-MISC
call	O
to	O
boycott	O
British	B-MISC
lamb	O
.	O

Peter	B-PER
Blackburn	I-PER

BRUSSELS	B-LOC
1996-08-22	O

The	O
European	B-ORG
Commission	I-ORG
said	O
on	O
Thursday	O
it	O
disagreed	O
with	O
German	B-MISC
advice	O
to	O
consumers	O
to	O
shun	O
British	B-MISC
lamb	O
until	O
scientists	O
determine	O
whether	O
mad	O
cow	O
disease	O
can	O
be	O
transmitted	O
to	O
sheep	O
.	O



### Prepare dictionaries

To train a neural network, we will use two mappings:
- {token}$\to${token id}: token indices address rows of the embedding matrix;
- {tag}$\to${tag id}: tag indices are used to build one-hot ground-truth probability distributions for computing the loss at the output of the network.

The [SimpleVocabulary](https://github.com/deepmipt/DeepPavlov/blob/master/deeppavlov/core/data/simple_vocab.py) implemented in the library will be used to perform those mappings.

In [4]:
from deeppavlov.core.data.simple_vocab import SimpleVocabulary

Now we need to build dictionaries for tokens and tags. Vocabularies sometimes contain special tokens, for instance an unknown-word token that is used every time we encounter an out-of-vocabulary word. In our case, the only special token will be `<UNK>` for out-of-vocabulary words.
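To illustrate what such a vocabulary does, here is a minimal pure-Python sketch. `TinyVocab` is hypothetical and is not SimpleVocabulary's actual implementation; the ordering rule (special tokens first, then tokens by descending frequency) is an assumption. Like the `token_vocab` calls further down, it maps strings to indices and integers back to strings.

```python
from collections import Counter


class TinyVocab:
    """Hypothetical minimal vocabulary: special tokens first, then tokens by descending frequency."""

    def __init__(self, special_tokens=(), unk_token='<UNK>'):
        self.special_tokens = list(special_tokens)
        self.unk_token = unk_token
        self.itos = list(self.special_tokens)                # index -> token
        self.stoi = {t: i for i, t in enumerate(self.itos)}  # token -> index

    def fit(self, sentences):
        counts = Counter(tok for sent in sentences for tok in sent)
        # Ties keep first-seen order because Counter.most_common sorts stably
        frequent = [tok for tok, _ in counts.most_common() if tok not in self.special_tokens]
        self.itos = self.special_tokens + frequent
        self.stoi = {t: i for i, t in enumerate(self.itos)}

    def __len__(self):
        return len(self.itos)

    def __call__(self, batch):
        """Map a batch of sentences: strings -> indices, integers -> strings."""
        unk_index = self.stoi.get(self.unk_token)  # None if there is no unknown token
        return [[self.stoi.get(x, unk_index) if isinstance(x, str) else self.itos[int(x)]
                 for x in sent]
                for sent in batch]


demo_vocab = TinyVocab(special_tokens=['<UNK>'])
demo_vocab.fit([['the', 'cat'], ['the', 'dog']])
print(demo_vocab([['the', 'dog', 'bird']]))  # [[1, 3, 0]]
print(demo_vocab([[1, 2]]))                  # [['the', 'cat']]
```

The unseen word `bird` falls back to the index of `<UNK>`, which is the role the special token plays in the tutorial.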

In [61]:
special_tokens = ['<UNK>']

token_vocab = SimpleVocabulary(special_tokens, save_path='model/token.dict')
tag_vocab = SimpleVocabulary(save_path='model/tag.dict')



Let's fit the vocabularies on the train part of the data.

In [62]:
all_tokens_by_sentences = [tokens for tokens, tags in dataset['train']]
all_tags_by_sentences = [tags for tokens, tags in dataset['train']]

token_vocab.fit(all_tokens_by_sentences)
tag_vocab.fit(all_tags_by_sentences)


Try to get the indices. Keep in mind that we are working with batches of the following structure:

    [['utt0_tok0', 'utt0_tok1', ...], ['utt1_tok0', 'utt1_tok1', ...], ...]

In [63]:
token_vocab([['How', 'to', 'do', 'a', 'barrel', 'roll', '?']])

[[10167, 6, 168, 7, 6097, 5518, 1865]]

In [64]:
tag_vocab([['O', 'O', 'O'], ['B-ORG', 'I-ORG']])

[[0, 0, 0], [3, 5]]

Now we will try conversion from indices to tokens.

In [65]:
import numpy as np
token_vocab([np.random.randint(0, 512, size=10)])

[['into',
  'another',
  'CHICAGO',
  'capital',
  'But',
  'Wednesday',
  '20',
  '2',
  'into',
  'years']]

### Dataset Iterator

Neural networks are usually trained on batches, which means that each weight update is based on several sequences at once. The tricky part is that all sequences within a batch need to have the same length, so we pad them with the special `<UNK>` token. Tag sequences must be padded likewise. It is also good practice to tell the network which positions are padding, so that it can ignore them; here we will do this with a binary mask, described below. To save time, we use the batch generator readily available in the library.

An important concept in batch generation is shuffling, i.e. taking samples from the dataset in random order. Training on shuffled data matters because a long run of consecutive samples of the same kind may result in poor model quality.
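The two ideas, shuffling and padding, can be sketched in plain Python and NumPy. `batch_generator` and `pad_indices` are hypothetical stand-ins for the library's `DataLearningIterator.gen_batches` and `zero_pad`, not their actual implementations.

```python
import random

import numpy as np


def batch_generator(samples, batch_size, shuffle=True, seed=None):
    """Yield (token_batch, tag_batch) tuples, visiting samples in random order if requested."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)  # a new random order of sample indices
    for start in range(0, len(order), batch_size):
        chunk = [samples[i] for i in order[start:start + batch_size]]
        tokens, tags = zip(*chunk)
        yield tokens, tags


def pad_indices(batch, pad_value=0):
    """Pad lists of indices to the longest one and stack them into an int32 matrix."""
    max_len = max(len(seq) for seq in batch)
    padded = np.full((len(batch), max_len), pad_value, dtype=np.int32)
    for row, seq in enumerate(batch):
        padded[row, :len(seq)] = seq
    return padded


samples = [(['EU', 'rejects'], ['B-ORG', 'O']),
           (['Peter', 'Blackburn', 'said'], ['B-PER', 'I-PER', 'O'])]
for tokens, tags in batch_generator(samples, batch_size=2, seed=0):
    print(tokens, tags)
print(pad_indices([[5], [1, 2, 3]]))
```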

In [66]:
from deeppavlov.core.data.data_learning_iterator import DataLearningIterator

Create the dataset iterator from the loaded dataset:

In [67]:
data_iterator = DataLearningIterator(dataset)

Try it out:

In [70]:
next(data_iterator.gen_batches(2, shuffle=True))

((['Corinthians', '1', 'Guarani', '0'],
  ['The',
   'Richmond-based',
   'retailer',
   'lost',
   '$',
   '95.7',
   'million',
   'in',
   'the',
   'fiscal',
   'year',
   'ended',
   'February',
   '3',
   '.']),
 (['B-ORG', 'O', 'B-ORG', 'O'],
  ['O',
   'B-MISC',
   'O',
   'O',
   'O',
   'O',
   'O',
   'O',
   'O',
   'O',
   'O',
   'O',
   'O',
   'O',
   'O']))

### Masking

The last thing about generating training data: we need to produce a binary mask that is one where tokens are present and zero elsewhere. Multiplied with the loss, this mask stops backpropagation through the paddings. An instance of such a mask:

    [[1, 1, 0, 0, 0],
     [1, 1, 1, 1, 1]]

for the sentences in the batch:

    [['The', 'roof'],
     ['This', 'is', 'my', 'domain', '!']]

The mask length must be equal to the maximum sentence length in the batch.
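Computing such a mask takes only a few lines of NumPy. `make_mask` is a hypothetical helper for illustration; judging by its output further down, the library's `Mask` returns the same kind of float32 array.

```python
import numpy as np


def make_mask(token_batch):
    """Return a [batch_size, max_len] float32 mask: 1.0 for real tokens, 0.0 for padding."""
    max_len = max(len(sentence) for sentence in token_batch)
    mask = np.zeros((len(token_batch), max_len), dtype=np.float32)
    for row, sentence in enumerate(token_batch):
        mask[row, :len(sentence)] = 1.0  # the first len(sentence) positions hold real tokens
    return mask


print(make_mask([['The', 'roof'], ['This', 'is', 'my', 'domain', '!']]))
```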

In [71]:
from deeppavlov.models.preprocessors.mask import Mask
get_mask = Mask()

Try it out:

In [72]:
get_mask([['Try', 'to', 'get', 'the', 'mask'], ['Check', 'paddings']])

array([[1., 1., 1., 1., 1.],
       [1., 1., 0., 0., 0.]], dtype=float32)

## Build a convolutional neural network

This is the most important part of the assignment. Here we specify the network architecture using TensorFlow building blocks. It's fun and as easy as a Lego constructor! We will create a Convolutional Neural Network (CNN) that produces a probability distribution over tags for each token in a sentence. The convolutions take into account both the left and the right context of each token, and a dense layer on top performs the tag classification.

In [73]:
import tensorflow as tf
import numpy as np

np.random.seed(42)
tf.set_random_seed(42)

An essential part of almost every network in the NLP domain is word embeddings. We pass the text to the network as a sequence of tokens, each represented by its index. Every token (index) has a vector, and together these vectors form the embedding matrix. This matrix can either be pretrained with a common algorithm such as Skip-Gram or CBOW, or be initialized with random values and trained along with the other parameters of the network. In this tutorial we follow the second alternative.

We need to build a function that takes a tensor of token indices with shape [batch_size, num_tokens] and, for each index, retrieves the corresponding vector from the embedding matrix. The result is a new tensor with shape [batch_size, num_tokens, emb_dim].
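The lookup itself is just row indexing. Here is a NumPy sketch using the same initialization as `get_embeddings` below (standard normal values scaled to variance 1/emb_dim); the sizes and indices are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)
vocabulary_size, emb_dim = 1000, 64

# Standard normal values divided by sqrt(emb_dim) have variance 1 / emb_dim
emb_mat = rng.standard_normal((vocabulary_size, emb_dim)).astype(np.float32) / np.sqrt(emb_dim)

# Token indices with shape [batch_size, num_tokens]
indices = np.array([[3, 17, 42],
                    [5, 0, 0]])

# Fancy indexing takes row emb_mat[i] for every index i, like tf.nn.embedding_lookup
emb = emb_mat[indices]
print(emb.shape)  # (2, 3, 64)
```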

In [74]:
def get_embeddings(indices, vocabulary_size, emb_dim):
    # Initialize the random gaussian matrix with dimensions [vocabulary_size, embedding_dimension]
    # The **VARIANCE** of the random samples must be 1 / embedding_dimension
    emb_mat = np.random.randn(vocabulary_size, emb_dim).astype(np.float32) / np.sqrt(emb_dim) # YOUR CODE HERE
    emb_mat = tf.Variable(emb_mat, name='Embeddings', trainable=True)
    emb = tf.nn.embedding_lookup(emb_mat, indices)
    return emb

The body of the network consists of convolutional layers. The basic idea behind convolutions is to apply the same dense layer to every window of n consecutive samples (tokens in our case). A simplified case is depicted below.

<img src="img/convolution.png" width="400">

Here the number of input and output features is equal to 1.
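The one-feature case in the picture can be written out directly in NumPy. This `conv1d_valid` helper is a sketch for intuition only: it uses no padding and, like deep-learning convolution layers, does not flip the kernel.

```python
import numpy as np


def conv1d_valid(x, kernel, bias=0.0):
    """1-D convolution with one input and one output feature and no padding.

    Like deep-learning convolution layers, this computes a cross-correlation:
    the kernel is not flipped.
    """
    k = len(kernel)
    # The same weights (a tiny dense layer) are applied to every window of k consecutive positions
    return np.array([np.dot(x[i:i + k], kernel) + bias for i in range(len(x) - k + 1)])


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(conv1d_valid(x, np.array([1.0, 0.0, -1.0])))  # [-2. -2. -2.]
```

Five inputs and a width-3 kernel give 5 - 3 + 1 = 3 outputs, the same shrinking that the TensorFlow example below shows.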

Let's try it on a toy example:

In [75]:
# Create a tensor with shape [batch_size, number_of_tokens, number_of_features]
x = tf.random_normal(shape=[2, 10, 100])
y = tf.layers.conv1d(x, filters=200, kernel_size=8)
print(y)

Tensor("conv1d_6/BiasAdd:0", shape=(2, 3, 200), dtype=float32)


As you can see, without zero padding (zeros added at the beginning and the end of the input), the resulting tensor is shorter along the token dimension: only 10 - 8 + 1 = 3 positions remain. To pad the input and preserve the length along the convolution dimension, pass the `padding='same'` parameter to the function.
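The length arithmetic can be checked with a small NumPy sketch. For stride 1, keeping the length requires `kernel_size - 1` zeros in total; how an odd total is split between the two ends (here the extra zero goes to the end) is an assumption of this sketch, and `same_pad` and `output_length` are hypothetical helpers.

```python
import numpy as np


def same_pad(x, kernel_size):
    """Zero-pad a [num_tokens, num_features] array so a stride-1 'valid' convolution keeps num_tokens."""
    total = kernel_size - 1
    before = total // 2    # assumed split: for even kernels the extra zero goes to the end
    after = total - before
    return np.pad(x, ((before, after), (0, 0)), mode='constant')


def output_length(num_tokens, kernel_size, padding):
    # Stride 1: 'valid' loses kernel_size - 1 positions, 'same' keeps all of them
    return num_tokens - kernel_size + 1 if padding == 'valid' else num_tokens


x = np.ones((10, 100))
print(same_pad(x, 8).shape)           # (17, 100)
print(output_length(10, 8, 'valid'))  # 3
print(output_length(10, 8, 'same'))   # 10
```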

In [76]:
y_with_padding = tf.layers.conv1d(x, filters=200, kernel_size=8, padding='same')
print(y_with_padding)

Tensor("conv1d_7/BiasAdd:0", shape=(2, 10, 200), dtype=float32)


Now stack a number of layers with the dimensionalities given in `n_hidden_list`.
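Stacking layers widens the context each output position sees. The hypothetical helper below computes how many input tokens one output position depends on, assuming stride 1 and no dilation (the defaults of `tf.layers.conv1d`).

```python
def receptive_field(num_layers, filter_width):
    """Number of input tokens one output position depends on, for stacked stride-1 convolutions."""
    # Each extra layer widens the window by filter_width - 1 tokens
    return 1 + num_layers * (filter_width - 1)


print(receptive_field(num_layers=2, filter_width=7))  # 13: the token itself plus 6 neighbours on each side
```

With the settings used for `NerNetwork` later (`n_hidden_list=[100, 100]` and the default `cnn_filter_width=7`), each tag prediction therefore sees a window of 13 tokens.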

In [77]:
def conv_net(units, n_hidden_list, cnn_filter_width, activation=tf.nn.relu):
    # Use activation(units) to apply activation to units
    for n_hidden in n_hidden_list:
        
        units = tf.layers.conv1d(units,
                                 n_hidden,
                                 cnn_filter_width,
                                 padding='same')
        units = activation(units)
    return units
    

A common loss for the classification task is cross-entropy. Why classification? Because for each token the network must decide which tag to predict. The cross-entropy has the following form:

$$ H(P, Q) = -\mathbb{E}_{x \sim P} \log Q(x) $$

It measures the dissimilarity between the ground-truth distribution over the classes and the predicted distribution. In most cases the ground-truth distribution is one-hot. Luckily, this loss is already [implemented](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2) in TensorFlow.
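With a one-hot ground truth, the expectation collapses to a single term, $-\log Q(\text{true class})$. The NumPy sketch below computes this per token from logits; `softmax_cross_entropy` is a hypothetical helper for intuition, not TensorFlow's implementation.

```python
import numpy as np


def softmax_cross_entropy(logits, label_indices):
    """Cross-entropy between one-hot labels and softmax(logits), computed per token.

    logits: float array [..., num_classes]; label_indices: int array [...].
    """
    # Log-softmax, shifting by the max for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_q = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # With one-hot P, -E_{x~P} log Q(x) is just -log Q(true class)
    return -np.take_along_axis(log_q, label_indices[..., None], axis=-1)[..., 0]


logits = np.zeros((1, 4, 3))       # uniform prediction over 3 tags
labels = np.array([[0, 1, 2, 0]])  # shape [batch_size, number_of_tokens]
print(softmax_cross_entropy(logits, labels))  # every entry equals log(3), about 1.0986
```

As in the TensorFlow example below, logits of shape [1, 4, 3] give one loss value per token, i.e. a [1, 4] result.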

In [78]:
# The logits
l = tf.random_normal([1, 4, 3]) # shape [batch_size, number_of_tokens, number of classes]
indices = tf.placeholder(tf.int32, [1, 4])

# Make one-hot distribution from indices for 3 types of tag
p = tf.one_hot(indices, depth=3)
loss_tensor = tf.nn.softmax_cross_entropy_with_logits_v2(labels=p, logits=l)
print(loss_tensor)

Tensor("softmax_cross_entropy_with_logits_3/Reshape_2:0", shape=(1, 4), dtype=float32)


All sentences in a batch have the same length because we pad each sentence to the maximal length. So there are paddings at the end, and pushing the network to predict tags for those paddings usually degrades quality. Therefore we multiply the loss tensor by the binary mask to prevent gradient flow from the paddings.
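The effect of the mask can be checked with made-up NumPy numbers. Note that a plain mean, like the `tf.reduce_mean` used below, still divides by all positions, padding included. Dividing by the number of real tokens (`mask.sum()`) is a common alternative normalization, shown here only for comparison; it is not what this tutorial's code does.

```python
import numpy as np

# Per-token losses for one sentence padded from 2 to 4 tokens, shape [batch_size, num_tokens]
per_token_loss = np.array([[0.5, 1.0, 2.0, 4.0]])
mask = np.array([[1.0, 1.0, 0.0, 0.0]])

masked = per_token_loss * mask    # padded positions now contribute zero loss and zero gradient
print(masked.mean())              # 0.375: mean over all 4 positions
print(masked.sum() / mask.sum())  # 0.75: mean over the 2 real tokens only
```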

In [79]:
mask = tf.placeholder(tf.float32, shape=[1, 4])
loss_tensor *= mask

The last step to do is to compute the mean value of the loss tensor:

In [80]:
loss = tf.reduce_mean(loss_tensor)

Now define your own function that returns a scalar masked cross-entropy loss:

In [81]:
def masked_cross_entropy(logits, label_indices, number_of_tags, mask):
    ground_truth_labels = tf.one_hot(label_indices, depth=number_of_tags)
    loss_tensor = tf.nn.softmax_cross_entropy_with_logits_v2(labels=ground_truth_labels, logits=logits)
    loss_tensor *= mask
    loss = tf.reduce_mean(loss_tensor)
    return loss

Put everything into a class:

In [87]:
import numpy as np
import tensorflow as tf

class NerNetwork:
    def __init__(self,
                 n_tokens,
                 n_tags,
                 token_emb_dim=100,
                 n_hidden_list=(128,),
                 cnn_filter_width=7,
                 use_batch_norm=False,
                 embeddings_dropout=False,
                 top_dropout=False,
                 **kwargs):
        
        # ================ Building inputs =================
        
        self.learning_rate_ph = tf.placeholder(tf.float32, [])
        self.dropout_keep_ph = tf.placeholder(tf.float32, [])
        self.token_ph = tf.placeholder(tf.int32, [None, None], name='token_ind_ph')
        self.mask_ph = tf.placeholder(tf.float32, [None, None], name='Mask_ph')
        self.y_ph = tf.placeholder(tf.int32, [None, None], name='y_ph')
        
        # ================== Building the network ==================
        
        # Now embed the indices of tokens using the get_embeddings function
        
        ######################################
        ########## YOUR CODE HERE ############
        emb = get_embeddings(self.token_ph, n_tokens, token_emb_dim)
        ######################################

        emb = tf.nn.dropout(emb, self.dropout_keep_ph, (tf.shape(emb)[0], 1, tf.shape(emb)[2]))
        
        # Build a multilayer CNN on top of the embeddings.
        # The number of units in each layer must match the
        # corresponding number in n_hidden_list.
        # Use ReLU activation
        ######################################
        ########## YOUR CODE HERE ############
        units = conv_net(emb, n_hidden_list, cnn_filter_width)
        ######################################
        units = tf.nn.dropout(units, self.dropout_keep_ph, (tf.shape(units)[0], 1, tf.shape(units)[2]))
        logits = tf.layers.dense(units, n_tags, activation=None)
        self.predictions = tf.argmax(logits, 2)
        
        # ================= Loss and train ops =================
        # Use cross-entropy loss. Check the tf.nn.softmax_cross_entropy_with_logits_v2 function
        ######################################
        ########## YOUR CODE HERE ############
        self.loss = masked_cross_entropy(logits, self.y_ph, n_tags, self.mask_ph)
        ######################################

        # Create a training operation to update the network parameters.
        # We propose using the Adam optimizer, as it works well in
        # most cases. Check tf.train to find an implementation.
        # Put the train operation into the attribute self.train_op
        
        ######################################
        ########## YOUR CODE HERE ############
        optimizer = tf.train.AdamOptimizer(self.learning_rate_ph)
        self.train_op = optimizer.minimize(self.loss)
        ######################################

        # ================= Initialize the session =================
        
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())

    def __call__(self, tok_batch, mask_batch):
        feed_dict = {self.token_ph: tok_batch,
                     self.mask_ph: mask_batch,
                     self.dropout_keep_ph: 1.0}
        return self.sess.run(self.predictions, feed_dict)

    def train_on_batch(self, tok_batch, tag_batch, mask_batch, dropout_keep_prob, learning_rate):
        feed_dict = {self.token_ph: tok_batch,
                     self.y_ph: tag_batch,
                     self.mask_ph: mask_batch,
                     self.dropout_keep_ph: dropout_keep_prob,
                     self.learning_rate_ph: learning_rate}
        self.sess.run(self.train_op, feed_dict)


Now create an instance of the NerNetwork class:

In [88]:
nernet = NerNetwork(len(token_vocab),
                    len(tag_vocab),
                    n_hidden_list=[100, 100])

We want to check the score on the validation part of the dataset after every epoch. In most NER tasks the classes are imbalanced, so accuracy is not the best measure of performance: if 95% of the tags are 'O', a silly classifier that always predicts 'O' gets 95% accuracy. To tackle this issue, the F1-score is used. It can be defined as:

$$ F1 =  \frac{2 P R}{P + R}$$ 

where P is precision and R is recall.
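The evaluation reports in the training log below count entity phrases: precision is the share of found phrases that are correct, and recall is the share of true phrases that were found. A hypothetical helper reproduces the first validation report from its raw counts:

```python
def prf1_from_counts(num_correct, num_found, num_gold):
    """Precision, recall and F1 from entity counts (hypothetical helper)."""
    precision = num_correct / num_found  # share of predicted entities that are correct
    recall = num_correct / num_gold      # share of true entities that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the first validation report in the training log below
p, r, f1 = prf1_from_counts(num_correct=3397, num_found=5465, num_gold=5942)
print(f'precision {p:.2%}, recall {r:.2%}, F1 {f1:.2%}')  # precision 62.16%, recall 57.17%, F1 59.56%
```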

Let's write the evaluation function. We need to get all predictions for the given part of the dataset and compute F1.

In [89]:
from deeppavlov.models.ner.evaluation import precision_recall_f1
# The function precision_recall_f1 takes two lists: y_true and y_predicted;
# the tag sequences of all sentences should be merged into one big list
from deeppavlov.core.data.utils import zero_pad
# zero_pad takes a batch of lists of token indices, pads them with zeros to the
# maximal length and converts the result to a numpy matrix
from itertools import chain


def eval_valid(network, batch_generator):
    total_true = []
    total_pred = []
    for x, y_true in batch_generator:

        # Prepare token indices from tokens batch
        x_inds = token_vocab(x) # YOUR CODE HERE

        # Pad the indices batch with zeros
        x_batch = zero_pad(x_inds) # YOUR CODE HERE

        # Get the mask using get_mask
        mask = get_mask(x) # YOUR CODE HERE
        
        # We call the instance of the NerNetwork because we have defined __call__ method
        y_inds = network(x_batch, mask)

        # For every sentence in the batch extract all tags up to paddings
        y_inds = [y[:len(x[n])] for n, y in enumerate(y_inds)] # YOUR CODE HERE
        y_pred = tag_vocab(y_inds)

        # Add fresh predictions 
        total_true.extend(chain(*y_true))
        total_pred.extend(chain(*y_pred))
    res = precision_recall_f1(total_true, total_pred, print_results=True)

Set the hyperparameters. You might want to start with the following recommended values:
- *batch_size*: 32;
- *n_epochs*: 10;
- starting value of *learning_rate*: 0.001;
- *learning_rate_decay*: the square root of 2;
- *dropout_keep_probability*: 0.7 for training (typical values range from 0.3 to 0.9).

A very efficient technique for learning rate management is to drop the learning rate after convergence. It is common to divide it by 2, 3, or 10.
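The training loop below keeps the learning rate constant. The two schedules mentioned above could look like the following hypothetical helpers: a fixed per-epoch decay by the square root of 2, and a drop by a divider once validation F1 stops improving.

```python
def decayed_learning_rate(initial_lr, epoch, decay=2 ** 0.5):
    """Hypothetical step schedule: divide the learning rate by `decay` after every finished epoch."""
    return initial_lr / decay ** epoch


def drop_on_plateau(lr, f1_history, divider=2.0, patience=2):
    """Hypothetical plateau rule: divide `lr` by `divider` when the best validation F1
    of the last `patience` epochs is no better than the best one before them."""
    if len(f1_history) > patience and max(f1_history[-patience:]) <= max(f1_history[:-patience]):
        return lr / divider
    return lr


print(decayed_learning_rate(0.001, epoch=2))          # about 0.0005
# Validation F1 of epochs 7 to 9 from the training log below: no improvement after 84.36
print(drop_on_plateau(0.001, [84.36, 84.08, 83.67]))  # 0.0005
```

Using the plateau rule in the loop would require `eval_valid` to return the F1 score, which the version in this tutorial does not do; treat this as a possible extension.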

In [90]:
batch_size = 16 # YOUR HYPERPARAMETER HERE
n_epochs = 20 # YOUR HYPERPARAMETER HERE
learning_rate = 0.001 # YOUR HYPERPARAMETER HERE
dropout_keep_prob = 0.5 # YOUR HYPERPARAMETER HERE

Now we iterate through the dataset batch by batch and pass the data to the train op:

In [91]:
for epoch in range(n_epochs):
    for x, y in data_iterator.gen_batches(batch_size, 'train'):
        # Convert tokens to indices via Vocab
        x_inds = token_vocab(x) # YOUR CODE 
        # Convert tags to indices via Vocab
        y_inds = tag_vocab(y) # YOUR CODE 
        
        # Pad every sample with zeros to the maximal length
        x_batch = zero_pad(x_inds)
        y_batch = zero_pad(y_inds)

        mask = get_mask(x)
        nernet.train_on_batch(x_batch, y_batch, mask, dropout_keep_prob, learning_rate)
    print('Evaluating the model on valid part of the dataset')
    eval_valid(nernet, data_iterator.gen_batches(batch_size, 'valid'))


Evaluating the model on valid part of the dataset


2018-06-27 13:45:24.341 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5465 phrases; correct: 3397.

precision:  62.16%; recall:  57.17%; FB1:  59.56

	LOC: precision:  64.94%; recall:  76.65%; F1:  70.31 2168

	MISC: precision:  51.44%; recall:  19.41%; F1:  28.19 348

	ORG: precision:  50.16%; recall:  46.61%; F1:  48.32 1246

	PER: precision:  69.58%; recall:  64.33%; F1:  66.85 1703




Evaluating the model on valid part of the dataset


2018-06-27 13:45:27.357 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5615 phrases; correct: 4447.

precision:  79.20%; recall:  74.84%; FB1:  76.96

	LOC: precision:  86.10%; recall:  84.00%; F1:  85.04 1792

	MISC: precision:  67.88%; recall:  68.76%; F1:  68.32 934

	ORG: precision:  75.43%; recall:  61.82%; F1:  67.95 1099

	PER: precision:  80.50%; recall:  78.23%; F1:  79.35 1790




Evaluating the model on valid part of the dataset


2018-06-27 13:45:30.326 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5387 phrases; correct: 4584.

precision:  85.09%; recall:  77.15%; FB1:  80.93

	LOC: precision:  89.46%; recall:  85.90%; F1:  87.64 1764

	MISC: precision:  85.34%; recall:  73.86%; F1:  79.19 798

	ORG: precision:  80.15%; recall:  70.77%; F1:  75.17 1184

	PER: precision:  83.85%; recall:  74.70%; F1:  79.01 1641




Evaluating the model on valid part of the dataset


2018-06-27 13:45:33.277 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5436 phrases; correct: 4702.

precision:  86.50%; recall:  79.13%; FB1:  82.65

	LOC: precision:  89.44%; recall:  86.72%; F1:  88.06 1781

	MISC: precision:  88.02%; recall:  75.70%; F1:  81.40 793

	ORG: precision:  83.25%; recall:  71.14%; F1:  76.72 1146

	PER: precision:  84.91%; recall:  79.10%; F1:  81.90 1716




Evaluating the model on valid part of the dataset


2018-06-27 13:45:36.266 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5298 phrases; correct: 4688.

precision:  88.49%; recall:  78.90%; FB1:  83.42

	LOC: precision:  92.71%; recall:  85.90%; F1:  89.18 1702

	MISC: precision:  89.10%; recall:  77.98%; F1:  83.17 807

	ORG: precision:  82.95%; recall:  75.09%; F1:  78.83 1214

	PER: precision:  87.87%; recall:  75.14%; F1:  81.01 1575




Evaluating the model on valid part of the dataset


2018-06-27 13:45:39.228 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5339 phrases; correct: 4705.

precision:  88.13%; recall:  79.18%; FB1:  83.41

	LOC: precision:  90.68%; recall:  86.88%; F1:  88.74 1760

	MISC: precision:  86.87%; recall:  78.20%; F1:  82.31 830

	ORG: precision:  83.80%; recall:  74.42%; F1:  78.83 1191

	PER: precision:  89.22%; recall:  75.46%; F1:  81.76 1558




Evaluating the model on valid part of the dataset


2018-06-27 13:45:42.213 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5412 phrases; correct: 4789.

precision:  88.49%; recall:  80.60%; FB1:  84.36

	LOC: precision:  93.45%; recall:  86.94%; F1:  90.07 1709

	MISC: precision:  89.57%; recall:  79.18%; F1:  84.05 815

	ORG: precision:  81.04%; recall:  76.81%; F1:  78.87 1271

	PER: precision:  88.56%; recall:  77.74%; F1:  82.80 1617




Evaluating the model on valid part of the dataset


2018-06-27 13:45:45.169 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5388 phrases; correct: 4763.

precision:  88.40%; recall:  80.16%; FB1:  84.08

	LOC: precision:  91.84%; recall:  88.19%; F1:  89.98 1764

	MISC: precision:  87.17%; recall:  78.85%; F1:  82.80 834

	ORG: precision:  82.20%; recall:  75.09%; F1:  78.49 1225

	PER: precision:  90.03%; recall:  76.49%; F1:  82.71 1565




Evaluating the model on valid part of the dataset


2018-06-27 13:45:48.170 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5345 phrases; correct: 4722.

precision:  88.34%; recall:  79.47%; FB1:  83.67

	LOC: precision:  92.03%; recall:  87.43%; F1:  89.67 1745

	MISC: precision:  88.51%; recall:  79.39%; F1:  83.70 827

	ORG: precision:  80.78%; recall:  75.84%; F1:  78.23 1259

	PER: precision:  90.29%; recall:  74.21%; F1:  81.47 1514




Evaluating the model on valid part of the dataset


2018-06-27 13:45:51.116 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5273 phrases; correct: 4716.

precision:  89.44%; recall:  79.37%; FB1:  84.10

	LOC: precision:  92.76%; recall:  87.21%; F1:  89.90 1727

	MISC: precision:  90.83%; recall:  79.50%; F1:  84.79 807

	ORG: precision:  82.51%; recall:  75.99%; F1:  79.11 1235

	PER: precision:  90.56%; recall:  73.94%; F1:  81.41 1504




Evaluating the model on valid part of the dataset


2018-06-27 13:45:54.39 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5183 phrases; correct: 4632.

precision:  89.37%; recall:  77.95%; FB1:  83.27

	LOC: precision:  93.43%; recall:  85.90%; F1:  89.51 1689

	MISC: precision:  90.86%; recall:  79.83%; F1:  84.99 810

	ORG: precision:  83.84%; recall:  75.47%; F1:  79.43 1207

	PER: precision:  88.42%; recall:  70.90%; F1:  78.70 1477




Evaluating the model on valid part of the dataset


2018-06-27 13:45:56.988 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5162 phrases; correct: 4587.

precision:  88.86%; recall:  77.20%; FB1:  82.62

	LOC: precision:  92.82%; recall:  85.90%; F1:  89.23 1700

	MISC: precision:  90.65%; recall:  79.93%; F1:  84.96 813

	ORG: precision:  82.99%; recall:  74.94%; F1:  78.76 1211

	PER: precision:  88.11%; recall:  68.78%; F1:  77.26 1438




Evaluating the model on valid part of the dataset


2018-06-27 13:45:59.925 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5220 phrases; correct: 4630.

precision:  88.70%; recall:  77.92%; FB1:  82.96

	LOC: precision:  93.66%; recall:  86.12%; F1:  89.73 1689

	MISC: precision:  90.63%; recall:  79.72%; F1:  84.82 811

	ORG: precision:  82.79%; recall:  76.06%; F1:  79.28 1232

	PER: precision:  86.90%; recall:  70.20%; F1:  77.66 1488




Evaluating the model on valid part of the dataset


2018-06-27 13:46:02.877 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5220 phrases; correct: 4650.

precision:  89.08%; recall:  78.26%; FB1:  83.32

	LOC: precision:  93.30%; recall:  86.45%; F1:  89.74 1702

	MISC: precision:  90.98%; recall:  79.83%; F1:  85.04 809

	ORG: precision:  82.74%; recall:  75.09%; F1:  78.73 1217

	PER: precision:  88.40%; recall:  71.61%; F1:  79.12 1492




Evaluating the model on valid part of the dataset


2018-06-27 13:46:05.850 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5220 phrases; correct: 4678.

precision:  89.62%; recall:  78.73%; FB1:  83.82

	LOC: precision:  92.90%; recall:  86.94%; F1:  89.82 1719

	MISC: precision:  90.42%; recall:  79.83%; F1:  84.79 814

	ORG: precision:  83.96%; recall:  74.94%; F1:  79.20 1197

	PER: precision:  89.93%; recall:  72.75%; F1:  80.43 1490




Evaluating the model on valid part of the dataset


2018-06-27 13:46:08.832 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5163 phrases; correct: 4599.

precision:  89.08%; recall:  77.40%; FB1:  82.83

	LOC: precision:  91.82%; recall:  86.77%; F1:  89.22 1736

	MISC: precision:  91.66%; recall:  79.83%; F1:  85.33 803

	ORG: precision:  85.16%; recall:  73.60%; F1:  78.96 1159

	PER: precision:  87.51%; recall:  69.60%; F1:  77.53 1465




Evaluating the model on valid part of the dataset


2018-06-27 13:46:11.711 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5200 phrases; correct: 4672.

precision:  89.85%; recall:  78.63%; FB1:  83.86

	LOC: precision:  93.62%; recall:  86.23%; F1:  89.77 1692

	MISC: precision:  89.32%; recall:  79.83%; F1:  84.31 824

	ORG: precision:  83.79%; recall:  75.17%; F1:  79.25 1203

	PER: precision:  90.75%; recall:  72.96%; F1:  80.89 1481




Evaluating the model on valid part of the dataset


2018-06-27 13:46:14.542 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5196 phrases; correct: 4645.

precision:  89.40%; recall:  78.17%; FB1:  83.41

	LOC: precision:  93.58%; recall:  85.68%; F1:  89.46 1682

	MISC: precision:  88.96%; recall:  79.50%; F1:  83.96 824

	ORG: precision:  83.78%; recall:  74.72%; F1:  78.99 1196

	PER: precision:  89.42%; recall:  72.53%; F1:  80.10 1494




Evaluating the model on valid part of the dataset


2018-06-27 13:46:17.420 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5140 phrases; correct: 4553.

precision:  88.58%; recall:  76.62%; FB1:  82.17

	LOC: precision:  93.59%; recall:  85.85%; F1:  89.55 1685

	MISC: precision:  90.83%; recall:  79.50%; F1:  84.79 807

	ORG: precision:  85.03%; recall:  74.57%; F1:  79.46 1176

	PER: precision:  84.44%; recall:  67.48%; F1:  75.02 1472




Evaluating the model on valid part of the dataset


2018-06-27 13:46:20.380 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 51362 tokens with 5942 phrases; found: 5196 phrases; correct: 4666.

precision:  89.80%; recall:  78.53%; FB1:  83.79

	LOC: precision:  91.30%; recall:  88.02%; F1:  89.63 1771

	MISC: precision:  92.10%; recall:  79.61%; F1:  85.40 797

	ORG: precision:  85.65%; recall:  74.79%; F1:  79.86 1171

	PER: precision:  90.05%; recall:  71.23%; F1:  79.54 1457




Now evaluate the model on the test part:

In [92]:
eval_valid(nernet, data_iterator.gen_batches(batch_size, 'test'))

2018-06-27 13:46:35.397 DEBUG in 'deeppavlov.models.ner.evaluation'['evaluation'] at line 213: processed 46435 tokens with 5648 phrases; found: 4561 phrases; correct: 3738.

precision:  81.96%; recall:  66.18%; FB1:  73.23

	LOC: precision:  84.02%; recall:  82.25%; F1:  83.13 1633

	MISC: precision:  81.80%; recall:  71.08%; F1:  76.07 610

	ORG: precision:  81.25%; recall:  60.26%; F1:  69.20 1232

	PER: precision:  79.74%; recall:  53.56%; F1:  64.08 1086




Let's run the model on our own sentence:

In [42]:
sentence = 'Petr stole my vodka'
x = [sentence.split()]

x_inds = token_vocab(x)
x_batch = zero_pad(x_inds)
mask = get_mask(x)
y_inds = nernet(x_batch, mask)
print(x[0])
print(tag_vocab(y_inds)[0])

['Petr', 'stole', 'my', 'vodka']
['B-PER', 'O', 'O', 'O']
