Task
Semantic Relation Extraction and Classification in Scientific Papers
Subtask: 1 - Relation classification
1.1 Relation classification on clean data
1.2 Relation classification on noisy data
Classes - Semantic Relations
relations = ['usage', 'result', 'model-feature', 'part-whole', 'topic', 'comparison']
Features
Lexical Features
| Features | Remarks | Value |
|---|---|---|
| L1 | Distance between the two entities | int |
| L2 | hasIn (Model-Feature, Part-Whole) | int (0, 1) |
| L3 | hasOf (Topic, Result) | int (0, 1) |
| L4 | hasFor (Usage) | int (0, 1) |
| L5 | hasWith (Comparison) | int (0, 1) |
| L6 | hasThan (Comparison) | int (0, 1) |
| L7 | hasAnd | int (0, 1) |
| L8 | hasFrom | int (0, 1) |
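
As a rough illustration of the lexical features, the sketch below computes them from a tokenized sentence. It rests on assumptions that these notes do not confirm: the distance is taken as the number of tokens between the two entities, the cue words are searched only in that span, and the names `CUE_WORDS` and `lexical_features` are hypothetical.

```python
# Hypothetical mapping from feature name to the cue word it looks for.
CUE_WORDS = {
    "hasIn": "in",
    "hasOf": "of",
    "hasFor": "for",
    "hasWith": "with",
    "hasThan": "than",
    "hasAnd": "and",
    "hasFrom": "from",
}


def lexical_features(tokens, e1_end, e2_start):
    """Return the distance and the seven cue-word flags.

    tokens   -- list of word tokens for the sentence
    e1_end   -- index of the last token of the first entity
    e2_start -- index of the first token of the second entity
    """
    # Assumption: only the tokens strictly between the entities matter.
    between = [t.lower() for t in tokens[e1_end + 1:e2_start]]
    features = {"distance": len(between)}
    for name, cue in CUE_WORDS.items():
        features[name] = int(cue in between)  # 1 if the cue word occurs, else 0
    return features


tokens = "a parser for spoken language".split()
features = lexical_features(tokens, e1_end=1, e2_start=3)
print(features)
```

Here the only token between the entities is "for", so `distance` is 1, `hasFor` is 1, and every other flag is 0.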
Entity Features
| Features | Remarks | Value |
|---|---|---|
| L1 | Similarity between the entities (sim200); needed for the comparison relation | float |
| L2 | Similarity bucket | int (0, 1, 2, 3, 4) |
| L3 | Position of entity (text) | LabelEncoder (text index) |
| L4 | Start entity | index |
| L5 | End entity | index |
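
One possible way to turn a similarity score into the five-valued bucket is sketched below. Everything specific here is an assumption: the notes do not say how sim200 is computed, so cosine similarity over entity vectors is used as a stand-in, and the bucket edges in `BUCKET_EDGES` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical edges; four edges give the five buckets 0..4.
BUCKET_EDGES = (0.2, 0.4, 0.6, 0.8)


def similarity_bucket(sim):
    """Count how many edges the score reaches, giving an int in 0..4."""
    return sum(sim >= edge for edge in BUCKET_EDGES)


sim = cosine_similarity([1.0, 0.0], [1.0, 1.0])
print(round(sim, 4), similarity_bucket(sim))
```

With these edges, a score of about 0.707 falls into bucket 3, and any score below 0.2 (including negative ones) falls into bucket 0.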
Data Preprocessing
input format
import numpy as np
np.array([[...feature_values..., label], ...])
output format
.csv
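
A minimal sketch of the row layout and the CSV output, using only the standard library. The two rows are made-up illustrations, not real data, and encoding the label as its index in `relations` is an assumption; `write_rows` is a hypothetical helper. Rows of a NumPy array could be passed in the same way.

```python
import csv
import io

relations = ['usage', 'result', 'model-feature', 'part-whole', 'topic', 'comparison']


def write_rows(rows, out):
    """Write one CSV line per row; each row is the feature values followed by the label."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)


# Illustrative rows only: eight lexical feature values, then the label index.
rows = [
    [1, 0, 0, 1, 0, 0, 0, 0, relations.index('usage')],
    [3, 1, 0, 0, 0, 0, 0, 0, relations.index('part-whole')],
]

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file object opened in text mode with `newline=""` in place of the `StringIO` buffer.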
Model Training
It seems better to use XGBoost together with scikit-learn, that is, through XGBoost's scikit-learn-style interface: use XGB.fit() rather than XGB.train().