
Using PubMed-trained Word2Vec word embeddings to see whether known gene-disease relationships are reliably encoded in vector space. Code in this directory is forked from https://github.com/spyysalo/wvlib, but to be honest the command-line interface for this analysis was an early idea that I basically abandoned. Instead I started a notebook, "Vector similarity.ipynb", which does the analogy task in demo form.

The embeddings I used came from http://evexdb.org/pmresources/vec-space-models/ and are multi-GB files that are not under source control. You will need to download wikipedia-pubmed-and-PMC-w2v.bin and PubMed-w2v.bin to run the notebook locally.

2020-02-22 jfleischer@ucsd.edu

Original README of the forked project is below this line

wvlib - word vector library

Work in progress, not currently recommended for any use.

Try the following:

Find 10 words closest to "protein" using word2vec vectors induced on the text8 demo data

    echo protein | python nearest.py text8.tar.gz -n 10
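For readers who want to see what a nearest-word query boils down to, here is a minimal sketch that ranks words by cosine similarity. It uses made-up three-dimensional toy vectors rather than text8, and ranking by cosine is an assumption about the general technique, not a description of what nearest.py does internally.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vectors, query, n=10):
    """Return the n words most similar to `query`, best first."""
    q = vectors[query]
    scored = [(cosine(q, v), w) for w, v in vectors.items() if w != query]
    scored.sort(reverse=True)
    return [(w, s) for s, w in scored[:n]]


# Toy vectors, invented for illustration only.
toy = {
    "protein": (0.9, 0.1, 0.0),
    "enzyme": (0.8, 0.2, 0.1),
    "kinase": (0.7, 0.3, 0.0),
    "paris": (0.0, 0.1, 0.9),
}

neighbors = nearest(toy, "protein", n=2)
for word, score in neighbors:
    print(f"{word}\t{score:.3f}")
```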

Find word that has the same relationship to "japan" as "paris" has to "france"

    echo 'france paris japan' | python analogy.py text8.tar.gz -q -n 1
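The usual way to answer this kind of query is vector-offset arithmetic: take paris - france + japan and look for the closest remaining word. The sketch below shows that arithmetic on made-up four-dimensional toy vectors. It is an illustration of the general technique under that assumption, not a transcription of analogy.py, and the same arithmetic would apply to gene and disease tokens in the PubMed embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c, n=1):
    """Find words d such that a is to b as c is to d (offset method)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The query words themselves are excluded from the candidates.
    scored = [
        (cosine(target, v), w)
        for w, v in vectors.items()
        if w not in (a, b, c)
    ]
    scored.sort(reverse=True)
    return [(w, s) for s, w in scored[:n]]


# Toy vectors, invented for illustration: three "country" axes plus a "capital" axis.
toy = {
    "france": (1.0, 0.0, 0.0, 0.0),
    "paris": (1.0, 0.0, 0.0, 1.0),
    "japan": (0.0, 1.0, 0.0, 0.0),
    "tokyo": (0.0, 1.0, 0.0, 1.0),
    "germany": (0.0, 0.0, 1.0, 0.0),
    "berlin": (0.0, 0.0, 1.0, 1.0),
}

answer = analogy(toy, "france", "paris", "japan", n=1)
print(answer[0][0])
```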

Evaluate the vectors on the binary classification task using words from McIntosh and Curran "Reducing semantic drift with bagging and distributional similarity" (ACL 2009)

    python evalclass.py text8.tar.gz word-classes/McIC-09/*.txt

Evaluate the vectors on the closed-class member retrieval task using the set of standard amino acids

    python evalset.py text8.tar.gz word-sets/Ohta-bio-sets/standard-amino-acids.txt
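To give a feel for what a closed-class retrieval evaluation can look like, here is one plausible scoring scheme: rank the rest of the vocabulary by similarity to the centroid of a few seed members, then compute average precision against the remaining members. This is only a sketch under that assumption. I have not checked how evalset.py actually scores the task, and the function names and toy vectors below are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_seed_similarity(vectors, seeds):
    """Rank all non-seed words by cosine similarity to the seed centroid."""
    dim = len(next(iter(vectors.values())))
    centroid = [sum(vectors[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    candidates = [w for w in vectors if w not in seeds]
    return sorted(candidates, key=lambda w: cosine(centroid, vectors[w]), reverse=True)


def average_precision(ranked, positives):
    """Mean of precision values at the rank of each retrieved positive."""
    hits, total = 0, 0.0
    for rank, word in enumerate(ranked, start=1):
        if word in positives:
            hits += 1
            total += hits / rank
    return total / len(positives) if positives else 0.0


# Toy 2-d vectors, invented for illustration only.
toy = {
    "alanine": (1.0, 0.1),
    "glycine": (0.9, 0.2),
    "serine": (0.95, 0.15),
    "leucine": (0.8, 0.3),
    "protein": (0.5, 0.5),
    "paris": (0.0, 1.0),
}

ranked = rank_by_seed_similarity(toy, seeds={"alanine", "glycine"})
score = average_precision(ranked, positives={"serine", "leucine"})
print(ranked, score)
```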

The rest of this README is TODO. See scripts for documentation.
