Topic recognition #148
Open
kaia-santosha wants to merge 28 commits into shakes76:main from kaia-santosha:topic-recognition
Added title and overview explanation (description, what problem it solves, etc.). Added headings for future work on the README.
All these files are empty, but I have created them so I can go in and edit each one as I progress through the project. My first goal will be to preprocess the dataset in dataset.py.
Added a forewarning so markers know why there may be .ipynb files in my future commits
…ataset I have created a function that returns an adjacency matrix built from the edge links listed in musae_facebook_edges.csv
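A minimal sketch of how such a function might look. It assumes the edge file is a two-column CSV with one header row and integer node IDs, and that the graph is undirected; `build_adjacency` and `read_edges` are hypothetical names, not the functions in this PR.

```python
import csv
import io

import numpy as np


def read_edges(handle):
    """Parse a two-column edge CSV with one header row (assumed layout)."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [(int(a), int(b)) for a, b in reader]


def build_adjacency(edge_rows, num_nodes):
    """Return a dense symmetric 0/1 adjacency matrix from (source, target) pairs."""
    adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for src, dst in edge_rows:
        adj[src, dst] = 1.0
        adj[dst, src] = 1.0  # assumption: edges are undirected
    return adj


# Tiny in-memory stand-in for the real edge file.
sample_csv = io.StringIO("id_1,id_2\n0,1\n1,2\n2,0\n3,3\n")
edges = read_edges(sample_csv)
adjacency = build_adjacency(edges, num_nodes=4)
print(adjacency)
```

For the real file, the `io.StringIO` object would be replaced by `open("musae_facebook_edges.csv", newline="")`.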
Added a normalisation function to get the adjacency matrix in the correct form for the GCN algorithm to work
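The commit does not say which normalisation is used. One common choice for GCNs is the symmetric form D^-1/2 (A + I) D^-1/2, where self-loops are added before scaling by node degree. The sketch below assumes that form; `normalise_adjacency` is a hypothetical name.

```python
import numpy as np


def normalise_adjacency(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 with added self-loops."""
    adj_hat = adj + np.eye(adj.shape[0], dtype=adj.dtype)
    degree = adj_hat.sum(axis=1)
    inv_sqrt_degree = 1.0 / np.sqrt(degree)  # degree >= 1 because of the self-loops
    # Scale row i by d_i^-1/2 and column j by d_j^-1/2.
    return adj_hat * inv_sqrt_degree[:, None] * inv_sqrt_degree[None, :]


# Three nodes in a path: 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=np.float64)
adj_norm = normalise_adjacency(adj)
print(adj_norm)
```

Note that this sketch assumes the input has no self-loops already; if it does, adding the identity counts them twice.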
The number of features per node is inconsistent, so I have made a function that converts each node's feature list into a bag-of-words vector so that all nodes have an equal number of features. If a word is part of a node's feature list it is indicated by a 1, and if it is not it is indicated by a 0.
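The conversion described here could be sketched as follows, assuming each node's features arrive as a list of integer word indices and that the vocabulary size is known up front. `to_bag_of_words` is a hypothetical name.

```python
import numpy as np


def to_bag_of_words(feature_lists, vocab_size):
    """Convert variable-length lists of word indices into fixed-length 0/1 vectors."""
    bag = np.zeros((len(feature_lists), vocab_size), dtype=np.float32)
    for row, word_ids in enumerate(feature_lists):
        # Explicit integer dtype so that an empty feature list is a valid index.
        bag[row, np.asarray(word_ids, dtype=np.intp)] = 1.0
    return bag


# Three nodes with 2, 1 and 3 features respectively, vocabulary of 4 words.
node_features = [[0, 2], [1], [0, 1, 3]]
bag = to_bag_of_words(node_features, vocab_size=4)
print(bag)
```

Every row now has the same length, which is what lets the features be stacked into a single matrix for the model.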
…an be used in model
Since I have separated the features and labels, I have created a train/test split over a list of node IDs, so the train and test features and labels can be looked up by the IDs in train_ids and test_ids.
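A rough sketch of splitting by node ID and then indexing into the feature and label arrays. The 80/20 ratio, the seed, and the name `split_node_ids` are assumptions for illustration; the commit does not state them.

```python
import numpy as np


def split_node_ids(num_nodes, test_fraction=0.2, seed=0):
    """Shuffle node IDs and return (train_ids, test_ids) with no overlap."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(num_nodes)
    num_test = int(round(num_nodes * test_fraction))
    return shuffled[num_test:], shuffled[:num_test]


# Toy data: 10 nodes, 2 features each, 4 possible labels.
features = np.arange(20, dtype=np.float32).reshape(10, 2)
labels = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1])

train_ids, test_ids = split_node_ids(num_nodes=10, test_fraction=0.2)
train_features, train_labels = features[train_ids], labels[train_ids]
test_features, test_labels = features[test_ids], labels[test_ids]
print(train_ids, test_ids)
```

Splitting the IDs rather than the arrays keeps features and labels aligned, since both are selected with the same index array.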
t-SNE plot created to visualize the initial, high-dimensional node features in a 2D space, giving insights into their structure and relationships prior to any transformation by the GCN.
This function will be called from train.py to access the tensors of preprocessed data required as input to the model.
…ed to research and practicing making my own. It has been a while since my last commit; I spent the time researching quite a few GCN projects online and then attempted to make my own. It is simple for the sake of getting a baseline. I may increase model complexity later on.
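For reference, the forward pass of a simple two-layer GCN baseline can be sketched in plain NumPy as below. This is not the model in this PR (which uses torch); the layer sizes, the random weights, and the identity matrix standing in for the normalised adjacency are all placeholders.

```python
import numpy as np


def gcn_forward(adj_norm, features, weight_hidden, weight_out):
    """Two-layer GCN forward pass: softmax(A_norm @ relu(A_norm @ X @ W0) @ W1)."""
    hidden = np.maximum(adj_norm @ features @ weight_hidden, 0.0)  # ReLU
    logits = adj_norm @ hidden @ weight_out
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
num_nodes, num_features, num_hidden, num_classes = 5, 8, 4, 4  # placeholder sizes
adj_norm = np.eye(num_nodes)  # placeholder for a normalised adjacency matrix
features = rng.random((num_nodes, num_features))
weight_hidden = rng.normal(scale=0.1, size=(num_features, num_hidden))
weight_out = rng.normal(scale=0.1, size=(num_hidden, num_classes))

probabilities = gcn_forward(adj_norm, features, weight_hidden, weight_out)
print(probabilities.shape)
```

Each graph convolution is just a multiplication by the normalised adjacency matrix followed by a learned linear map, so every node's output mixes in its neighbours' features.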
Needed to import torch and set the device for some functionality to work.
No hyperparameter tuning yet, just setting some values to gauge whether the model actually trains or not. Hyperparameter tuning will come later when I do cross validation.
Created code to tune hyperparameters via 10-fold nested cross validation. 10 folds were chosen because the dataset is large, so we can afford to train on 90% of the data and test on only 10%.
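The general shape of nested cross validation can be sketched as below, assuming an inner loop that selects hyperparameters and an outer loop that scores the selected setting on a held-out fold. The inner fold count, the candidate grid, and `toy_score` (a stand-in for "train on these IDs, return accuracy on those IDs") are hypothetical and not taken from this PR.

```python
import numpy as np


def k_fold_indices(ids, num_folds, seed=0):
    """Yield (kept_ids, held_out_ids) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(ids), num_folds)
    for i in range(num_folds):
        kept = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield kept, folds[i]


def nested_cross_validation(ids, candidates, score_fn, outer_folds=10, inner_folds=10):
    """Return one (best_params, outer_test_score) pair per outer fold."""
    results = []
    for outer_train, outer_test in k_fold_indices(ids, outer_folds, seed=0):
        # Inner loop: pick the candidate with the best mean validation score.
        mean_scores = []
        for params in candidates:
            scores = [score_fn(params, train, val)
                      for train, val in k_fold_indices(outer_train, inner_folds, seed=1)]
            mean_scores.append(float(np.mean(scores)))
        best = candidates[int(np.argmax(mean_scores))]
        # Outer loop: score the selected candidate on the untouched outer test fold.
        results.append((best, score_fn(best, outer_train, outer_test)))
    return results


def toy_score(params, train_ids, eval_ids):
    """Hypothetical stand-in for training a model and returning its accuracy."""
    return 1.0 - abs(params["lr"] - 0.01)


candidates = [{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}]
results = nested_cross_validation(np.arange(100), candidates, toy_score)
print(results[0])
```

With 10 outer folds, each outer training set holds 90% of the IDs and each outer test set holds 10%, matching the split described in the commit. The cost grows as outer folds times inner folds times candidates, which is why this kind of search is slow.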
This took an immense amount of time, but it was worthwhile as I got an indication of the possible best hyperparameters. Since I intend to modify my model to align more closely with the GCN code shown in the model exhibition lecture, my hyperparameters may change. However, I still think it was worthwhile testing the nested cross validation method, as I can reuse it to tune the hyperparameters of my final model.
… Exhibition CON session Though my previous model had good performance, I wanted to experiment and hopefully settle on a new model architecture that better incorporates the techniques learned in the model exhibition CON session
Added the evaluation loop for the model in predict.py. I also added t-SNE visualisation after the model is trained to see how well the model has performed. I still have to integrate it properly with train.py by making some of the train.py code modular (in functions) so it can be called by the predict script.
I attempted to commit a local copy of the adjacency_matrix.npy file, but at 3.9GB it is too large for GitHub. Instead I have added a link to the downloadable file on Google Drive in the README.
Fixed some errors in the code, namely the return from the train script to the predict script. Also added more detail to the README.
Wasn't sure which folder should contain the README, so I've put it in both just to be sure.
Commented files and double checked if everything is in order
This is an initial inspection; no action is required at this point.
No description provided.