5,279 changes: 5,279 additions & 0 deletions MyImmortalComplete (1).txt

Large diffs are not rendered by default.

Binary file added Project Statement.odt
Binary file not shown.
Empty file added ada.py
Empty file.
3 changes: 3 additions & 0 deletions frequency
@@ -0,0 +1,3 @@
[(u'you', 159), (u'I', 132), (u'eh', 104), (u'love', 101), (u'the', 100), (u'me', 90), (u'by', 86), (u'ah', 81), (u'is', 73), (u'so', 71), (u'it', 68), (u'to', 65), (u"I'm", 65), (u'of', 53), (u'in', 51), (u"don't", 50), (u'lonely', 48), (u'about', 48), (u'oh', 47), (u'our', 43), (u'content', 42), (u'licensing', 42), (u'any', 42), (u'lyrics', 42), (u'provider', 42), (u'prohibited', 42), (u'and', 41), (u'come', 36), (u'wanna', 35), (u'like', 33), (u'my', 32), (u'just', 32), (u'let', 31), (u'go', 30), (u'get', 30), (u'be', 29), (u'your', 29), (u'a', 28), (u'over', 28), (u'nobody', 28), (u'do', 28), (u'neon', 28), (u'for', 28), (u'up', 28), (u'we', 27), (u'on', 27), (u'baby', 26), (u'cause', 26), (u'no', 26), (u'got', 23), (u'back', 23), (u'know', 22), (u'good', 21), (u'gotta', 20), (u"can't", 20), (u'mi', 19), (u'us', 19), (u'never', 18), (u'this', 18), (u'hold', 17), (u'that', 17), (u'take', 17), (u'falling', 16), (u'her', 16), (u'keep', 15), (u'bad', 15), (u'gonna', 15), (u'work', 14), (u'what', 14), (u'need', 14), (u'hey', 13), (u'uh', 13), (u'P', 12), (u'one', 12), (u'say', 12), (u'all', 11), (u'club', 11), (u'yeah', 11), (u'touch', 11), (u'sun', 10), (u'boy', 10), (u'think', 10), (u'are', 10), (u"it's", 10), (u'without', 10), (u'run', 10), (u'way', 10), (u'but', 10), (u'with', 10), (u'body', 9), (u'La', 9), (u'A', 9), (u'man', 9), (u'not', 9), (u'out', 9), (u'miss', 9), (u'stop', 9), (u'make', 9), (u'son', 9), (u'down', 9), (u'ha', 9), (u"ain't", 8), (u'now', 8), (u'living', 8), (u'show', 8), (u'fine', 8), (u'pretty', 8), (u'please', 8), (u'kiss', 8), (u'step', 8), (u'an', 8), (u'karma', 8), (u'danger', 7), (u'fly', 7), (u'can', 7), (u'hot', 7), (u'bang', 7), (u'they', 7), (u'hands', 7), (u'shut', 7), (u'too', 7), (u'face', 7), (u'tam', 6), (u'day', 6), (u'hood', 6), (u'lie', 6), (u'only', 6), (u'H', 6), (u'ban', 6), (u'nun', 6), (u'wants', 6), (u'situation', 6), (u'home', 6), (u'time', 6), (u'ape', 5), (u'ma', 5), (u'Le', 5), (u'goodbye', 5), (u'rock', 5), 
(u'feel', 5), (u'jar', 5), (u'killing', 5), (u"I've", 5), (u'been', 5), (u'will', 5), (u'some', 5), (u'bounce', 5), (u'when', 5), (u'stay', 5), (u'sorry', 4), (u'very', 4), (u'throw', 4), (u'music', 4), (u'more', 4), (u'how', 4), (u'bout', 4), (u'yo', 4), (u'doing', 4), (u'mine', 4), (u'tonight', 4), (u'see', 4), (u'said', 4), (u'if', 4), (u'lets', 4), (u'world', 4), (u'OK', 4), (u'right', 4), (u'o', 4), (u'Na', 4), (u'bitches', 4), (u'bun', 4), (u"let's", 4), (u'fell', 4), (u'alright', 4), (u'ever', 3), (u'through', 3), (u'singing', 3), (u'tell', 3), (u'want', 3), (u"won't", 3), (u'them', 3), (u'drop', 3), (u'em', 3), (u'girls', 3), (u'hips', 3), (u'find', 3), (u'feels', 3), (u'set', 3), (u'away', 3), (u'fire', 3), (u"what's", 3), (u'Hope', 3), (u'i', 3), (u'lips', 3), (u'Seoul', 3), (u'Rock', 3), (u'stage', 3), (u'into', 3), (u'was', 3), (u'boys', 3), (u"I'll", 3), (u'hit', 3), (u'dying', 3), (u'gold', 2), (u'young', 2), (u'wave', 2), (u'every', 2), (u'goes', 2), (u'here', 2), (u'makes', 2), (u'manna', 2), (u'from', 2), (u'two', 2), (u'getting', 2), (u'mu', 2), (u'beautiful', 2), (u'give', 2), (u'heard', 2), (u'hesitate', 2), (u"that's", 2), (u'better', 2), (u'fade', 2), (u'bye', 2), (u'break', 2), (u'looking', 2), (u'another', 2), (u'ring', 2), (u'painful', 2), (u'have', 2), (u'dancing', 2), (u'nothing', 2), (u'why', 2), (u'going', 2), (u'jab', 2), (u'handle', 2), (u'beat', 2), (u'X', 2), (u's', 2), (u'sky', 2), (u'damn', 2), (u'far', 2), (u'mind', 2), (u'chum', 2), (u'those', 2), (u'look', 2), (u'ugly', 2), (u'moment', 2), (u'had', 2), (u'real', 2), (u'big', 2), (u'mayo', 2), (u'game', 2), (u'five', 2), (u'bit', 2), (u'act', 2), (u'ladies', 2), (u'promise', 2), (u'long', 2), (u'gone', 2), (u'am', 2), (u'at', 2), (u'u', 2), (u'E', 2), (u'e', 2), (u'dance', 1), (u'dollar', 1), (u'yellow', 1), (u'four', 1), (u'baddest', 1), (u'song', 1), (u'rise', 1), (u'word', 1), (u'leave', 1), (u'sang', 1), (u'quick', 1), (u'round', 1), (u'sign', 1), (u'blowing', 1), 
(u'patient', 1), (u'alone', 1), (u'soldier', 1), (u'control', 1), (u'private', 1), (u'everybody', 1), (u'chi', 1), (u'shits', 1), (u'phone', 1), (u'stick', 1), (u'V', 1), (u'grill', 1), (u'tease', 1), (u'roof', 1), (u'end', 1), (u'sit', 1), (u'six', 1), (u'mess', 1), (u'watch', 1), (u'wrong', 1), (u'maybe', 1), (u'keeps', 1), (u'move', 1), (u'paper', 1), (u'style', 1), (u'then', 1), (u'L', 1), (u'name', 1), (u'always', 1), (u'lipstick', 1), (u'tempo', 1), (u'care', 1), (u'already', 1), (u'done', 1), (u'city', 1), (u'guess', 1), (u'paint', 1), (u'Nagasaki', 1), (u'karats', 1), (u'matter', 1), (u'silly', 1), (u'were', 1), (u'cards', 1), (u'God', 1), (u'turned', 1), (u'speakers', 1), (u'mil', 1), (u'mid', 1), (u'play', 1), (u'sure', 1), (u'who', 1), (u"time's", 1), (u'glow', 1), (u'flow', 1), (u'drive', 1), (u'queen', 1), (u"mama's", 1), (u'bring', 1), (u'slow', 1), (u'black', 1), (u'him', 1), (u'morning', 1), (u'she', 1), (u'shots', 1), (u'dough', 1), (u'gangster', 1), (u'hoes', 1), (u'best', 1), (u'pump', 1), (u"there's", 1), (u'tone', 1), (u'three', 1), (u'secret', 1), (u'dame', 1), (u'life', 1), (u'shopping', 1), (u'ani', 1), (u'haters', 1), (u'these', 1), (u'n', 1), (u'pound', 1), (u'almost', 1), (u'different', 1), (u'used', 1), (u'hang', 1), (u'Y', 1), (u'spree', 1), (u'off', 1), (u'thought', 1), (u'yours', 1), (u'yes', 1), (u'lost', 1), (u'gimme', 1), (u'clap', 1), (u'old', 1), (u'welcome', 1), (u'burn', 1), (u'O', 1), (u'anything', 1), (u'female', 1), (u'there', 1), (u'Kitty', 1), (u'happy', 1), (u'girlfriend', 1), (u'shit', 1), (u'More', 1), (u'Asian', 1), (u'felt', 1), (u'town', 1), (u'N', 1), (u'faster', 1), (u'together', 1), (u'Scott', 1)]
kbutler19@DetectiveSkye:~/TextMining$

133 changes: 133 additions & 0 deletions kpop.py
@@ -0,0 +1,133 @@
"""
Finds English words sampled in the lyrics of the kpop girl group 2NE1 and
analyzes their sentiment and frequency

@author: Katie Butler
"""

import urllib#.request
#import urllib.error
#import urllib.parse
#import bs4
#from bs4
import BeautifulSoup as bs
from pattern.web import *

#g = open('/usr/share/dict/korean')
#M = g.readlines(L)
#for i in range(len(M)):
# M[i] = M[i].strip()
#g.close()
L = []
f = open('/usr/share/dict/american-english')
L = f.readlines()
#english = {}
#for i in L)):9
# L[i] = L[i].strip()
def real_word(word):
    """Return a dictionary entry with surrounding whitespace (including the trailing newline) removed."""
    word = word.strip()
    return word
english = map(real_word,L)

f.close()

# artists = 'http://www.azlyrics.com/19/2ne1.html'
# g = urllib2.urlopen(artists)
# html = g.read()
# #print html
# links = bs.BeautifulSoup(html)
# print links

def processing():
# song_links = links.find('div',id='listAlbum')
# #print song_links
# songs = song_links.findAll('a',href=True)
#i = 0
# for url in songs:
# if 'http://www.amazon.com' not in url['href']:
# song_url = 'http://www.azlyrics.com' + url['href'][2:]
# song = urllib2.urlopen(song_url)
# read_lyrics = song.read()
# f = open('song'+str(i)+'.txt','w')
# f.write(read_lyrics)
# f.close()
# i+=1
    num_songs = 1#42
    for j in range(num_songs):
        #bs = BeautifulSoup.BeautifulSoup.getText(read_lyrics)
        f = open('song'+str(j)+'.txt')
        read_lyrics = f.read()
        f.close()
        bs_lyrics = bs.BeautifulSoup(read_lyrics)
        #print bs_lyrics
        #break
        divs = bs_lyrics.findAll('div')
        lyrics = ''
        for d in divs:
            #print '------------------------------------------------------------'
            #print d
            if len(d) > len(lyrics):
                text = d
        text = bs_lyrics.find("form",id="addsong")
        print text
        # BeautifulSoup 3 (imported above) spells this method findPreviousSiblings
        print text.findPreviousSiblings()
        break

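# Reference notes pasted from the Beautiful Soup documentation. They are kept
# as comments because `soup` is never defined in this script.
# first_link = soup.a
# first_link
# # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
#
# first_link.find_all_previous("p")
# # [<p class="story">Once upon a time there were three little sisters; ...</p>,
# # <p class="title"><b>The Dormouse's story</b></p>]
#
# first_link.find_previous("title")
# # <title>The Dormouse's story</title>



# Duplicate of the div loop in processing(). It is kept as a comment because
# bs_lyrics is local to that function and is not defined at this level.
# divs = bs_lyrics.findAll('div')
# lyrics = ''
# for d in divs:
#     #print '------------------------------------------------------------'
#     #print d
#     if len(d) > len(lyrics):
#         text = d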

#print lyrics
#break
#lyrics = text.getText()
#print lyrics
#lyrics = [x.getText() for x in lyrics]
"""f = open('kpop.pickle','w')
pickle.dump(lyrics,f)
f.close()

# Load data from a file (will be part of your data processing script)
input_file = open('kpop.pickle','r')
bad_taste = pickle.load(input_file)
input_file.close()
print kpop"""

processing()

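def sample(kpop,english):
    # Keep only the words found in the English dictionary. The list is rebuilt
    # in place because removing items while iterating over a list skips elements.
    kpop[:] = [word for word in kpop if word in english]
    print kpop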

#sample(kpop,english)

"""def printing(artist, title, save, lyrics):
for x in lyrics:
print(x, end="\n\n")
if save == True:
saving(artist, title, lyrics)
elif save == False:
pass

def saving(artist, title, lyrics):
f = open(artist + '_' + title + '.txt', 'w')
f.write("\n".join(lyrics).strip())
f.close()"""
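
The `sample` function in this file appears intended to keep only the lyric tokens that occur in the English word list. A minimal Python 3 sketch of that filtering step follows. It assumes both inputs are plain lists of strings; the name `keep_english` and the sample data are hypothetical and not part of this commit.

```python
def keep_english(tokens, english_words):
    """Return the tokens that appear in english_words, preserving order."""
    # A set gives constant-time membership tests; a list would be scanned
    # from the start for every token.
    vocabulary = set(english_words)
    # Build a new list instead of removing items from `tokens` while
    # iterating over it, which would skip elements.
    return [token for token in tokens if token in vocabulary]


if __name__ == "__main__":
    # Hypothetical sample data: a few tokens from a mixed-language line.
    tokens = ["You", "got", "the", "fire", "naui", "gaseumeun"]
    english_words = ["fire", "got", "the", "You"]
    print(keep_english(tokens, english_words))
```

The comparison is case-sensitive, so a token such as `You` matches only if the word list contains that exact spelling.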
121 changes: 121 additions & 0 deletions kpop_lyrics.py
@@ -0,0 +1,121 @@
"""
Finds English words sampled in the lyrics of the kpop girl group 2NE1 and
analyzes their sentiment and frequency

@author: Katie Butler
"""

import urllib#.request
#import urllib.error
#import urllib.parse
#import bs4
#from bs4
import BeautifulSoup as bs
from pattern.web import *

#g = open('/usr/share/dict/korean')
#M = g.readlines(L)
#for i in range(len(M)):
# M[i] = M[i].strip()
#g.close()
L = []
f = open('/usr/share/dict/american-english')
L = f.read()
#english = {}
#for i in L)):9
# L[i] = L[i].strip()
def real_word(word):
    """Return a dictionary entry with surrounding whitespace (including the trailing newline) removed."""
    word = word.strip()
    return word
english = L.split()

all_english = {}
frequency = {}


f.close()

# artists = 'http://www.azlyrics.com/19/2ne1.html'
# g = urllib2.urlopen(artists)
# html = g.read()
# #print html
# links = bs.BeautifulSoup(html)
# print links

def processing(numb):
# song_links = links.find('div',id='listAlbum')
# #print song_links
# songs = song_links.findAll('a',href=True)
#i = 0
# for url in songs:
# if 'http://www.amazon.com' not in url['href']:
# song_url = 'http://www.azlyrics.com' + url['href'][2:]
# song = urllib2.urlopen(song_url)
# read_lyrics = song.read()
# f = open('song'+str(i)+'.txt','w')
# f.write(read_lyrics)
# f.close()
# i+=1
    all_words = []
    bs_lyrics = []
    if numb > 41:
        return 'exceeding index length'
    elif numb == 0:
        f = open('lyrics'+str(0)+'.txt')
        read_lyrics = f.read()
        f.close()
        #print read_lyrics
        bs_lyrics = bs.BeautifulSoup(read_lyrics).prettify()
        lyrics = bs.BeautifulSoup(bs_lyrics).getText()
        words = lyrics.split()
        #kpop = words.splitlines()
        #print bs_lyrics
        #lyrics = bs_lyrics.getText()
        #print lyrics
        #lyrics = [bs_lyrics for bs_lyrics in bs_lyrics.stripped_strings]
        #[text for text in soup.stripped_strings]
        all_words += words
    else:
        num_songs = numb
        for j in range(num_songs+1):
            #bs = BeautifulSoup.BeautifulSoup.getText(read_lyrics)
            # Songs 15 and 40 are entirely in English and are not counted.
            # The index to test is j (the current song), not num_songs.
            if j == 15 or j == 40:
                continue
            else:
                f = open('lyrics'+str(j)+'.txt')
                read_lyrics = f.read()
                f.close()
                #print read_lyrics
                #bs_lyrics = read_lyrics.getText()
                bs_lyrics = bs.BeautifulSoup(read_lyrics).prettify()
                lyrics = bs.BeautifulSoup(bs_lyrics).getText()
                #print bs_lyrics
                words = lyrics.split()
                #kpop = words.splitlines()
                #lyrics = bs_lyrics.getText()
                #print lyrics
                #lyrics = [bs_lyrics for bs_lyrics in bs_lyrics.stripped_strings]
                #[text for text in soup.stripped_strings]
                all_words += words
    # Blank out any word that cannot be represented as ASCII
    for i in range(len(all_words)):
        try:
            all_words[i] = all_words[i].decode()
        except UnicodeEncodeError:
            all_words[i] = ''
    return all_words
kpop = processing(20)


def sample(korean):
    for word in korean:
        if word in english:
            all_english[word] = all_english.get(word,0)+1

def histogram(korean):
    for word in korean:
        frequency[word] = frequency.get(word,0)+1

sample(kpop)
histogram(kpop)

print sorted(all_english.items(),key=lambda item:item[1],reverse = True)
#print frequency
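
The counting done by `sample` and the final `sorted` call can also be expressed with `collections.Counter`. The Python 3 sketch below is one possible shape, not the method used in this commit; the name `english_frequencies` and the sample tokens are hypothetical.

```python
from collections import Counter


def english_frequencies(tokens, english_words):
    """Count the tokens that are English words, most frequent first."""
    # Converting the word list to a set avoids a linear scan per token.
    vocabulary = set(english_words)
    counts = Counter(token for token in tokens if token in vocabulary)
    # most_common() returns (word, count) pairs in descending count order.
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample data.
    tokens = ["eh", "eh", "fire", "naui", "eh", "fire", "you"]
    english_words = ["eh", "fire", "you"]
    for word, count in english_frequencies(tokens, english_words):
        print(word, count)
```

The result has the same list-of-pairs shape as the contents of the `frequency` file in this change.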
89 changes: 89 additions & 0 deletions lyrics0.txt
@@ -0,0 +1,89 @@
<div>
<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
<i>[CL:]</i><br>
I go by the name of CL of 2NE1<br>
It's been a long time coming, but We're here now<br>
And We about to set the roof on fire baby (Uh oh)<br>
You better get yours cause I'm Gettin' mine<br>
<br>
<i>[Dara:]</i><br>
Eh eh eh eh eh eh eh eh 2NE1<br>
Eh eh eh eh eh eh eh eh You gotta ring the alarm<br>
Eh eh eh eh eh eh eh eh We 2NE1<br>
Eh eh eh eh eh eh eh eh Hey Hey Hey Hey Hey<br>
<br>
<i>[CL:]</i><br>
Come in Come in Come in dareun saesangeuro<br>
Jikyeobkiman han komineun ijae deungeul jigo<br>
La La La La gashik eobneun naui kkotnoraero<br>
Ha Ha Ha Ha dashineol bichi mothaedoruk<br>
<br>
<i>[Minzy:]</i><br>
Now Let's chumeul chumeul chumeul chouyo Wanna get down<br>
Boda kkeun kkumeul kkumeul kkumeul gwo saesangeun naemam<br>
Daero da hal su ittgiyae kkeum jayureul euihae Tonight Tonight Oh~<br>
<br>
<i>[Bom:]</i><br>
Nae nunbicheun bitnaneun byeol deulro<br>
Na shimjangsogeun tae u neun jeo beulbitdo<br>
Yeongweonhajin anhkyeatji deo irheul keon eobsji<br>
Oh Oh Oh Oh Oh Oh Oh OH Yeah<br>
<br>
<i>[CL:]</i><br>
Na mi mi mi mi mi mi michigo shippeo<br>
Deo pali tik tik tik tik tik tik tik tikgo shippeo<br>
Jeo nopeun bildingeuro jeo pureun haneuleoro<br>
Keukae so ri ri ri ri ri ri richigu shippeo<br>
<br>
<i>[Dara:]</i><br>
You got the fire naui gaseumeun kkung kkung kkung<br>
You gotta drop it like it's hot jigeum meomchuryeo hajima Ooh<br>
The Fire nae meoriseukeun kkung kung kkung<br>
I gotta drop it like it's hot meomchuryeo hajima Hey<br>
<br>
<i>[CL:]</i><br>
Get up Get up Get up Get up myeotbeol neomeojyeodo<br>
Mideotdeon saesangi nal teudashi baeshinhaedo<br>
Na na na nan jeoldae eulji anha babo cheoreom<br>
Eo meo meo meo naesungtalji mal a nam deul cheoreom<br>
<br>
<i>[Minzy:]</i><br>
Naega jeo kkeulkkaji daeryeokalkae Follo-Follow me<br>
Sumi cha oreulmankeum dalryeojineun naui gaseumi<br>
Eonji na shiljimaneun alnha jaemitchyeo?<br>
Keomnaji mal a Let it go<br>
Boda deo naeun nae ilro Le Le Le Le Le Let's go<br>
<br>
<i>[Bom:]</i><br>
Nae nunbicheun bitnaneun byeol deulro<br>
Na shimjangsogeun tae u neun jeo beulbitdo<br>
Yeongweonhajin anhkyeatji deo irheul keon eobsji<br>
Oh Oh Oh Oh Oh Oh Oh OH Yeah<br>
<br>
<i>[CL:]</i><br>
Na mi mi mi mi mi mi mi michigo shippeo<br>
Deo pali tik tik tik tik tik tik tik tikgo shippeo<br>
Jeo nopeun bildingeuro jeo pureun haneuleoro<br>
Keukae so ri ri ri ri ri ri richigu shippeo<br>
<br>
<i>[Dara:]</i><br>
Sori jilleo Eh eh eh eh eh eh eh eh 2NE1<br>
Eh eh eh eh eh eh eh eh You gotta ring the alarm<br>
Eh eh eh eh eh eh eh eh We 2NE1<br>
Eh eh eh eh eh eh eh eh<br>
<br>
<i>[Dara:]</i><br>
Meoriga chalrang chalrang chalrang chalrang daedoreuk<br>
Eongdeongil salrang salrang salrang salrang heundeulo<br>
Meoriga chalrang chalrang chalrang chalrang daedoreuk<br>
Eongdeongil salrang salrang salrang salrang heundeulo Uh<br>
<br>
<i>[CL:]</i><br>
Na mi mi mi mi mi mi michigo shippeo<br>
Deo pali tik tik tik tik tik tik tik tikgo shippeo<br>
Jeo nopeun bildingeuro jeo pureun haneuleoro<br>
Keukae so ri ri ri ri ri ri richigu shippeo<br>
Eonjena oneulcheoreom nan jayoreubgo shippeo<br>
<br>

</div>
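
Each saved lyrics file opens with the site's HTML comment, and the words `content`, `licensing`, `provider`, and `prohibited` each appear 42 times in the `frequency` output, which suggests the comment text is being counted as lyrics. The Python 3 sketch below shows one way to extract only the visible text using the standard-library `html.parser`, which never passes comment text to `handle_data`. The class name `LyricsTextExtractor`, the function `extract_lyric_words`, and the choice to skip the `<i>` singer labels are assumptions made for illustration, not part of this commit.

```python
from html.parser import HTMLParser


class LyricsTextExtractor(HTMLParser):
    """Collect visible text, ignoring tags, comments, and <i> singer labels."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._in_label = False

    def handle_starttag(self, tag, attrs):
        # Singer labels such as <i>[CL:]</i> are markup, not lyrics (assumption).
        if tag == "i":
            self._in_label = True

    def handle_endtag(self, tag):
        if tag == "i":
            self._in_label = False

    def handle_data(self, data):
        # Comments are routed to handle_comment, so they never arrive here.
        if not self._in_label:
            self.chunks.append(data)


def extract_lyric_words(html):
    """Return the whitespace-separated words of the visible lyric text."""
    parser = LyricsTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).split()


if __name__ == "__main__":
    sample_html = (
        "<div>\n<!-- licensing notice -->\n"
        "<i>[CL:]</i><br>\nI go by the name of CL of 2NE1<br>\n</div>"
    )
    print(extract_lyric_words(sample_html))
```

Dropping the `<i>` handling keeps the singer labels in the output if they are wanted.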