This repository describes the BISON framework (Blockchain Interpretable Success prediction for SOcial media NFTs), which leverages linguistic statistics and blockchain-derived features to model and explain the success of blockchain-native articles (e.g., writing NFTs).
Below are the features used in the model, categorized by type. They feed a "multimodal", explainable ML pipeline that predicts and interprets success on decentralized content platforms.
These features are extracted from the textual content of articles:
- Kincaid Grade Level: U.S. grade level required to understand the text.
- Flesch Reading Ease: Score from 0–100; higher means easier.
- Gunning Fog Index: Years of formal education required.
- characters_per_word: Average characters per word
- syll_per_word: Average syllables per word
- words_per_sentence: Average words per sentence
- sentences_per_paragraph: Average sentences per paragraph
- type_token_ratio: Lexical diversity (% of unique word types over total tokens)
- characters: Total character count
- syllables: Total syllable count
- words: Number of content tokens
- wordtypes: Number of unique content word types
- sentences: Total sentence count
- paragraphs: Total paragraph count
- long_words: Words with more than 6 letters
- complex_words: Polysyllabic and uncommon words
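
The three readability scores can be derived from the raw counts listed above. The sketch below uses the standard published formulas for Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog. The function name `readability_scores` is hypothetical, and the repository's own tooling may count syllables or complex words differently, so treat the output as an approximation.

```python
def readability_scores(words, sentences, syllables, complex_words):
    """Compute three standard readability scores from raw text counts.

    Assumption: complex_words follows the Gunning Fog convention
    (polysyllabic words); the repository may define it differently.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")

    words_per_sentence = words / sentences
    syll_per_word = syllables / words

    return {
        # Higher means easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syll_per_word,
        # Approximate U.S. school grade level.
        "kincaid_grade_level": 0.39 * words_per_sentence
        + 11.8 * syll_per_word
        - 15.59,
        # Approximate years of formal education required.
        "gunning_fog_index": 0.4
        * (words_per_sentence + 100 * complex_words / words),
    }
```

For example, `readability_scores(words=100, sentences=5, syllables=150, complex_words=10)` describes a text with 20 words per sentence and 1.5 syllables per word.
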
- cleaned_text: Text after removing noise and irrelevant characters
- language: Detected language of the text
- cleaned_body: Cleaned version of the article body text
- cleaned_title: Cleaned article title
- processed_cleaned_text: Text after further processing like normalization
- cleaned_text_tokenized: Tokenized version of the cleaned text
- cleaned_text_lemmatized: Lemmatized tokens (base forms of words)
- cleaned_text_POS: Part-of-speech tagging of tokens
- cleaned_text_sentiment: Sentiment score derived from text
- words_body: Word count in the article body
- words_title: Word count in the title
- words_text: Total word count combining title and body
- normalized_tfidf_sum: Normalized sum of TF-IDF scores across document
- verbs_density: Density of verbs in text
- adjectives_density: Density of adjectives in text
- nouns_density: Density of nouns in text
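
The density features can be read as the share of tokens carrying a given part-of-speech tag. The sketch below makes that reading concrete. It assumes that "density" means count divided by total tokens and that tags follow the Universal POS names (`VERB`, `ADJ`, `NOUN`); neither is confirmed by the feature list, and the helper `pos_densities` is hypothetical.

```python
from collections import Counter


def pos_densities(pos_tags):
    """Return verb, adjective, and noun shares for a list of POS tags.

    Assumption: density = tag count / total token count, with
    Universal POS tag names.
    """
    total = len(pos_tags)
    if total == 0:
        # An empty text has no tokens, so every density is zero.
        return {
            "verbs_density": 0.0,
            "adjectives_density": 0.0,
            "nouns_density": 0.0,
        }

    counts = Counter(pos_tags)
    return {
        "verbs_density": counts["VERB"] / total,
        "adjectives_density": counts["ADJ"] / total,
        "nouns_density": counts["NOUN"] / total,
    }
```

The input would typically be the tag sequence stored in cleaned_text_POS, one tag per token.
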
Thematic topics extracted from articles:
- topic: (categorical) Represents the main topic of the article. Possible values are: T1: Gaming, Virtual Worlds & Characters; T2: Wallets, Airdrops & Ethereum Tools; T3: Web3, Blockchain & Digital Platforms; T4: DeFi, Market Strategies & Liquidity; T5: Blockchain, Transactions & Smart Contracts; T6: Web3 Launches, Rewards & Creators; T7: Human Thoughts, Emotions & Reflections.
- topic_T1: Gaming, Virtual Worlds & Characters
- topic_T2: Wallets, Airdrops & Ethereum Tools
- topic_T3: Web3, Blockchain & Digital Platforms
- topic_T4: DeFi, Market Strategies & Liquidity
- topic_T5: Blockchain, Transactions & Smart Contracts
- topic_T6: Web3 Launches, Rewards & Creators
- topic_T7: Human Thoughts, Emotions & Reflections
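
The topic_T1 to topic_T7 columns look like indicator variables derived from the categorical topic column. The sketch below shows that relationship under the assumption that they are a one-hot encoding, which the feature list suggests but does not state. The helper `one_hot_topic` is hypothetical.

```python
TOPICS = ("T1", "T2", "T3", "T4", "T5", "T6", "T7")


def one_hot_topic(topic):
    """Map a topic label such as "T3" to seven 0/1 indicator columns.

    Assumption: topic_T1..topic_T7 are one-hot indicators of topic.
    """
    if topic not in TOPICS:
        raise ValueError(f"unknown topic label: {topic!r}")
    return {f"topic_{t}": int(t == topic) for t in TOPICS}
```

Under this assumption exactly one of the seven columns equals 1 for each article.
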
For each keyword (nft, web3, community, blockchain, crypto, wallet, chain):
- <keyword>: Indicates the presence (1) or absence (0) of the keyword in the article text.
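
A minimal sketch of how such presence flags could be computed is shown below. It assumes case-insensitive whole-word matching, so "blockchain" does not also set the chain flag; the repository may instead use substring matching, which would give different results. The helper `keyword_flags` is hypothetical.

```python
import re

KEYWORDS = ("nft", "web3", "community", "blockchain", "crypto", "wallet", "chain")


def keyword_flags(text):
    """Return a 0/1 flag per keyword for the given article text.

    Assumption: a keyword is present only if it appears as a whole
    alphanumeric token, ignoring case.
    """
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {kw: int(kw in tokens) for kw in KEYWORDS}
```

For instance, `keyword_flags("My NFT wallet on the blockchain")` sets the nft, wallet, and blockchain flags and leaves the others at 0.
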
- days_since_epoch: Days elapsed since article publication
- publication_date: Full publication date of the article or NFT in YYYY-MM-DD format.
- year_month: Publication date grouped by year and month in YYYY-MM format, useful for temporal aggregation.
- year: Year of publication
- month: Month of publication (values from 1 to 12)
- day: Day of publication
- weekday: Weekday of publication, encoded as 0=Monday, 1=Tuesday, 2=Wednesday, 3=Thursday, 4=Friday, 5=Saturday, 6=Sunday
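
Most of the calendar features follow directly from the publication date. The sketch below derives them with Python's standard `datetime` module, whose `weekday()` method uses the same 0=Monday encoding. The helper `temporal_features` is hypothetical, and days_since_epoch is left out because the list does not state its reference date.

```python
from datetime import date


def temporal_features(publication_date):
    """Derive calendar features from a YYYY-MM-DD date string."""
    d = date.fromisoformat(publication_date)
    return {
        "publication_date": d.isoformat(),
        "year_month": d.strftime("%Y-%m"),
        "year": d.year,
        "month": d.month,      # 1 to 12
        "day": d.day,
        "weekday": d.weekday(),  # 0=Monday ... 6=Sunday
    }
```

Calling `temporal_features("2023-03-15")` yields a year_month of "2023-03" and a weekday of 2 (Wednesday).
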
These features capture blockchain and crypto ecosystem signals relevant to each article:
For each token (BTC, TETHER, OPTIMISM, ETH, USDC, DAI) at the publication date:
- open_<token>_usd: Opening price
- last_<token>_usd: Closing price
- max_<token>_usd: Daily maximum
- min_<token>_usd: Daily minimum
- vol_<token>: Trading volume
- var%_<token>: Daily % price change
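
Because the same six fields repeat for each token, the full set of market columns can be enumerated from the templates above. The sketch below does this; the helper `market_columns` is hypothetical, and it assumes the token appears in the column name exactly as written in the list (for example `open_BTC_usd`). The dataset's actual casing may differ.

```python
TOKENS = ("BTC", "TETHER", "OPTIMISM", "ETH", "USDC", "DAI")

# One template per field in the list above; {t} is the token name.
TEMPLATES = (
    "open_{t}_usd",
    "last_{t}_usd",
    "max_{t}_usd",
    "min_{t}_usd",
    "vol_{t}",
    "var%_{t}",
)


def market_columns(tokens=TOKENS):
    """Expand the per-token templates into concrete column names.

    Assumption: token casing in column names matches the token list.
    """
    return [tpl.format(t=tok) for tok in tokens for tpl in TEMPLATES]
```

With six tokens and six fields this produces 36 column names, which can help when selecting the market block from a data frame.
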
- daily_transactions_optimism: Daily transaction count on Optimism network
- eth_active_addresses_total: Total active Ethereum addresses
- eth_active_addresses_sender: Active sending addresses on Ethereum
- eth_active_addresses_receiver: Active receiving addresses on Ethereum
- optimism_active_addresses_total: Total active addresses on Optimism network
- author_address: Wallet address of the author
- author_ether_balance: ETH balance of author's wallet
- author_transactions_number: Total blockchain transactions by author
- authorPostCount: Number of published articles by author
- authorTotalSales: Number of Writing NFTs sold by author
- authorTotalRevenue: Total ETH revenue from NFT sales by author
- Author Homepage: URL of the author's homepage or profile
- writing_nft: Identifier indicating the article is minted as a writing NFT
- Total Sold(ETH): Total ETH earned from all sales of the NFT
- Total Sold Numbers: Total quantity of NFTs sold
- Total Buyers: Number of unique buyers
- Price(ETH): Listing or sale price of the NFT
- nft_address: Blockchain address of the NFT contract
- collection: Name of the NFT collection
- fees: Associated fees (e.g., royalties) on NFT sales
- created_date: Date of NFT or article creation
- link: URL to the article or NFT page
- digest: Unique content hash or digest
- transaction_id: Blockchain transaction ID for mint or sale
- body: Raw text body of the article
- timestamp: Timestamp of article or NFT event
- title: Raw article title
- week_google_searches_nft: Google Trends score for "nft" in publication week
- week_google_searches_crypto: Google Trends score for "crypto"
- week_google_searches_bitcoin: Google Trends score for "bitcoin"
- week_google_searches_ethereum: Google Trends score for "ethereum"
- week_google_searches_optimism: Google Trends score for "optimism"
- Success: Numeric indicator of article success
- SuccessBinary: Binary success label (success/failure)