caveman210/scrape-ktu-java

Idea


This project has been abandoned because of reCAPTCHA issues on the website. A rewrite in Python, which favours the stealth-library approach, now seems the better option.

Well, anyway, it was fun.


Okay, I have an idea.
It's a website I want to scrape, but it has a deeply nested structure: many buttons underneath many buttons, with documents attached to them as blobs (pretty sure it was a React job, but who cares). So I'd like to implement a git-like system that checks the files and folders and updates them accordingly by comparing hashes. If a new folder or section is created on the site, a new folder is made on my laptop and all the files underneath it are downloaded. (Every section: new folder; every button: new folder; every document: new file.)
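
The hash comparison could look roughly like the sketch below. This is only an illustration of the idea, not code from the repo: the method names `sha256Hex` and `needsUpdate` are made up here, SHA-256 is an assumed choice of hash, and the remote document is assumed to already be in memory as bytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of a FileHasher; method names are assumptions.
final class FileHasher {
    private FileHasher() {}

    // Returns the SHA-256 digest of the given bytes as lowercase hex.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // True when the local copy is missing or its hash differs from the remote bytes.
    static boolean needsUpdate(Path localFile, byte[] remoteBytes) throws IOException {
        if (!Files.exists(localFile)) {
            return true;
        }
        String localHash = sha256Hex(Files.readAllBytes(localFile));
        return !localHash.equals(sha256Hex(remoteBytes));
    }
}
```

Reading whole files into memory keeps the sketch short; for large documents a streaming digest would probably be the better choice.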

com.scrapektu.app
|
| - App.java               (End integrator)
|
| - model/
| | - Node.java           (Defines a class of tree nodes with attributes)
| | - NodeType.java       (List of enums to do the job)
|
| - session/
| | - BrowserSession.java
|
| - scraper/
| | - TreeBuilder.java    (Accesses the JSON/YAML and then makes a tree from the layout)
|
| - sync/   (Stage 2)
| | - SyncEngine.java
| | - RepoManager.java
| | - FileHasher.java
|
| - net/
| | - Downloader.java
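
To make the `model/` package a bit more concrete, here is a minimal sketch of what `Node` and `NodeType` might look like under the "section: folder; button: folder; document: file" rule. The enum constants, field names and the `documentPaths` helper are all assumptions for illustration; the actual classes may have looked different.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Assumed node kinds: sections and buttons become folders, documents become files.
enum NodeType { SECTION, BUTTON, DOCUMENT }

// Hypothetical tree node; field and method names are illustrative only.
final class Node {
    final String name;
    final NodeType type;
    final List<Node> children = new ArrayList<>();

    Node(String name, NodeType type) {
        this.name = name;
        this.type = type;
    }

    // Adds a child and returns this node so calls can be chained.
    Node add(Node child) {
        if (type == NodeType.DOCUMENT) {
            throw new IllegalStateException("a document cannot have children");
        }
        children.add(child);
        return this;
    }

    // Collects the local file path of every document under this node.
    List<Path> documentPaths(Path parent) {
        Path self = parent.resolve(name);
        List<Path> paths = new ArrayList<>();
        if (type == NodeType.DOCUMENT) {
            paths.add(self);
            return paths;
        }
        for (Node child : children) {
            paths.addAll(child.documentPaths(self));
        }
        return paths;
    }
}
```

With a tree like `Notifications > 2024 > circular.pdf` (made-up names), calling `documentPaths(Paths.get("repo"))` on the root would give `repo/Notifications/2024/circular.pdf`, which is the path a downloader would write to.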

Session.java       → Starts browser     (no scraping)
Scrape.java        → Navigates website  (all scraping logic)
Node.java          → Holds scraped data (tree structure)
JsonUtil.java      → Save/load tree
SyncEngine.java    → Compare trees + download
RepoManager.java   → Write folders/files
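
The "compare trees + download" step could be sketched as below, assuming each tree has first been flattened into a map from relative path to content hash. `SyncPlan` and `toDownload` are hypothetical names, not part of the planned layout, and the flattening step itself is left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: decides which paths need downloading between two snapshots.
final class SyncPlan {
    private SyncPlan() {}

    // Returns paths that are new in `current` or whose hash changed since `previous`.
    static Set<String> toDownload(Map<String, String> previous, Map<String, String> current) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String oldHash = previous.get(entry.getKey());
            if (oldHash == null || !oldHash.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Paths that exist only in the previous snapshot are ignored here, so nothing is ever deleted locally; whether removed documents should be cleaned up is a separate design decision.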

Pull from:

  1. [https://ktu.edu.in/academics/notification]
  2. [https://ktu.edu.in/academics/mooccources]
  3. [https://ktu.edu.in/academics/scheme]
  4. [https://ktu.edu.in/academics/academic_calendar]

About

(Abandoned.) Webscraper built in Selenium-Java
