This is a complete Google Drive search engine API built on Flask that ranks results with the TF-IDF algorithm. It can search through Word documents, PDFs, images (via OCR), PowerPoint files, and many other file types.
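
As a rough illustration of how TF-IDF ranking works in general, here is a minimal pure-Python sketch. It is not this project's indexing code: the function names `tokenize`, `build_index`, and `search`, the sample file names, and the plain `tf * log(N / df)` weighting are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Compute TF-IDF weights for a {file_name: extracted_text} mapping (hypothetical layout)."""
    term_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = {}
    for name, counts in term_counts.items():
        total = sum(counts.values()) or 1  # avoid division by zero for empty text
        index[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return index


def search(index, query):
    """Rank documents by the summed TF-IDF weight of the query terms, best first."""
    terms = tokenize(query)
    scores = {
        name: sum(weights.get(term, 0.0) for term in terms)
        for name, weights in index.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Drop documents that match none of the query terms.
    return [(name, score) for name, score in ranked if score > 0]
```

With this weighting, a term that occurs in every document gets a weight of zero, so common words contribute nothing to the ranking, while terms concentrated in a few files push those files to the top.
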
- Clone the repo and install the dependencies listed in `requirements.txt`.
- Get your `client_id.json` from the Google dashboard, using `http://localhost:5000/oauth2callback` as the redirect URI. Place `client_id.json` in the `auth` folder, replacing the existing one.
- Run the app with the command `python app.js`, then visit `http://localhost:5000/authenticate` to authenticate with your Google Drive account.
- Visit `http://localhost:5000/download_files` to download all files from Google Drive to your PC.
- To extract the text from ppt, jpg, png, word, txt, and pdf files, visit `http://localhost:5000/extract_text`. To run TF-IDF across the extracted text, visit `index_files`.
- Your files are now searchable. To search within them, send a GET request to `http://localhost:5000/search?query=testSearch`, passing the search text in the `query` parameter.
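
Search text containing spaces or special characters needs to be URL-encoded before it goes into the `query` parameter. The sketch below shows one way to build and call the search URL from Python using only the standard library. The helper names `build_search_url` and `run_search` are hypothetical, and because the response format of `/search` is not documented here, the body is returned as raw text rather than parsed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Host and port as used in the setup steps; adjust if the app runs elsewhere.
BASE_URL = "http://localhost:5000"


def build_search_url(query, base_url=BASE_URL):
    """Return the /search URL with the search text encoded in the `query` parameter."""
    return f"{base_url}/search?{urlencode({'query': query})}"


def run_search(query, base_url=BASE_URL):
    """Send the GET request and return the raw response body as text.

    Assumes the app is running and the files have already been
    downloaded, extracted, and indexed.
    """
    with urlopen(build_search_url(query, base_url)) as response:
        return response.read().decode("utf-8")
```

For example, `build_search_url("annual report")` produces `http://localhost:5000/search?query=annual+report`, which can also be pasted into a browser.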