CodeforKarlsruhe/auenBot

AuenBot

Migration from LUBW/NAZKA ChatBot KarlA

Status

Analyzed dataset from LUBW

Extracted chatbot signatures for intents and responses

Extracted data on plants, animals and the Rheinauen area

Intents also cover access to current weather conditions and environmental data.

Initial decoding and routing are implemented in Python. Input is matched with rapidfuzz (a text-matching library), with a fallback to a remote LLM if required. Vector search is an option, but an initial version using BM25 was not very helpful. A future test with bge-m3 is pending (vectors from the intent samples are already generated).
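
The matching step with an LLM fallback could look roughly like the sketch below. It is not the repository's code: the sample texts, the intent names, the cutoff of 80 and the `llm_fallback` callable are assumptions made for illustration. The `difflib` branch exists only so the sketch still runs where rapidfuzz is not installed.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz import fuzz, process, utils
except ImportError:  # keep the sketch runnable without rapidfuzz
    process = None

# Hypothetical sample texts and intent names; the real ones live in intents.json.
SAMPLES = {
    "wie ist das wetter heute": "wetter",
    "welche tiere leben in den rheinauen": "tiere_liste",
    "wie hoch ist der wasserstand": "messdaten",
}


def best_match(query, cutoff=80.0):
    """Return (intent, score) for the closest sample, or None below the cutoff."""
    if process is not None:
        hit = process.extractOne(
            query,
            list(SAMPLES),
            scorer=fuzz.WRatio,
            processor=utils.default_process,  # lowercases, strips punctuation
            score_cutoff=cutoff,
        )
        return (SAMPLES[hit[0]], hit[1]) if hit else None
    # Standard-library fallback, scaled to the same 0..100 range.
    q = query.lower().strip()
    score, text = max((100 * SequenceMatcher(None, q, s).ratio(), s) for s in SAMPLES)
    return (SAMPLES[text], score) if score >= cutoff else None


def route(query, llm_fallback):
    """Use the fuzzy match if it is good enough, otherwise ask the remote LLM."""
    hit = best_match(query)
    return hit[0] if hit else llm_fallback(query)
```

Here `llm_fallback` is any callable that takes the user text and returns an intent name, for example a thin wrapper around a remote LLM request. The cutoff decides how often that fallback is used and would need tuning against real user input.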

RawData

Primary files

  • tiere_pflanzen_auen.json: knowledge base; dataset of animals, plants and some Rheinauen types.
  • intents.json: intents with sample texts and an utterance (if any).
  • intent_vectors.json: vectorized text samples (bge-m3 embeddings) with the corresponding intent_id. Requires Git LFS.
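
As a rough illustration of how the vectors in intent_vectors.json might be used, the sketch below picks the intent whose sample vector is closest to a query vector by cosine similarity. The record layout (`intent_id` and `vector` keys) and the toy three-dimensional vectors are assumptions; the real file layout may differ, and the query vector would have to come from the same bge-m3 model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_intent(query_vec, records):
    """Return (intent_id, similarity) of the record closest to query_vec."""
    best = max(records, key=lambda r: cosine(query_vec, r["vector"]))
    return best["intent_id"], cosine(query_vec, best["vector"])


# Toy records with an assumed layout; real vectors are bge-m3 embeddings.
records = [
    {"intent_id": "wetter", "vector": [1.0, 0.0, 0.0]},
    {"intent_id": "tiere_liste", "vector": [0.0, 1.0, 0.0]},
]

intent, similarity = nearest_intent([0.9, 0.1, 0.0], records)
```

A linear scan like this is usually sufficient for a few thousand sample vectors; a dedicated vector index only becomes interesting for much larger sets.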

Original bot

  • tagsAndInstructions.json: additional info for original bot decoding and routing

Auxiliary, input or leftover files

  • pflanzenKeys.json: Parameters for plant descriptions
  • tiereKeys.json: Parameters for animal descriptions
  • taskList.json: decoded signatures. If utter is present, it should be used as the response. Otherwise, the intent should start either with tp_, tiere_ or pflanzen_, in which case it addresses the data of the corresponding type (or both), or with wetter or messdaten. How to reference the few Rheinauen datasets is still to be defined.
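
The routing rule described for taskList.json can be sketched as follows. The key names `utter` and `intent`, the returned tuples, and the reading that tp_ addresses both animals and plants are assumptions for illustration, not the repository's actual schema.

```python
def route_task(task):
    """Decide how to answer one decoded signature (assumed keys: 'utter', 'intent')."""
    if task.get("utter"):
        # A predefined utterance always wins and is returned as the response.
        return ("utter", task["utter"])
    intent = task.get("intent", "")
    if intent.startswith("tp_"):
        # Assumed meaning: the intent covers both animals and plants.
        return ("lookup", ["tiere", "pflanzen"])
    if intent.startswith("tiere_"):
        return ("lookup", ["tiere"])
    if intent.startswith("pflanzen_"):
        return ("lookup", ["pflanzen"])
    if intent.startswith("wetter"):
        return ("service", "wetter")
    if intent.startswith("messdaten"):
        return ("service", "messdaten")
    # Rheinauen datasets and anything else are not covered by the rule yet.
    return ("unknown", intent)
```

Returning a small tagged tuple keeps the decision separate from the data access, so the knowledge-base lookup and the weather or measurement services can be added independently.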

Media

Directory of fantasy images

512×512 pixels, generated with flux-1-schnell.

Next Steps

Basic Bot

Create vector embeddings for all intent texts. Set up a database with vectors, full text and intent names. Test the chatbot's responses to arbitrary requests.
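
One possible shape for such a database, sketched with SQLite from the Python standard library. The table name, column names and the choice to store each vector as JSON text are assumptions; a dedicated vector store would be an equally valid option.

```python
import json
import sqlite3


def build_db(rows, path=":memory:"):
    """Create the samples table and insert rows with intent, text and vector."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "id INTEGER PRIMARY KEY, "
        "intent TEXT NOT NULL, "
        "text TEXT NOT NULL, "
        "vector TEXT NOT NULL)"  # vector stored as a JSON array
    )
    con.executemany(
        "INSERT INTO samples (intent, text, vector) VALUES (?, ?, ?)",
        [(r["intent"], r["text"], json.dumps(r["vector"])) for r in rows],
    )
    con.commit()
    return con


def load_vectors(con):
    """Return (intent, text, vector) tuples in insertion order."""
    query = "SELECT intent, text, vector FROM samples ORDER BY id"
    return [(intent, text, json.loads(vec)) for intent, text, vec in con.execute(query)]


# Toy row; real vectors would be the bge-m3 embeddings of the intent texts.
con = build_db([{"intent": "wetter", "text": "wie ist das wetter", "vector": [0.1, 0.2]}])
```

Keeping the full text next to each vector makes it possible to compare fuzzy text matching and vector search on the same rows.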

Reference Data

Add data access to weather conditions, environmental data, Wikidata images and audio files. A source still has to be found, probably NAZKA or https://www.museumfuernaturkunde.berlin/de/forschung/tierstimmenarchiv. MP3 files were missing from the input dataset.
