automate spam detection #772
base: main
we would need to add the jetstream instance ip to ALLOWED_HOSTS
just missing the migration for I'll read through it again but it seemed all good on a first pass, besides having a way to kick off the process
Regarding the way to start the process from CoMSES side:
something like this?
@asuworks I just remembered there was some additional cleanup I wanted to do eventually with the spam stuff. This might be a good place to get that done if you are up for it: comses/planning#249. Namely the second point (refactoring the serializer mixin to actually be just a mixin).
can this (everything besides Command) be moved to a module in curator/? Maybe spam.py
- `api/spam/get-latest-batch/` returns the latest set of content to be checked for spam
- `api/spam/update` updates the status of the content
- a SpamModeration record with status `SCHEDULED_FOR_CHECK` is stored on every Job, Event, Codebase submission. A decoupled external service will query for these objects to check them for spam.
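The status flow this commit describes can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: `SpamModerationRecord`, `latest_batch`, `apply_report` and the report shape (`object_id`, `is_spam`) are hypothetical names; only the three status values come from the PR itself.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    # Status names are taken from the PR; the enum is illustrative.
    SCHEDULED_FOR_CHECK = "scheduled_for_check"
    SPAM_LIKELY = "spam_likely"
    NOT_SPAM_LIKELY = "not_spam_likely"


@dataclass
class SpamModerationRecord:
    # Hypothetical stand-in for the Django SpamModeration model.
    object_id: int
    status: Status = Status.SCHEDULED_FOR_CHECK


def latest_batch(records):
    """Roughly what get-latest-batch serves: records still awaiting a check."""
    return [r for r in records if r.status is Status.SCHEDULED_FOR_CHECK]


def apply_report(records, report):
    """Roughly what update does with one report.

    `report` is assumed to look like {"object_id": int, "is_spam": bool}.
    """
    by_id = {r.object_id: r for r in records}
    record = by_id[report["object_id"]]
    record.status = (
        Status.SPAM_LIKELY if report["is_spam"] else Status.NOT_SPAM_LIKELY
    )
    return record
```

Once a report has been applied, the record no longer appears in the next batch, which is what lets the external service stay decoupled from CoMSES.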
- fix tests
- add asdf & direnv to .gitignore and .dockerignore
- unshelves the JetStream2 instance, triggers the LLM spam check workflow, then shelves the instance again when the workflow is done.
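One way to make sure the instance is shelved again even when the workflow fails is a context manager around the trigger step. The sketch below is a guess at the shape, not the management command from this PR: `unshelve`, `shelve` and `trigger_workflow` are hypothetical callables that would wrap the real JetStream2 and spam check calls.

```python
from contextlib import contextmanager


@contextmanager
def unshelved(unshelve, shelve):
    """Unshelve on entry and always shelve on exit, even on errors."""
    unshelve()
    try:
        yield
    finally:
        shelve()


def run_spam_check(unshelve, trigger_workflow, shelve):
    # All three arguments are hypothetical callables supplied by the caller.
    with unshelved(unshelve, shelve):
        return trigger_workflow()
```

The `finally` clause is the point of the design: a failed workflow run should not leave the instance running.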
…management command + minor refactoring
…ement command + minor refactoring
…pamModeration object is automatically created for the associated MemberProfile
…ed_for_check, spam_likely, not_spam_likely
added "one-click" install of the
The script does the following:
After the ansible playbook is done, the management commands from CoMSES should be able to trigger
This PR attempts to automate the spam detection process for `Job`, `Event`, `Codebase` and `MemberProfile` objects using an external LLM service.
LLM Spam Detection Process
- A `SpamModeration` record with status `SCHEDULED_FOR_CHECK` is stored on every `Job`, `Event` and `Codebase` submission and on every `User` creation (for users, the `SpamModeration` object is attached to the associated `MemberProfile`).
- The external service fetches the latest batch of `SpamModeration` objects (`api/spam/get-latest-batch/`), analyzes them for spam and submits a spam report to `api/spam/update` for each of them.
- `api/spam/update` on the CoMSES side updates the corresponding `SpamModeration` object according to the LLM report from the external service.
Starting the LLM Spam Detection Process
The external service asuworks/comses.spamcheck is deployed on an existing JetStream2 instance, which is unshelved before the spam check workflow is triggered and shelved automatically after it is done by the following management command:
Environment & Secrets
Following environment variables must be set:
JetStream2 Credentials
Can be found here: https://js2.jetstream-cloud.org/identity/application_credentials/
- `secrets/llm_spam_check_jetstream_os_application_credential_secret`
- `secrets/llm_spam_check_jetstream_os_application_credential_id`
X-API-Key header for the API
Access to the `api/spam/update` and `api/spam/get-latest-batch` routes is protected by verification of the `X-API-Key` header. The key should be set in `secrets/llm_spam_check_api_key`.
ALLOWED_HOSTS
The IP of the JetStream2 instance must be added to Django's `ALLOWED_HOSTS`.
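For the `X-API-Key` check described above, a minimal sketch of the comparison step could look like the following. It is an assumption about the shape, not the code in this PR: `load_api_key` and `is_authorized` are hypothetical helpers, and `headers` is a plain dict rather than a Django request object.

```python
import hmac
from pathlib import Path


def load_api_key(path):
    """Read the shared key from a secrets file, dropping the trailing newline."""
    return Path(path).read_text().strip()


def is_authorized(headers, expected_key):
    """Return True only if the X-API-Key header matches the expected key.

    `headers` is a plain dict here, so the lookup is case-sensitive;
    a real request object would normalise header names.
    """
    provided = headers.get("X-API-Key", "")
    # Reject an empty configured key so a missing secret never grants access.
    if not expected_key:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(provided.encode(), expected_key.encode())
```

Using `hmac.compare_digest` instead of `==` is a common precaution for secret comparison; whether the PR does the same is not stated in the description.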