2 changes: 2 additions & 0 deletions data/scrape-share-food-program/.env_template
@@ -0,0 +1,2 @@
SUPABASE_URL=
SUPABASE_API_KEY=
47 changes: 18 additions & 29 deletions data/scrape-share-food-program/README.md
@@ -1,46 +1,35 @@
# Share Food Program Scraping
# Share Food Program Sync

The Share Food Program can be found here: https://www.sharefoodprogram.org/

The site contains regularly-updated information about food resources in the Philadelphia area. This directory contains Python code for scraping this site.
Scrapes approved food distribution sites from the [Share Food Program](https://www.sharefoodprogram.org/) map API and upserts them into the Supabase `resources` table. Every record written by this script is tagged with `creator = "phlask-share-food-program-sync"`. Each run deletes the records carrying that tag, then re-inserts fresh ones.
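
The delete-then-reinsert behavior described above can be sketched roughly as follows. This is an illustration only, not the code in `scrape_share_food_program.py`: the `InMemoryStore` class and the `sync` function are hypothetical names, and the in-memory list stands in for the Supabase `resources` table. Against Supabase, the two store methods would presumably map to a delete filtered on `creator` and a bulk insert.

```python
CREATOR = "phlask-share-food-program-sync"


class InMemoryStore:
    """Hypothetical stand-in for the `resources` table, used only for illustration."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])

    def delete_by_creator(self, creator):
        """Remove every row with the given creator; return how many were removed."""
        before = len(self.rows)
        self.rows = [row for row in self.rows if row.get("creator") != creator]
        return before - len(self.rows)

    def insert_many(self, rows):
        """Append the rows; return how many were added."""
        self.rows.extend(rows)
        return len(rows)


def sync(store, scraped):
    """Replace all previously synced rows with the freshly scraped ones."""
    # Tag every scraped row so the next run can find and remove it.
    tagged = [{**row, "creator": CREATOR} for row in scraped]
    removed = store.delete_by_creator(CREATOR)
    inserted = store.insert_many(tagged)
    return removed, inserted


# One manually entered row plus two rows left over from an earlier sync.
store = InMemoryStore([
    {"name": "Manual entry", "creator": "volunteer"},
    {"name": "Old site A", "creator": CREATOR},
    {"name": "Old site B", "creator": CREATOR},
])
removed, inserted = sync(store, [{"name": "Site 1"}, {"name": "Site 2"}, {"name": "Site 3"}])
print(removed, inserted, len(store.rows))
```

Because only rows carrying the sync tag are deleted, records from other creators survive each run untouched.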

## Setup

### Install Python

First, make sure to have Python 3.12+ installed. We also recommend using [PyCharm](https://www.jetbrains.com/pycharm/download) for Python development.

### Create a Virtual Environment and Install Dependencies

Inside of this directory, run the following commands:

```bash
python -m venv .venv
# If on Mac/Linux
source .venv/bin/activate
# If on Windows
.venv\Scripts\activate
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

### Add Firebase Credentials
Create a `.env` file in this directory:

To run the scraper and upload the data to Firebase, you will need to add your Firebase credentials to this folder. Message us in the #phlask_data channel on Slack to get access.
```env
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_API_KEY=your-service-role-key
```
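
Since `python-dotenv` appears in `requirements.txt`, the script presumably loads these values from `.env` into the environment at startup. A minimal sketch of validating them before connecting might look like the following; `read_supabase_config` is a hypothetical helper, not a function from the script, and it uses only the standard library.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_API_KEY")


def read_supabase_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, or raise if any is missing or blank."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing required settings: " + ", ".join(missing) + ". Check your .env file."
        )
    return {name: env[name].strip() for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real environment.
config = read_supabase_config({
    "SUPABASE_URL": "https://your-project.supabase.co",
    "SUPABASE_API_KEY": "your-service-role-key",
})
print(sorted(config))
```

Failing early with the names of the missing variables is usually easier to debug than a connection error later in the run.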

### Run the Scraper
Message us in `#phlask_data` on Slack to get the credentials.

To run the scraper, use the following command, making sure to set the URL below to the correct URL for your Firebase instance.
## Usage

**Sync to Supabase:**
```bash
python scrape_share_food_program.py https://phlask-share-food-test.firebaseio.com/
python scrape_share_food_program.py
```

You should see output like the following:

**Debug locally (no Supabase required):**
```bash
python scrape_share_food_program.py --csv # writes resources.csv
python scrape_share_food_program.py --csv out.csv # custom filename
```
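
A flag that works both bare (`--csv`) and with a value (`--csv out.csv`) can be expressed in `argparse` with `nargs="?"` and a `const` default. The sketch below shows one way to get that behavior; it is an assumption about how such a flag could be parsed, not a copy of the script's argument handling, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser with an optional-value --csv flag."""
    parser = argparse.ArgumentParser(
        description="Sync Share Food Program sites (illustrative parser)."
    )
    parser.add_argument(
        "--csv",
        nargs="?",              # the filename after --csv is optional
        const="resources.csv",  # used when --csv is given without a filename
        default=None,           # used when --csv is absent
        metavar="PATH",
        help="write a CSV to PATH instead of syncing to Supabase",
    )
    return parser


parser = build_parser()
no_flag = parser.parse_args([]).csv
bare_flag = parser.parse_args(["--csv"]).csv
with_name = parser.parse_args(["--csv", "out.csv"]).csv
print(no_flag, bare_flag, with_name)
```

With this setup, a value of `None` signals the normal sync path, and any string signals CSV output to that path.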
```
Got 169 new resources from the scraped resource
Using DB URL: https://phlask-share-food-test.firebaseio.com/
Loaded PHLASK DB reference with 819 resources
Removed 169 existing scraped resources from the DB
We now have 819 total resources in the DB
```

The CSV serializes JSONB fields (`source`, `verification`, `food`) as JSON strings so the output is inspectable without a database connection.
81 changes: 2 additions & 79 deletions data/scrape-share-food-program/requirements.txt
@@ -1,81 +1,4 @@
asttokens==3.0.0
attrs==24.3.0
backcall==0.2.0
beautifulsoup4==4.12.3
bleach==6.2.0
CacheControl==0.14.1
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
colorama==0.4.6
cryptography==46.0.6
decorator==5.1.1
defusedxml==0.7.1
docopt==0.6.2
executing==2.1.0
fastjsonschema==2.21.1
firebase-admin==6.6.0
google-api-core==2.23.0
google-api-python-client==2.154.0
google-auth==2.36.0
google-auth-httplib2==0.2.0
google-cloud-core==2.4.1
google-cloud-firestore==2.19.0
google-cloud-storage==2.18.2
google-crc32c==1.6.0
google-resumable-media==2.7.2
googleapis-common-protos==1.66.0
grpcio==1.68.0
grpcio-status==1.68.0
httplib2==0.22.0
idna==3.10
ipython==8.12.3
jedi==0.19.2
Jinja2==3.1.6
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyterlab_pygments==0.3.0
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mistune==3.1.0
msgpack==1.1.0
nbclient==0.10.2
nbconvert==7.17.0
nbformat==5.10.4
packaging==24.2
pandocfilters==1.5.1
parso==0.8.4
pickleshare==0.7.5
pipreqs==0.5.0
platformdirs==4.3.6
prompt_toolkit==3.0.48
proto-plus==1.25.0
protobuf==6.33.5
pure_eval==0.2.3
pyasn1==0.6.3
pyasn1_modules==0.4.1
pycparser==2.22
Pygments==2.20.0
PyJWT==2.12.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
pywin32==308
pyzmq==26.2.0
referencing==0.35.1
python-dotenv>=1.0.0
requests==2.33.0
rpds-py==0.22.3
rsa==4.9
six==1.17.0
soupsieve==2.6
stack-data==0.6.3
tinycss2==1.4.0
tornado==6.5.5
traitlets==5.14.3
uritemplate==4.1.1
urllib3==2.6.3
wcwidth==0.2.13
webencodings==0.5.1
yarg==0.1.9
supabase>=2.3.0