14 changes: 7 additions & 7 deletions cli.md
@@ -8,23 +8,23 @@ nav_order: 5

## QuickStart

Before using Gleaner, note that Gleaner does have one prerequisite in the form of an accessible S3-compliant
object store. This can be AWS S3, Google Cloud Storage, or others. There is also the open-source
and free Minio object store, which is used in many of the examples in GleanerIO.
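
If you do not already have an object store, a quick way to get one locally is Minio via Docker. This is a sketch only; the access and secret keys below are placeholder values you should change, and the ports shown are Minio's defaults:

```
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=worldsbestaccesskey \
  -e MINIO_ROOT_PASSWORD=worldsbestsecretkey \
  minio/minio server /data --console-address ":9001"
```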

Once you have the object store up and running, you are ready to run Gleaner.
Pull down the release that matches your system from [version 3.0.4](https://github.com/gleanerio/gleaner/releases/tag/v3.0.4-dev).
Below is an example of pulling this down for a Linux system on AMD64 architecture.

```
wget https://github.com/gleanerio/gleaner/releases/download/v3.0.4-dev/gleaner-v3.0.4-dev-linux-amd64.tar.gz
```
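
Then unpack the archive and confirm the binary runs. The exact name of the binary inside the archive may differ from what is shown here:

```
tar -xzf gleaner-v3.0.4-dev-linux-amd64.tar.gz
./gleaner --help
```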

You will need a configuration file; an example can be found in the resources directory. See also
the config file in the Gleaner Config page.
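
As a rough illustration, the object store portion of the configuration pairs connection details with credentials. This sketch is illustrative only; the exact key names and structure on the Gleaner Config page are authoritative:

```yaml
# Illustrative sketch only -- consult the Gleaner Config page for
# the authoritative key names and structure.
minio:
  address: localhost
  port: 9000
  accesskey: ""      # can be left empty and passed via environment
  secretkey: ""      # can be left empty and passed via environment
  bucket: gleaner
```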

You can set the values in this configuration file. However, you can leave the Minio values empty and pass
them via environment variables. This sort of approach can work better in some orchestration environments or just
be a safer approach to managing these keys.

```
@@ -36,7 +36,7 @@ export MINIO_SECRET_KEY=SECRETVALUE
export MINIO_BUCKET=mybucket
```

With those set and your configuration file in place, you can run Gleaner with


```
4 changes: 2 additions & 2 deletions config.md
@@ -9,8 +9,8 @@ nav_order: 3

## Source

Sources can be defined as one of two types. The first is a sitemap: either a traditional sitemap that
points to resources, or a sitemap index that points to a set of sitemaps.

The other is a sitegraph, which is a pre-computed graph for a site.
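
For reference, a traditional sitemap and a sitemap index follow the standard sitemap protocol and look roughly like the following (the URLs are placeholders):

```xml
<!-- sitemap: points directly to resources -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/1</loc></url>
</urlset>

<!-- sitemap index: points to a set of sitemaps -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.org/sitemap1.xml</loc></sitemap>
</sitemapindex>
```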

19 changes: 9 additions & 10 deletions dockercli.md
@@ -50,13 +50,12 @@ total 1356
-rwxr-xr-x 1 fils fils 1852 Aug 15 14:06 gleanerDocker.sh
```

Let's see if we can set up our support infrastructure for Gleaner. The
file gleaner-IS.yml is a docker compose file that will set up the object store
and a triplestore.

To do this, we need to set up a few environment variables by leveraging the setenvIS.sh script. This script will set up the environment we need.
Note that you can also use a .env file or other approaches. You can reference
the [Environment variables in Compose](https://docs.docker.com/compose/environment-variables/) documentation.

```bash
@@ -86,7 +85,7 @@ working config file was downloaded.
> Note: This config file will change... it's pointing to an OIH partner
> and I will not do that for the release. I have a demo site I will use.

Next we need to set up our object store for Gleaner. Gleaner itself can do this
task, so we will use

```bash
@@ -140,7 +139,7 @@ millers.go:81: Miller run time: 0.024649
## Working with results

If all has gone well, at this point you have downloaded the JSON-LD documents into Minio or
some other object store. Next we will install a client that we can use to work with these objects.

Note, there is a web interface exposed on the port mapped in the Docker compose file.
In the case of this demo that is 9000. You can access it at
@@ -164,7 +163,7 @@ There is also a [Minio Client Docker image](https://hub.docker.com/r/minio/minio
that you can use as well but it will be more difficult to use with the following scripts due
to container isolation.

To make an entry in the mc config, use:

```
mc alias set oih http://localhost:9000 worldsbestaccesskey worldsbestsecretkey
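# Hypothetical follow-up, assuming the "oih" alias above and a bucket
# named "gleaner": list the buckets, then list the objects Gleaner wrote.
mc ls oih
mc ls --recursive oih/gleaner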
@@ -190,7 +189,7 @@ You can explore mc and see how to copy and work with the object store.
As part of our Docker compose file we also spun up a triplestore. Let's use that now.


Now download the minio2blaze.sh script.

```bash
curl -O https://raw.githubusercontent.com/earthcubearchitecture-project418/gleaner/master/scripts/minio2blaze.sh
@@ -239,7 +238,7 @@ where
LIMIT 10
```
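
Written out in full, a query of this shape is the conventional "first ten triples" probe; the variable names here are illustrative, not prescribed:

```
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o
}
LIMIT 10
```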

A very simple SPARQL query to give us the first 10 results from the triplestore. If all has gone well,
we should see something like:

![Blazegraph](./assets/images/simplequery.png)
10 changes: 5 additions & 5 deletions faircontext.md
@@ -22,7 +22,7 @@ To provide better context we can define three personas to better express the rol

### Persona: Publisher

The Publisher is engaged in authoring the JSON-LD documents and publishing them
to the web. This persona is focused on describing and presenting structured data on the web
to aid in the discovery and use of the resources they manage.
Details on this persona can be found in the [Publisher](../publishing/publishing.md) section.
@@ -52,8 +52,8 @@ user experiences are described in the [User](../users/referenceclient.md) section.

## FAIR Implementation Network

We can think of the above personas and how they might be represented in a FAIR
implementation network. The diagram that follows represents some of these relations.

![relations](assets/images/relations.png)

@@ -128,7 +128,7 @@ the Go-FAIR [FAIR Principles](https://www.go-fair.org/fair-principles/) page.
| Principles | Project |
| ------------------- | ------------------------------------------------------------------------ |
| License | schema:license or related (again, here we can leverage SHACL validation) |
| Community standards | Ocean InfoHub, POLDER, CCADI, GeoCODES, Internet of Water |

## Users

@@ -149,7 +149,7 @@ GeoCODES is an NSF Earthcube program effort to better enable cross-domain discov

[https://oceaninfohub.org/](https://oceaninfohub.org/)

The Ocean InfoHub (OIH) Project aims to improve access to global oceans information, data and knowledge products for management and sustainable development. The OIH will link and anchor a network of regional and thematic nodes that will improve online access to and synthesis of existing global, regional and national data, information and knowledge resources, including existing clearinghouse mechanisms. The project will not be establishing a new database, but will be supporting discovery and interoperability of existing information systems. The OIH Project is a three-year project funded by the Government of Flanders, Kingdom of Belgium, and implemented by the IODE Project Office of the IOC/UNESCO.

* [OIH Book](https://book.oceaninfohub.org)
* [Example Validation](https://github.com/gleanerio/notebooks/blob/master/notebooks/validation/output/report_07-18-2022-15-11-18.pdf)
8 changes: 4 additions & 4 deletions index.md
@@ -17,15 +17,15 @@ Gleaner is a tool for extracting JSON-LD from web pages. You provide Gleaner a l

## Open Foundation

Communities of practice can leverage open schema (schema.org) along with web architecture approaches to build domain search portals. Enhance and extend with community vocabularies to address specific domain needs. This foundation is also leveraged by Google Data Set Search and is complementary to that service. Web architecture as a foundation allows a community to provide more detailed community experiences, while still leveraging the global reach of commercial search indexes.

## Big Picture

Gleaner is part of the larger GleanerIO approach. GleanerIO includes approaches for leveraging spatial, semantic, full text or other index approaches. Additionally, there is guidance on running Gleaner as part of a routinely updated index of resources and a reference interface for searching the resulting graph. GleanerIO provides a full stack approach to go from indexing to a basic user interface searching a generated Knowledge Graph, an example index. The whole GleanerIO stack can be run on a laptop (it uses Docker Compose files) or deployed to the cloud. Cloud environments used include AWS, Google Cloud, and OpenStack.

GleanerIO is also designed to play well with others. As long as packages work well in a web architecture framework, they likely can be integrated into the GleanerIO approach. The GleanerIO approach is modular and even Gleaner itself could be swapped out for other implementations.

Indeed, GleanerIO advocates _principles over project_. GleanerIO is really just a set of principles for which reference implementations (projects) have been developed or external projects have been used. These have evolved and been implemented to address communities like Ocean InfoHub, Internet of Water, GeoCODES and more. The results and approaches of these communities are openly maintained at the GleanerIO GitHub Organization pages. They provide guidance on how other communities could leverage this approach to address their functional needs.
## History

Communities of practice can leverage open schema (schema.org) along with web architecture approaches to build domain search portals. Enhance and extend with community vocabularies to address specific domain needs. This foundation is also leveraged by Google Data Set Search and is complementary to that service. Web architecture as a foundation allows a community to provide more detailed community experiences, while still leveraging the global reach of commercial search indexes.