mbtaylor
left a comment
Looks basically good, I made a couple of suggestions.
> IVOA-science. The purpose of this rule is to help operators to throttle
> indiscriminate downloads by ``stupid'' crawlers (like the harvesters
> employed to gather training material for AI models around 2025) without
> impacting common clients; for instance, rate limits could be tight
> without a conforming user agent header.
I wouldn't say simply "the purpose of this rule is [throttling]", since there are other use cases, for instance managing usage statistics. Possible alternative wording:
Presence of this header provides a means to identify requests by known VO-aware clients as distinct from those by potentially indiscriminate crawlers like the harvesters employed to gather training material for AI models around 2025. This information may be used for instance to throttle indiscriminate downloads by applying tighter rate limits for requests without a conforming user-agent header, or for better understanding of usage statistics by distinguishing known science queries.
> The access was done to directly support a science case. This explicitly
> includes education and training, in particular because we do not want to
> suggest that software used in such settings -- which plausibly is going
> to be the same as software used in pure research -- should be
> reconfigured for them.
I'm not sure about "to directly support a science case"; I'd suggest something a bit more woolly like "in support of science usage" or "in the context of science usage". I think the main target here is to differentiate clients that understand the VO/astronomy services they are engaging with from those that are just hitting anything they can find. From a practical point of view, at least for clients like topcat and stilts, it's not likely to be feasible to get them to present different user-agent headers on the basis of the user's intention for particular requests, only on the basis of the tools in use.
Given that, I'm wondering if there's a different term than "science" that should be used here, but I don't have great suggestions. IVOA-voclient or just IVOA-client maybe?
On Tue, Jan 06, 2026 at 08:04:57AM -0800, Mark Taylor wrote:
@mbtaylor commented on this pull request.
> Presence of this header provides a means to identify requests by
> known VO-aware clients as distinct from those by potentially
I like that better, too, and thus I've (by and large) adopted it in
commit 8d666479.
> Given that I'm wondering if there's a different term than "science"
> that should be used here, but I don't have great suggestions.
> IVOA-voclient or just IVOA-client maybe?
Hm... I don't like "client" here because crawlers and validators
arguably are clients, too. I'll try the IVOA mailing lists for more
precise terminology. I give you, anyway, that it's a bit lame to say
"science" and then say "but it's education, too".
This PR mainly reverts the previous stance that "normal" requests carry no special IVOA tag, in the hope of developing a marker for "well-behaved client".
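
To make the throttling use case discussed in this thread a bit more concrete, here is a minimal sketch of how an operator might pick a rate limit from the user-agent header. It is only an illustration: the function names, the numeric limits, the example user-agent strings, and the assumption that a conforming header simply contains a token like `IVOA-science` are all hypothetical and are not taken from the specification text under review.

```python
import re

# Assumption: a conforming User-Agent contains a token such as
# "IVOA-science"; the exact syntax is up to the specification,
# not to this sketch.
IVOA_TOKEN = re.compile(r"\bIVOA-([A-Za-z]+)\b")

# Hypothetical limits, in requests per minute.
DEFAULT_LIMIT = 10
SCIENCE_LIMIT = 600


def ivoa_tag(user_agent):
    """Return the lower-cased IVOA tag found in a User-Agent, or None."""
    match = IVOA_TOKEN.search(user_agent or "")
    return match.group(1).lower() if match else None


def rate_limit_for(user_agent):
    """Apply the generous limit only to requests tagged IVOA-science."""
    if ivoa_tag(user_agent) == "science":
        return SCIENCE_LIMIT
    return DEFAULT_LIMIT
```

With this sketch, a request sending something like `TOPCAT/4.10 (IVOA-science)` would get the generous limit, while a request with no tag, or with a different tag, would fall back to the tight default. The same `ivoa_tag` helper could also feed usage statistics, for example by counting requests per tag.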