
Conversation

@vinhloc30796

@vinhloc30796 vinhloc30796 commented Oct 10, 2025

All Submissions:

  • Have you followed the guidelines in our Contributing documentation?
  • Have you verified that there aren't any other open Pull Requests for the same update/change?
  • Does the Pull Request pass all tests?

Description

Wire Solana with samples & tests

  • Add solana to config ledgers
  • Add BigQuery query and wire data collection
  • Wire parser to DummyParser and mapping to DummyMapping
  • Add sample_solana_raw_data.json
  • Add DummyParser and DummyMapping tests for Solana

Closes #153

Checklist

/* Keep from below the appropriate checklist for your Pull Request and remove the others */

New Ledger Support Submissions:

  • What mapping information did you add for the new ledger?
    • identifiers
    • addresses
    • clusters
    • legal links
  • Did you create a new parser?
    • If yes, did you create a unit test for the new parser?
    • If no, which parser did you reuse?
      • DefaultParser
      • DummyParser
      • EthereumParser
  • Did you create a new mapping?
    • If yes, did you create a unit test for the new mapping?
    • If no, which mapping did you reuse? /* mapping name */
      • DefaultMapping
      • DummyMapping
      • EthereumMapping
      • CardanoMapping
      • TezosMapping
  • Did you enable the parser for the new ledger in consensus_decentralization/parse.py?
  • Did you enable the mapping for the new ledger in consensus_decentralization/map.py?
  • Did you document support for the new ledger as described in our Contributing documentation?

Update Mapping Support Information Submissions:

  • For which ledger do you update the mapping information?
    • /* ledger name */
  • What mapping information do you update?
    • identifiers
    • addresses
    • clusters
    • legal links
  • Did you update the tests (if needed)?

New Metric Support Submissions:

  • Did you put the metric's script under consensus_decentralization/metrics?
  • Did you name the metric's main function of the script compute_{metric name}?
  • Did you import the metric's main function to consensus_decentralization/analyze.py?
  • Did you add the new metric (and possible parameter values) to config.yaml?
  • Did you write unit tests for the new metric?
  • Did you document the new metric in the documentation pages?

- Add solana to config ledgers
- Add BigQuery query and wire data collection
- Wire parser to DummyParser and mapping to DummyMapping
- Add sample_solana_raw_data.json
- Add DummyParser and DummyMapping tests for Solana
@vinhloc30796
Author

Hi @LadyChristina, I found the repo through your presentation at SBC 2025. Please let me know what I need to change in this PR to get it merged; I would love to go on and add Solana everywhere across EDI too.

cc @dimkarakostas for visibility

@LadyChristina
Member

Hi @vinhloc30796, thanks a lot for your contribution. Adding support for Solana is something we've been wanting to do for a long time, so this is very helpful. However, I'm a bit concerned about the lack of "mapping" data, as it means that an entity that controls multiple addresses will be presented as multiple independent entities in the results. Do you know of any websites that might offer this kind of attribution data for Solana addresses, like Etherscan does for Ethereum, for example? Or alternatively, is there a way to determine from the addresses themselves whether they belong to the same user? (This is something we do in Cardano, for example, by taking advantage of the address / staking key format.)

@vinhloc30796
Author

vinhloc30796 commented Oct 11, 2025

Do you know if there are any websites that might offer this kind of attribution data for Solana addresses? Like Etherscan does for Ethereum for example.

Yes, Solscan came immediately to mind. They have an API, at $199/month, with this endpoint for labels: https://pro-api.solscan.io/pro-api-docs/v2.0/reference/v2-account-metadata

there are some free options:

Question 1: @LadyChristina how should I submit this data in the PR? Should I just add a mapping_information/addresses/solana.json and a mapping_information/clusters/solana.json?

some other paid options:

Question 2: @LadyChristina does EDI get support or resources for these paid endpoints?

@LadyChristina
Member

Question 1: @LadyChristina how should I submit this data in the PR? should i just do a mapping_information/addresses/solana.json and a mapping_information/clusters/solana.json?

This will depend on the type of data that you collect. Data that links an address to the entity that controls it should be indeed under mapping_information/addresses/solana.json in the following format:
{
    "address1": {"name": "Validator1", "source": "example.com"}
}
The source there would correspond to the website you get this data from, e.g. validators.app or solanabeach.io.
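As a rough illustration (the helper below is hypothetical, not part of the repository), an entry in this format can be loaded and validated like so:

```python
import json

def load_address_mapping(text):
    """Parse an addresses mapping file and check each entry has the expected keys."""
    mapping = json.loads(text)
    for address, info in mapping.items():
        # Each address should map to a dict with at least a name and a source.
        missing = {"name", "source"} - info.keys()
        if missing:
            raise ValueError(f"{address} is missing keys: {sorted(missing)}")
    return mapping

sample = '{"address1": {"name": "Validator1", "source": "example.com"}}'
print(load_address_mapping(sample)["address1"]["name"])  # Validator1
```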

If you also have data that clusters together different validators then this would go under mapping_information/clusters/solana.json in the following format:

{
    "validator id 1": {
        "cluster": "cluster A",
        "pool": "validator 1",
        "source": "example_url.com"
    },
    "validator id 2": {
        "cluster": "cluster A",
        "pool": "validator 2",
        "source": "example_url.com"
    }
}
This example tells us that validators 1 and 2 belong to the same cluster of validators (cluster A), and the source is a website that contains this information. Note that a cluster here is a group of validators that are all controlled by the same entity.

In some cases this step may not be needed, depending on how the address mapping info is structured. For example, if one address is linked (through the addresses json file) to "Binance1" and another address is linked to "Binance2", then we need the clustering information to say that both "Binance1" and "Binance2" are controlled by Binance. However, if both addresses are already mapped to "Binance", then the additional step is not needed.
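To illustrate the Binance example, here is a minimal sketch (the data and helper are hypothetical, not the project's actual code) of how the two files combine to resolve an address to its controlling entity:

```python
# Hypothetical mapping data: two addresses whose names are grouped by the clusters file.
addresses = {
    "addr1": {"name": "Binance1", "source": "example.com"},
    "addr2": {"name": "Binance2", "source": "example.com"},
}
clusters = {
    "Binance1": {"cluster": "Binance", "pool": "Binance1", "source": "example.com"},
    "Binance2": {"cluster": "Binance", "pool": "Binance2", "source": "example.com"},
}

def controlling_entity(address):
    # Resolve the address to a name via the addresses file, then the name
    # to a cluster via the clusters file; fall back to the name itself.
    name = addresses.get(address, {}).get("name", address)
    return clusters.get(name, {}).get("cluster", name)

print(controlling_entity("addr1"))  # Binance
print(controlling_entity("addr2"))  # Binance
```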

Hope this is clear, but let me know if you still have any questions.

Question 2: @LadyChristina does EDI get support or resources for these paid endpoints?

I'm afraid we don't currently have any funding that we can use towards obtaining such data, though @mtefagh will be better-suited to answer this.

@vinhloc30796
Author

Re Q1: sounds good @LadyChristina, I was just confirming because the data may be large; let me attempt it and then we can go from there!
Re Q2: yes, let me know @mtefagh just in case. For BigQuery it was easy for me to run a test job, but for the larger clustering it may be more expensive even to get access in the first place.

- add collect_solana_validators.py with CLI and retry
- resolve token via CLI, env, or mapping_information/.env
- output addresses map {name, source}, one entry per line
- support --include-vote and --copy-to-pool (full metadata)
- add get_solana_info.py identifiers and clusters by homepage
- set cluster source to homepage URL
- add SOLANA.md and .env.example
- ignore .env; tidy docs formatting
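The token-resolution order described above (CLI flag, then environment, then mapping_information/.env) could be sketched roughly as follows; the function and the VALIDATORS_APP_TOKEN variable name are illustrative assumptions, not the script's actual code:

```python
import os

def resolve_token(cli_token=None, env=None, env_file_lines=()):
    """Resolve an API token: CLI flag first, then environment, then a .env file."""
    if cli_token:
        return cli_token
    env = os.environ if env is None else env
    token = env.get("VALIDATORS_APP_TOKEN")  # hypothetical variable name
    if token:
        return token
    for line in env_file_lines:
        line = line.strip()
        # .env lines look like KEY=value; ignore comments and other keys.
        if line.startswith("VALIDATORS_APP_TOKEN="):
            return line.split("=", 1)[1]
    return None
```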
@vinhloc30796
Author

@LadyChristina I've added more data: there are ~900 validators on Solana, which I have clustered using the same method as Cardano. Let me know your feedback!

WHERE is_coinbase is TRUE
AND timestamp > '2018-01-01'
ORDER BY timestamp
Member

Missing closing quotes here

'litecoin': DefaultMapping,
'zcash': DefaultMapping,
'tezos': TezosMapping,
'solana': DummyMapping,
Member
Now that the mapping information is in place, this should be changed to a non-dummy mapping. I'm assuming we'll need a new SolanaMapping that inherits from DefaultMapping and overrides some functions, like map_from_known_clusters
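A rough sketch of what such a subclass might look like; the DefaultMapping below is a minimal stand-in, and the real class in consensus_decentralization almost certainly has a different interface:

```python
class DefaultMapping:
    """Minimal stand-in for the project's DefaultMapping (interface assumed)."""
    def __init__(self, clusters=None):
        self.clusters = clusters or {}

    def map_from_known_clusters(self, name):
        # Default behaviour: no clustering, every name is its own entity.
        return name


class SolanaMapping(DefaultMapping):
    """Hypothetical mapping that resolves a validator name to its cluster."""
    def map_from_known_clusters(self, name):
        entry = self.clusters.get(name)
        if entry is not None:
            return entry["cluster"]
        return super().map_from_known_clusters(name)


clusters = {"Binance1": {"cluster": "Binance"}, "Binance2": {"cluster": "Binance"}}
mapping = SolanaMapping(clusters)
print(mapping.map_from_known_clusters("Binance1"))  # Binance
```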

@LadyChristina
Member

@LadyChristina i've added more data, there are ~900 validators on Solana, whom I have clustered using the same method as Cardano -- lmk your feedback!

Thanks @vinhloc30796, overall it looks really good! I just want to try to run everything on my side properly before approving the PR. One comment I have is that now that the mapping data exists, we also need to use it in the code (see comment above).

Then there's also a part where I got confused when running it on my end, though I don't know if I'm just doing something wrong, as I don't really understand why it's happening. I tried to run the Solana query on BigQuery, but when I do it through the collect_block_data script, for some reason the result is not ordered by timestamp (even though I see that the query has the relevant ORDER BY part, so I don't understand why this is happening). So first I wanted to confirm the behaviour on your side: are you running it through this script, or directly on BigQuery / some other way? And is the output file properly sorted for you? And how large is it? Thanks!
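A quick way to check whether an output file is sorted (assuming, purely for illustration, one JSON record per line with a timestamp field; this is a sketch, not part of the repository):

```python
import json

def is_sorted_by_timestamp(lines):
    """Check whether raw-data records (one JSON object per line) are ordered by timestamp."""
    timestamps = [json.loads(line)["timestamp"] for line in lines if line.strip()]
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))

records = [
    '{"timestamp": "2018-01-02", "reward_addresses": "a"}',
    '{"timestamp": "2018-01-01", "reward_addresses": "b"}',
]
print(is_sorted_by_timestamp(records))  # False
print(is_sorted_by_timestamp(sorted(records, key=lambda l: json.loads(l)["timestamp"])))  # True
```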

Successfully merging this pull request may close these issues: Add Solana blockchain.