Building open source infrastructure for the Ethereum ecosystem


#1

Richard Burton of Balance.io sat down to examine the problem of reliably getting token balances, and of making transactions and queries of the blockchain more browsable.

In short, the problems are:

  • today, Etherscan and Ethplorer are closed source APIs and two single points of failure that many developers are building on top of (which also means those two sites negate some of the decentralized, trustless elements of interacting with the blockchain)
  • running your own infrastructure to get token balances per wallet address AND historical transactions is a large undertaking which doesn’t add business value and is costly
  • without any open source code, multiple teams are potentially building the same thing
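
To make the gap concrete: reading a current token balance from your own node is a single call, while reconstructing historical transactions means scanning blocks yourself. A minimal sketch of the easy half, assuming web3.py (v6-style API) against a local node; the wallet and token addresses are placeholders:

```python
# A minimal sketch, assuming web3.py (v6-style API) and a locally synced
# node exposing JSON-RPC on localhost:8545. Addresses are placeholders.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))

# Just the two ERC-20 functions we need, as an ABI fragment.
ERC20_ABI = [
    {"name": "balanceOf", "type": "function", "stateMutability": "view",
     "inputs": [{"name": "owner", "type": "address"}],
     "outputs": [{"name": "", "type": "uint256"}]},
    {"name": "decimals", "type": "function", "stateMutability": "view",
     "inputs": [], "outputs": [{"name": "", "type": "uint8"}]},
]

wallet = Web3.to_checksum_address("0x0000000000000000000000000000000000000001")
token = w3.eth.contract(
    address=Web3.to_checksum_address("0x0000000000000000000000000000000000000002"),
    abi=ERC20_ABI,
)

# Current balance: one eth_call against the node.
raw = token.functions.balanceOf(wallet).call()
decimals = token.functions.decimals().call()
print(f"current balance: {raw / 10 ** decimals}")
# Historical transactions, by contrast, have no single RPC call: you have
# to scan blocks or logs, which is the expensive part discussed above.
```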

Both the RallyChain and Bidali teams are building pieces of this infrastructure as commercial services.

After some back and forth with Eric Kryski from Bidali, the rough estimate is that this would cost $5K - $10K in monthly server costs. This is for the whole world and whole ecosystem — so this isn’t that bad — but that’s still a large recurring cost.

The bigger cost is a team of people with blockchain, devops, and security expertise to keep the system up and running on a continual basis. That probably looks like $30K - $40K per month in people and admin costs.

So all of a sudden we’re looking at roughly $600K in annual costs: about $50K per month ($10K in servers plus $40K in people), times twelve.

And yet, having this functionality available for the Ethereum ecosystem is potentially a huge game changer. It enables developers of Ethereum client apps to reliably get token balances and other queries, which can support the growth of many more production dapps. It also levels the playing field - if the cost of entry to run your own version of this is so high, we are saying that Ethereum development is only for massively funded teams.

Can we Gitcoin our way to funding an initial build out of this infrastructure? Between foundations and project grants and “pro” business subscriptions, can we sustain an annual infrastructure budget for things like this?


#2

At second glance it doesn’t seem all that different than a startup; it’s just more of a bottom-up approach, since the customers already know what they want.

It would be very cool if it could be orchestrated using a DAO. I imagine the process could look something like this:

  1. Early adopters create a new DAO and each invest capital into the organisation. Decision making would be conducted by voting, as in a board. Voting power could be proportional to their investment (i.e. shares/tokens).
  2. The DAO creates a Gitcoin bounty using the capital to create a proof-of-concept.
  3. If the PoC is successful, the early adopters can use the new SaaS and pay for operational costs. They will be incentivised to use it as they have already invested capital.

It doesn’t seem all that different from doing an ICO as many companies are doing. However, there are some differences here in that the customers are the organisation. The customers are spending their own money to build the service they desire, and then using the service when it’s finished.

In fact, by making the early adopters the company, is the investment no longer a security? They’re not looking to make gains; they’re looking to kick-start a useful service and pay for it.


#3

Thanks for adding your thoughts.

I think the tough part is kind of like a Kickstarter. Everyone would want to be convinced that what came out solved their problem.

Richard from Balance doesn’t have the resources to build it up front, needs it to not be reliant on the current centralized options, and might pay for it AFTER it’s built. Is there a “staking” model, where some money goes in up front, and then you stake 3 - 6 months of what you would pay for it to show there are built-in customers?

“Is it a security?” is one of those questions where all the details matter. How to structure it is a solvable problem.

How to keep it sustainable over the long term is the tough part.


#4

Agreed; there would have to be a very strong vision that stakeholders could clearly understand and rally behind. I think the responsibility would fall on the leader of the group who is spearheading the project.

The whole decentralisation argument is interesting. Richard wants it decentralised, but in this case the technology would likely be off-chain and run by someone using their infrastructure. The “decentralised” aspect could be the fact that the tech is open source: that anyone can start their own service.

And like you said, kick-starting that open source code would be the hardest part. Once it’s written it would be easy enough to take it for a test drive to see if it fits the desired use-case, but capital is needed.

I do like your staking idea; it actually reminds me of Tezos. They raised $250 million (IIRC?), and now they’re fighting over the money and potentially mis-handling it. Where is the accountability?

If instead the money was trickled out and could be rescinded after a minimum waiting period, that could keep the company austere and hungry to produce value. If value was produced then the stake could continue to pay for operations.


#5

Some of this decentralized discussion is also being had in a channel you should join, @asselstine: https://gitter.im/ethodi/Lobby, plus the accompanying GitHub repo for the Ethereum Open Data Initiative: https://github.com/ethodi

The goal – of being able to do this via a laptop – is a good one. And this point from the chat was good:

“why is it so hard to extract data from ethereum nodes today? can we do better? can we build a protocol/marketplace around it?”

“because clients are focused on validating/mining blocks. And querying/indexing historical data is a totally different task”
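
To see the point in practice, here is a raw JSON-RPC call (assuming a local node on localhost:8545): nodes happily answer block-oriented queries like this one, but expose no “all transactions for address X” method, so per-account history has to be derived by scanning.

```python
# Assumes a local node serving JSON-RPC on localhost:8545.
import requests

payload = {
    "jsonrpc": "2.0",
    "method": "eth_getBlockByNumber",  # block-oriented queries like this exist...
    "params": ["latest", True],        # True = include full transaction objects
    "id": 1,
}
block = requests.post("http://localhost:8545", json=payload).json()["result"]
# ...but there is no eth_getTransactionsByAddress; per-account history
# must be reconstructed by walking blocks like this one.
print(f"block {int(block['number'], 16)} has {len(block['transactions'])} txs")
```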


#6

This thread on incentivizing people to run full Ethereum nodes is excellent:

I like the part about “the network” paying clients directly, as it gives a direct business model for “merely” running a client that lots of people use.


#7

Final reference here, which really summarizes the issue. From the QuickBlocks team:

Developers are left with a conundrum: either decentralize and be forced to open source your code, or centralize it, thereby eliminating the reason we all got into this thing to begin with. This, I think, is why there’s currently no viable solution to the problem. There is no appealing path forward.


#8

Hi Boris. This is Jay Rush from QuickBlocks. Thanks for the discussion.

I notice this quote in the original post:

running your own infrastructure to get token balances per wallet address AND historical transactions is a large undertaking which doesn’t add business value and is costly

I’m not totally sure this is true. In fact, this is exactly what we’re exploring at QuickBlocks. It’s not that hard to run a node. Plus, there’s huge business value to having the data. Plus, it’s not costly. It’s costly only if one tries to index the entire blockchain.

I run parity on a local Mac desktop and a Linux box. QuickBlocks builds lists of historical transactions for specific accounts (from which you can build account and token balances) with a minimum imposition on the target machine. This allows us to maintain a totally decentralized stance; in other words, QuickBlocks runs on the same machine as the node and relies only on the node for its data. We’re a bit slower gathering historical lists than a fully indexed database of accounts per block, but we are significantly smaller and way faster than the RPC (which is the alternative).
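
For anyone who hasn’t seen the pattern, here is a deliberately naive sketch of that approach. To be clear, this is not QuickBlocks code, just an illustration, and it misses internal transactions, which is part of what makes the real filtering task non-trivial. It assumes web3.py against a local node:

```python
# Illustrative only (not QuickBlocks code): walk blocks on a local node
# and keep the transactions that touch a set of watched addresses.
# Assumes web3.py (v6-style API). Internal transactions are missed.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
WATCHED = {"0x0000000000000000000000000000000000000001"}  # placeholder, lowercase

def filter_block(number):
    """Return the transactions in one block that touch a watched address."""
    block = w3.eth.get_block(number, full_transactions=True)
    hits = []
    for tx in block.transactions:
        to = tx["to"].lower() if tx["to"] else ""  # 'to' is None on contract creation
        if tx["from"].lower() in WATCHED or to in WATCHED:
            hits.append(tx)
    return hits

history = []
for n in range(5_000_000, 5_000_100):  # an arbitrary 100-block range
    history.extend(filter_block(n))
print(f"found {len(history)} matching transactions")
```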

Most of the discussion I see about solving the ‘data’ problem assumes that the solution is a full database index of the blocks. I think this will lead every attempt of this type toward centralization (because you won’t be able to afford to collect a full database of the blockchain without trying to monetize the data). Also, I’m not sure how a fully indexed database will scale once the blockchain shards and the data explodes 100-fold.

I guess what I’m trying to say is that the infrastructure we should be discussing for the Ethereum ecosystem should be as decentralized as the node itself. To me this means it runs on the same machine, makes a minimum imposition on that machine (over the node), and gets its data directly from the node.


#9

The context was Richard Burton — and then Eric from Bidali — who need to be running user facing services at scale.

I totally agree that a decentralized solution is the right way to do it.

Now the challenge is pooling the up front funding to pay (you? someone?) to build the first version that works, so people aren’t tempted to go down the centralized, client-server route that people know how to scale today.

What do you think would be needed in time / money to do this?


#10

The context was Richard Burton — and then Eric from Bidali — who need to be running user-facing services at scale.

Right. There are use cases where access to every account quickly is needed. I’m wondering (not questioning, wondering) if that needs to be infrastructure. Another use case (the more likely?) would be for individuals to have access to their own “accounts of interest.” For me, this means my 8-10 addresses, plus the 20 smart contracts I’ve participated in, plus maybe a few popular smart contracts. I think it is the need for “access to every account quickly” that leads to “large”, “costly” undertakings and eventual centralization.

What do you think would be needed in time/money to do this?

A good part of QuickBlocks is already written and working. I would look for recovery of some of that prior effort. Moving forward, there’s a need for serious testing (the filtering task, even in the big data case, is non-trivial because of internal transactions, etc.), better documentation, a strategy for fully open sourcing the code (it’s about 50-50 open source right now), and other stuff. So, the time/money question is complicated.

So people aren’t tempted to go down the centralized, client-server route that people know how to scale today.

It’s actually kind of interesting: the blocks are independent of each other. The per-block / per-account filtering can be easily parallelized. On a small machine, with a couple of cores, the filtering is slower. On a system that has a lot of cores, or if you can run many machines, you could split the filtering across multiple processes. It might scale up as quickly as the number of processes you throw at it.
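
A sketch of that parallelization, again assuming web3.py and a local node; the block range, chunk size, and worker count here are arbitrary:

```python
# Per-block filtering split across worker processes (assumed web3.py
# v6-style API). Blocks are independent, so contiguous ranges can be
# filtered in parallel and the per-account results merged afterwards.
from multiprocessing import Pool

from web3 import Web3

WATCHED = {"0x0000000000000000000000000000000000000001"}  # placeholder, lowercase

def filter_range(bounds):
    """Filter one contiguous block range; runs in its own process."""
    start, end = bounds
    # Each worker opens its own connection to the local node.
    w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
    hits = []
    for n in range(start, end):
        block = w3.eth.get_block(n, full_transactions=True)
        for tx in block.transactions:
            to = tx["to"].lower() if tx["to"] else ""  # None on contract creation
            if tx["from"].lower() in WATCHED or to in WATCHED:
                hits.append(tx["hash"].hex())
    return hits

if __name__ == "__main__":
    # 1,000 blocks split into 8 chunks of 125; throughput grows with the
    # worker count until the node itself becomes the bottleneck.
    chunks = [(5_000_000 + i * 125, 5_000_000 + (i + 1) * 125) for i in range(8)]
    with Pool(processes=8) as pool:
        results = pool.map(filter_range, chunks)
    merged = [h for part in results for h in part]
    print(f"found {len(merged)} matching transactions")
```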

My current thinking is that a decentralized solution starting on a tiny machine scales up to a “big data” solution more easily than a solution that starts as “big data” and tries to scale down.


#11

I don’t think it needs to be complicated. I totally agree that your previous work should be “bountied out”.

Maybe think about it as:

  • time/people needed to focus on it fully for N months – $10K - $20K per month for 3 months?
  • bounty for committing to open source – $50K? $100K? you could also defer this to after the initial 3 months
  • ongoing support or specific, initial feature bounties – likely won’t know this until the initial 3 months.

The Open Collective system could be used to set this up. Is this something you want to pursue? I am happy to help with this and look at getting the money raised. I think it starts by getting some money donated / funded and being transparent about how it is being spent. So perhaps $30K+ to start so that it can be focused on for 3 months.

Will you be at EDCON in Toronto? I’m meeting Richard Burton there so will discuss this too. Happy to schedule a call to talk this through, work on a budget, come up with a list of organizations to approach, etc.

Yes! I started thinking about Portfolio Import / Export as a way to be portable across systems. I own my data, I know which addresses are “mine” (or ones I want to watch). This could be a data portability solution as well, where I can log in to different dApps and bring my portfolio with me.

Might just start with a data format standard and IPFS or other URI. It’s like the tiniest of side chains 🙂
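
As a strawman for what that data format could look like (every field name here is invented for illustration; none of this is an existing standard):

```python
# Hypothetical portfolio document: a small, self-describing JSON file that
# could be pinned to IPFS and referenced by URI from any dapp. All field
# names are invented for illustration.
import json

portfolio = {
    "version": "0.1",
    "label": "boris's portfolio",  # free-form; ownership proofs could come later
    "addresses": [
        {"address": "0x0000000000000000000000000000000000000001",
         "label": "main wallet", "watch_only": False},
        {"address": "0x0000000000000000000000000000000000000002",
         "label": "a contract I follow", "watch_only": True},
    ],
}

print(json.dumps(portfolio, indent=2))
```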

I wrote up a wiki page with some thoughts on Portfolio Portability


#12

Hello everyone,

I was forwarded to this conversation from the https://gitter.im/ethodi/Lobby channel mentioned earlier. We are working on a similar use case, i.e. making data easily available for everyone. But our end goal is performing data analysis, machine learning, etc. (https://www.analyseether.com/).

We have already launched an MVP and open-sourced its code.

As mentioned earlier there are two ways to get the data:

  1. Make it available in a centralized server and give access using APIs
    We are making our code public, but serving the data from a centralized server. Since our use case is analysing the data, a local machine cannot perform computations quickly for the current chain size (>200M transactions).
  2. Develop faster methods to get data from a node
    For accessing individual portfolios, this might be a better/cheaper strategy.

Would love your thoughts on this 🙂
To get access to the MVP, fill in this form


#13

Awesome, @ankitchiplunkar, thanks for dropping in to share. Will talk about it with people at EDCON in Toronto next week.