Steem Pressure #8: Power Up Your dApp. RethinkDB? KISS my ASAP.


While building your Steem blockchain dApp, you will soon realize that it’s not that easy to get the data you are looking for quickly, unless there’s already an API that gives you what you need.

We have several reliable, general purpose, “full” API nodes, provided by Steemit Inc. and community members:

Steem API endpoint owner
https://api.steemit.com Steemit Inc.
https://anyx.io @anyx
https://api.steem.house @gtg
https://steemd.minnowsupportproject.org @followbtcnews
https://steemd.privex.io @privex

You can use them for development or even (if you are brave enough) for production.
If your dApp is small and simple, it might work. And yes, in many cases it will.
But is it optimal? No.

In previous episodes, I described different types of nodes and their names and how this can be misleading when it comes to the so-called “full” nodes.

TL;DR
A full node means here something different than in the Bitcoin realm, where Bitcoin full node is pretty much something that a Steem consensus node can do.
Here, on Steem, the word “full” doesn’t refer to anything related to the blockchain - it refers to the fully featured set of APIs enabled within steemd.

Do you really need a full set of API calls?
In most cases you don’t. To get an idea about the information you can get from common Steem APIs, take a look at the devportal:

https://developers.steem.io/apidefinitions/

By the way, please consider voting for Steem.DAO @inertia’s proposal, who already greatly contributed to the Dev Portal Documentation and is eager to do more.

For example, if you need the information returned by get_market_history, take a look at condenser_api.get_market_history and at its underlying api, in which case you will find:

Also see: market_history_api.get_market_history
Which means that the API node you need should be running a market_history plugin.

Another example is a situation in which you want to find who “Resteemed” (a.k.a. “Reblogged”) a given post.
Here, you are lucky, because there’s a method for that called get_reblogged_by and provided by the plugin follow.

Maybe you want to get tags used by a specific author? That’s easy too, there’s the get_tags_used_by_author method provided by - yes, you guessed correctly - tags plugin.

There’s a pretty big chance that if your app needs to use get_market_history, it doesn’t need tags or follow plugins, because it’s not blogging oriented, and vice-versa - if you deal with tags and follow, you probably don’t care about the internal market.

Such knowledge lets you optimize the API infrastructure needed for your dApp.
For instance, you can run a “fat node” without market_history, account_history(see Steem Pressure #6) and use hivemind to get the tags and follow features.

That’s not the end yet. You get a lot more data than you’ve asked for, but still that may not be what you need.

Do you know why? If not, then please, read the first paragraph again.

How many blocks has the witness gtg missed so far?
That’s easy.
Run:

curl -s --data '{"jsonrpc":"2.0", "method":"database_api.find_witnesses", "params": {"owners":["gtg"]}, "id":1}' https://api.steem.house

You will instantly get your answer, because the find_witnesses method returns this information… among many pieces of information that you don’t need.

Now, try something more complicated:

Problem to solve

How many blocks did @gtg produce until block 40000000?
And how many blocks did @gtg produce in August 2017?
And how many blocks did he produce that contained more than 50 transactions?
And what percentage of blocks produced in 2018 had more non-virtual operations than transactions?
How many blocks had only one transaction and which of them had the highest number of non-virtual operations? What is the id of the transaction that contains them?

You should be able to answer all these questions, all the data are in the blockchain after all. The problem is that steemd has no suitable API to answer these questions quickly.
This means that you either have access to a third party service / API that can answer these questions (see @arcange’s SteemSQL “SQL server database with all Steemit blockchain data”, which is available in subscription model), or look into blocks to find your answer.

In the latter case, you will have to inspect all the blocks ever produced using the get_block method.
The good news is that every consensus node can run the block_api plugin.
The Steem blockchain has recently passed the 40 million blocks mark.
This means that in order to answer the first question, you will need to send more than 40 million requests to the API endpoint.
Each time this type of question is asked you will need to repeat the procedure, unless you…

Build and run your own custom Steem API


steemit-meme-api.jpg

TL;DR in Memeglish so that project managers can get it.

No. I’m not going to build an API in this post.
It is supposed to be your own API, solving your problems, optimized for your use case.
Instead, I’m going to play with some blockchain data to give you some tips.

To illustrate this, let’s analyze the problem described above. .
Let’s ignore incoming blocks and keep our focus in the range up to 40000000 blocks (catching up with the head block and dealing with micro-forks is a good topic for another episode; in the meantime, you can take a look at how hivemind does this).
Let’s assume that you can’t or don’t want to extend steemd with your custom fancy_plugin or a fancy_plugin_api that does the job.

Keep It Simple Stupid

We don’t need information about virtual operations, so instead of using get_ops_in_block, which requires account_history plugin and API, we can use get_block, which saves a lot of resources, if you want to run your own node.

How does a request look?

curl -s --data '{"jsonrpc":"2.0", "method":"block_api.get_block", "params":{"block_num":197696}, "id":1}'

How does a response look?

{
 "jsonrpc": "2.0",
 "result": {
  "block": {
   "previous": "0003043f5746b4d9d06932467ad852ac5d71231c",
   "timestamp": "2016-03-31T13:56:18",
   "witness": "sminer22",
   "transaction_merkle_root": "63321e3f001bed17d301399c5eeaa9b37a0bf74b",
   "extensions": [],
   "witness_signature": "1f64e94d51baef0b84be04644fafc0b05959e80db64de7b099d00499587f23f97216acd3d8df3c3b81f672b24f8cf47167720d205693d3f5713a65bcf004cffef3",
   "transactions": [
    {
     "ref_block_num": 1087,
     "ref_block_prefix": 3652470359,
     "expiration": "2016-03-31T13:56:45",
     "operations": [
      {
       "type": "vote_operation",
       "value": {
        "voter": "proskynneo",
        "author": "proskynneo",
        "permlink": "steemit-firstpost-1",
        "weight": 10000
       }
      }
     ],
     "extensions": [],
     "signatures": [
      "202765ea6b688c2ce968ff2b3f1f6715fddd67374f409a202ae720ca072cb6a919175c27921edfc24c5e3d898a221e88395af31a155b1d60b2bd5475d30a30e226"
     ]
    }
   ],
   "block_id": "00030440d0fbfc8323a1388cd009f2a7f9f43162",
   "signing_key": "STM6tC4qRjUPKmkqkug5DvSgkeND5DHhnfr3XTgpp4b4nejMEwn9k",
   "transaction_ids": [
    "679bcc6414b2f2bd16818a83f020bb9a478a84c4"
   ]
  }
 },
 "id": 1
}

Well, when pretty printed for your convenience, API normally returns a compact version, which is just one line without extra whitespaces.

Caveat

As you can see, the resulting data don’t contain block_number, but you can use the id that equals the block number you are asking for.

We need to get 40M blocks.
That’s a lot, but fortunately you need to do this only once.
Irreversible blocks are not going to change because… well, I hope you can guess why ;-)
Even if you use jussi’s batch requests, throwing that many requests at public API nodes will quickly trigger rate limiting, and even if it doesn’t, it will take a significant amount of time.

Your own consensus node for simple tasks

You can use your own consensus Steem node for that. Local means faster.
Think about latency and how any latency (1ms? 2ms?) is no longer negligible if you multiply it by 40000000.
Such a node is not resource hungry, and with MIRA you can easily run it on a 8GB RAM machine with 500GB storage.
A consensus node currently needs 245GB for block_log and 56GB for the state file.
Besides, your dApp will benefit from such a node also for broadcasting purposes.
Configuration is as simple as that:

Use someone else’s hard work

There are already people (like me) who serve the block_log to speed up replay times.
If only there were a place where one could bulk download JSON files with Steem blocks...
Well, if there’s a need for it, I can make it available with 1M block packages every time we reach 5 or 10 million.
It will save a lot of time for all microservices that require block processing (the last mile can be synced using traditional methods)

Bulk

You’ve already seen how a single block looks in JSON format.
Obviously, we are using compact output to save space.
The whole data set (40M blocks in the uncompressed JSON format) takes 580GB.
For our needs, we need only a small subset of such data.

To get what’s needed we can use jq:

jq -c '
{
 id:        .id,
 block_id:  .result.block.block_id,
 timestamp: .result.block.timestamp,
 witness:   .result.block.witness,
 txs:       .result.block.transactions | length,
 ops:      [.result.block.transactions[].operations[]] | length,
 txids:     .result.block.transaction_ids
}'

After processing our blocks we get 46GB of uncompressed json data.

Everything Should Be Made as Simple as Possible, But Not Simpler

A flat file? Are you kidding me?
Use your favorite database to store it.
(Unless your favorite database is MongoDB, then use something else.)

Or maybe RethinkDB?

Some time ago @someguy123 told me about some fancy database designed to store JSON.

”RethinkDB is the first open-source, scalable JSON database built from the ground up for the realtime web. It inverts the traditional database architecture by exposing an exciting new access model - instead of polling for changes, the developer can tell RethinkDB to continuously push updated query results to applications in realtime. RethinkDB’s realtime push architecture dramatically reduces the time and effort necessary to build scalable realtime apps.”

So this not only means effective and effortless storage of JSON data, but also sounds like a perfect choice for many Steem dApps.
But is it? Maybe. For some needs. It’s not perfect for sure, but no solution is.
I’m going to give it a try.
Do your own research. Maybe it will fit your needs. If not, look for something else.

Let’s see how easy it is to store data we need in RethinkDB

Step 1: Install RethinkDB

Check the RethinkDB webpage for installation instructions suitable for your server.
For Debian or Ubuntu, it’s enough to do the usual:

Add RethinkDB repository
echo "deb https://download.rethinkdb.com/apt `lsb_release -cs` main" | sudo tee /etc/apt/sources.list.d/rethinkdb.list
wget -qO- https://download.rethinkdb.com/apt/pubkey.gpg | sudo apt-key add -
sudo apt-get update
Install RethinkDB package
sudo apt-get install rethinkdb

That’s all. No questions asked.

Install the RethinkDB python driver

Because the rethinkdb import subcommand has - surprisingly - an external dependency on the python driver, depending on the way you installed RethinkDB, you might need to get the RethinkDB python driver too.
For that, you need to have pip, and, ideally, virtualenv, so if you don’t then:

sudo apt-get install python3-pip python3-venv

and once you have them, install the RethinkDB python driver into your virtual environment:

python3 -m venv ./rethink
source rethink/bin/activate
pip install rethinkdb
Step 2: Run the server

rethinkdb
Yes. That’s it for now. You can play with more advanced features later.
It enables an administrative console available through HTTP on localhost:8080

Step 3: Load your data.

Throw your JSON data at RethinkDB instance by running:
rethinkdb import -f blockstream.json --table steem.blocks
Where blockstream.json is your big 46GB file with data from 40M blocks.
It goes to a table called blocks within a database called steem.

  [========================================] 100%
  40000000 rows imported to 1 table in 3615.68 secs
  Done (3615 seconds)

Not bad, on my hardware it consumes json stream a little bit faster than what a 100Mbps link could provide.

rethinkdb_data takes 66GB


rethinkdb.png
Step 5: Make use of the data

You can use the Data Explorer feature from the administrative console, or use Python, Javascript or Ruby to get the data you are looking for.

If you want to learn more look at the RethinkDB documentation

Depending on your needs, you can create and use secondary indexes to speed up your queries.

Or...

Use a different set of tools and solutions that meet your needs.
A less exotic solution is to use MySQL that supports a native JSON data type, which combined with generated columns might be a good alternative.
(I’m going to try that too.)

Previous episodes of Steem Pressure series

Introducing: Steem Pressure #1
Steem Pressure #2 - Toys for Boys and Girls
Steem Pressure #3 - Steem Node 101
Steem Pressure: The Movie ;-)
Steem Pressure #4 - Need for Speed
Steem Pressure #5 - Run, Block, Run!
Steem Pressure #6 - MIRA: YMMV, RTFM, TLDR: LGTM
Steem Pressure #7 - Go Fork Yourself!
Stay tuned for next episodes of Steem Pressure :-)



If you believe I can be of value to Steem, please vote for me (gtg) as a witness.
Use Steemit's Witnesses List or SteemConnect
Or if you trust my choice you can set gtg as a proxy that will vote for witnesses for you.
Your vote does matter!
You can contact me directly on steem.chat, as Gandalf



Steem On

Comments 50


This post has been resteemed by @witnessnews.

Follow @witnessnews to keep up with active witness updates.

03.02.2020 21:03
0

I am trying it on my phone now, thanks! ;D

03.02.2020 21:19
0

I'm definitely not the target audience for this post, but I'm glad you're doing what you do, so I don't have to. ;) Stay awesome!

03.02.2020 21:25
0

Oh come on, I've made a meme explaining everything ;-)

03.02.2020 21:52
0

Ha! Right... Here's hoping I can reach enlightenment without running an API. 'Cause if I tried it, you'd need to create a new meme with a brain exploding instead of getting superpowered.

03.02.2020 21:56
0

COMPLAINTS FILED AGAINST STEEMIT

Unfortunately because of a few scumbags like the following members, Steemit's days as a viable forum are quickly coming to an end, because it has essentially turned into nothing more than a cesspool of fraud and corruption, controlled by childish members, porn promoters, and other such low-life human beings.

@themarkymark (AKA @buildawhale) - 77.6

@bullionstackers - 71.9

@buildawhale (AKA @themarkymark) - 74.9

@quarantine (AKA @the-reef / AKA @block-power) - 61.2

@bookguy - 61.0

@mack-bot - 58.8

@the-reef (AKA @block-power / AKA @quarantine) - 54.9

@block-power (AKA @the-reef / AKA @quarantine) - 25.0

@blockcreate - 25.0

@steemcleaner - 1.07

and others.

The Facts: If any of the above members do not like one of more of your posts, comments or replies for any reason, they will not only down-vote that post or comment, but will then go back and down-vote many, if not all of your previous posts and comments (plus any you post in the future), thus reducing your reputation rating to near nothing. A scumbag is a scumbag, but the real problem lies with the idiots who run and control Steemit. Only a bunch of total morons would ever set up a voting system as it now is designed. The main problem is that a higher reputation member can always down-vote a lower-reputation member, which will have a significant impact on the lower-reputation member's rating. The reverse however does not hold. A lower-reputation member can indeed down-vote a higher-reputation member, but unlike the reverse, the lower-reputation member's down-vote has no impact on the reputation of the high-reputation member. It's like it never happened.

In any event, whoever devised such a system, unfortunately has more than a few marbles missing. But quite frankly, who cares? I am now in the process of powering down my steem (10 weeks to go) so that I can exit this cesspool as so many thousands have done before me. So if the Steemit's management really wanted to create a cesspool of such scumbags, a real haven for porn lovers and other such low-lifes, congratulations! They accomplished that! Steemit will not survive much longer! Of that fact, I am quite certain!

To all the scumbags here on Steemit: Go ahead, and bring on the down-votes, and perhaps it may help you feel good for a while, but in the long run, it will not fill your otherwise pathetic empty lives.

To all other Steemit Members: I'm sure you have come to realize some of these things, as well as the fact that steem payouts are nowhere remotely close to what they are claimed to be. Only those at the very top of the pyramid get the bulk of the payouts, while the rest of us (the vast majority of us), get virtually nothing.

Please also note that you would be wise to withdraw all your steem from Steemit asap. Official complaints have been filed with the following Government Organizations seeking the shutdown of Steemit:

The Federal Trade Commission (FTC)
https://www.ftccomplaintassistant.gov/?utm_source=takeaction#crnt&panel1-1

The Internet Crime Complaint Center (IC3) https://www.ic3.gov/default.aspx

The Federal Bureau of Investigation (FBI)
https://www.fbi.gov/investigate/cyber

The Securities and Exchange Commission (SEC)
https://www.sec.gov/divisions/enforce/claims.htm

Meanwhile, the best thing you can do to help the Steemit platform, and give it a chance of survival, and if you have a higher-reputation, is to down-vote the above members so if not to get them off of Steemit, at least take away their abusive down-voting power.

03.02.2020 21:34
0

I think this post and series will be useful for those needing to efficiently get a lot of data from the STEEM blockchain. For my limited uses, steem.js has been working fine.


By the way, the link for Steem Pressure #7 seems broken. Here's where that post is:

https://goldvoice.club/steem/@gtg/steem-pressure-7-go-fork-yourself-step-by-step-guide-to-building-and-setting-up-a-mira-powered-hf21-ready-steem-consensus-node

03.02.2020 22:12
0

Thank you, I just fixed that.
(I was shortening links and went too far with that ;-) )

03.02.2020 22:43
0

This post has been resteemed by @witnessnews.

Follow @witnessnews to keep up with active witness updates.

03.02.2020 22:40
0

Your bot is broken. It comments again when matching and already resteemed post is edited.

03.02.2020 22:48
0

A useful post and sure, you have my vote since a long time.

And today, I add as well a !BEER for you.

PS: Is there someone who might help me to get a node up and advise us in monitoring?

03.02.2020 22:41
0

Thank you :-)
Sure, as soon as I manage to get some spare time (try to catch me on Steem.Chat), but please be patient (to get an idea about my spare time, you can see when I posted previous episode of Steem Pressure ;-) )

03.02.2020 22:46
0

Hi @gtg. I know you are damn busy and my question was more into the direction of the community....

03.02.2020 22:57
0

I like your optimism, but I'm afraid that there are not many people that will actually read my post and then - if still no sleeping - proceed to a comment section ;-)

03.02.2020 23:03
0

Hey @gtg, sure I miss all the “easy to eat cat content” with your post, but there are for sure more people as you think who read your work.

Building dapps on steem is “the 2020 thing“ and therefore you need to read from the chain.

04.02.2020 05:51
0

I like your optimism, but I'm afraid that there are not many people that will actually read my post

Haha yeah, looks like no one read Gandalf's posts. Seems like no one understand such wizard's gibberish. No matter if what you are trying to hack, are computers, blockchain technology or human minds. LoL };)

05.02.2020 04:14
0

View or trade BEER.


Hey @gtg, here is a little bit of BEER from @detlev for you. Enjoy it!

Learn how to earn FREE BEER each day by staking.

03.02.2020 22:42
0

8972E1D8-E1FC-40FD-AA9E-71B6E8852F96.jpeg

03.02.2020 23:09
0

I have no time at the moment, to follow tis posting but I guess, it will be a quite good source for my purposes too. Before I go on, I have to read your manual, dear @gtg. Thank you for this guide.

04.02.2020 00:33
0

모두의 희망이 되는 스팀을 만들어주기를 바랍니다. 당신이 많은 스티미언들부터 존경받는 증인이 되기를 바랍니다.

04.02.2020 06:21
0

💪💪

04.02.2020 07:03
0

Nice. Now you have my attention.

04.02.2020 11:23
0

Interesting. Offering custom APIs may be the way to go for techcoderx.com full node to stand out.

04.02.2020 14:24
0

04.02.2020 15:41
0

The Universe answered.
Your post is, like, godsend to me. I stuggled with an idea of starting to learn python for some time. With this Steem Pressure series you've conviced me.
Especially 7th episode.

04.02.2020 20:15
0

Thank you, I'm glad to hear that! :-)

04.02.2020 21:25
0

Nice write up. But not quite helpful to me though.

04.02.2020 21:22
0

Thank you. Hopefully it will help developers making Steem apps better, then all of us will benefit from it :-)

04.02.2020 22:11
0

PostgreSQL's JSON support seems to be worth exploring too.

04.02.2020 21:48
0

Definitely.
I was hoping to use something lightweight and simple first, thus checking RethinkDB, however, it seems I was too optimistic about its performance at scale (but I haven't checked RocksDB branch yet).

04.02.2020 22:05
0

Have you considered ElasticSearch?

05.02.2020 15:22
0

Yes, ElasticSearch will work and most importantly scale really well. If we are looking at query intensive workloads it works really well.

05.02.2020 19:11
0

I had some very scenarios with MongoDB clusters. HDFS/Hadoop though not db seems to the only new-gen NoSQLs that seems to hold up against real world workloads. I am not familiar with RethinkDB but its scalability will be a good item to consider. MySQL and PostgreSQL on the other hand has the tooling to work well with clusters and are time proven. With the JSON support they may be good for our dApp usecases.

Right now however ElasticSearch like @kaptainkrayola pointed out is a scalable solution. Chains like Bitshares and Peerplays have ElasticSearch plugins. I would consider Blockchain + ElasticSearch + demux-js ( https://github.com/EOSIO/demux-js ) implementation for the specific chain as a solution at this point.

PostgreSQL + PostREST (http://postgrest.org/en/v6.0/) would be something I would like to experiment with. I am fond of PostgreSQL as I have a project I inherited 11 years ago with Perl + PostgreSQL and we rewrote Perl to Python but the underlying PostgreSQL is running strong for close to 15 years now.

05.02.2020 19:10
0

This will be a good thing for steemit..

05.02.2020 14:24
0

Wow, someone else on the internet that likes RethinkDB and is suggesting using it for projects? That's awesome, I thought I was the only one!

05.02.2020 14:58
0

(Unless your favorite database is MongoDB, then use something else.)

ded

05.02.2020 15:45
0

This is simply amazing, the human ideas organized by crypto and now available for analyze

05.02.2020 16:13
0

this will be very good for steemit

06.02.2020 12:03
0

This is a fairly meaty post. It may require splitting into a few servings to fully digest. Thanks @gtg

P.S. using this comment for testing.
Test !CHOPS 4

07.02.2020 00:25
0

Hello!

This post has been manually curated, resteemed
and gifted with some virtually delicious cake
from the @helpiecake curation team!

Much love to you from all of us at @helpie!
Keep up the great work!

![helpiecake](https://cdn.steemitimages.com/DQmPXA56YrF9QSdEFkAMsrFJqS9ywEPgUMXfHZibu5WtVxH/Lemon-Mousse-Cake_2.png)

Manually curated by @solominer.


@helpie is a Community Witness.

07.02.2020 13:25
0

@gtg an anon stupid ballhead corrupt bully that probably got his money from corrupt fiat activities like the rest of the STEEM circle jerks!!!

Those circle jerks of steemit flag all my posts, even I was promoting STEEM IRL for 3 years... been using the blockchain almost 3 years everyday... I sold most of my steem and created LASSECASH as a bully free alternative to STEEM.

STEEM is the dumb pipeline to what LASSECASH is.... keep flagging my stupid suckers and ruin steem's price future... LASSECASH is a real positive social media on the blockchain where your corrupt money has no power... your flags are worthless as you have no LASSECASH POWER.

Over and out.

Lasse Ehlers

09.02.2020 13:50
0

It's just a disagreement on rewards.
I disagree to reward idiots and scammers with STEEM.
Both groups can still make use of anti-censorship features of Steem and post their content freely.
I just execute my right as SP holder to allocate Steem rewards pool to authors that brings value.

09.02.2020 14:07
0

BS.

2 cent in reward for sharing valuable information about HEX which is the absolutely best product in the whole crypto space, that is too much of reward?? HAHA.. While your BS post gets 150$?? HAHAHAH... You are so stupid, you and the rest of the circle jerks are the reason STEEM price is only 20 cent. You call me a scammer, which is false, you are the scammer since you flag objectively truth content. HEX is a certificate of deposit with a lot of liquidity and not a scam, you dumb fuck!!!

Fuck you, you have no power on LASSECASH!

09.02.2020 15:01
0

I try to be judicious w my joy emojis but this little exchange earned it. 😂

Thank you, @gtg. Let's not all of us rush to buy LASSECASH at one time now. Everybody form a single file line! Don't want anybody to get trampled with the insane demand for the LASSE token.
/s (as if the sarcasm was not obvious enough.)
@whatsup I think you may get some giggles here.

(LASSE token would have fared better if it were named after the dog imo)

!dramatoken

10.02.2020 04:20
0

You're upping the drama to new levels! Have a DRAMA.

To view or trade DRAMA go to steem-engine.com.

10.02.2020 04:21
0

https://ipfs.busy.org/ipfs/QmSeZ6cDEcAY54q5eVfrfiFminsb3vnBqhX4DpKZ51x1L3

Hi gtg, this write-up was particularly meaty, fulfilling, wholesome, beefy and deserving of a CHOPS token. Don't Chomp on it all at once!
To view or Trade CHOPS visit steem-engine.com

10.02.2020 04:12
0

Nice article.
MySQL? Try PostgreSQL, it's json / jsonb handling is really fast.

11.02.2020 20:04
0