Mass-produced information designed to sway public opinion has been with us at least since the advent of the printing press. But bot farms and AI-generated content have introduced a new variable: it can now be hard to know whether online comments were written by paid actors or autonomous bots delivering a single overarching message to influence public opinion, or even a country's upcoming elections.
To keep the subject of this post from pertaining to a certain country and time, and possibly going out of date, let's choose the fictional country of Klezskavania, the subject of this fantastic 1999 album from a Calgary band called the Plaid Tongued Devils. Klezskavania is one of those places that you've never heard of, where everything that can go wrong, does. One of the lyrics from the album goes as follows:
But quite a lot of time has passed since 1999! We can imagine that Klezskavania has since overthrown its corrupt leadership and has had an election or two. Things are starting to improve. But on the horizon is a new threat: it looks like a larger country has an interest in keeping Klezskavania subjugated, and is running bot farms on Klezskavania's largest social network whose accounts pretend to be average citizens. They are stirring up unrest where there was none before. How can its citizens pinpoint bot-related activity while still allowing human users to write without restrictions, as is their right?
Fortunately, SurrealDB has tools for this sort of situation built in, three of which will be introduced in this post. The first method uses event-driven architecture to react to events the moment they happen.
Use an event to check if likes are happening too soon
A comment that receives a ton of likes within the first few minutes is clearly suspicious. Here is how we can use the schema itself to keep an eye on this behaviour.
This schema has two regular tables: user and comment. A user can write a comment or like a comment.
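A minimal sketch of that schema might look like the following (the table and field names here are assumptions for this post, so adjust them to your own model):

```surql
-- Two regular tables: one for users, one for comments
DEFINE TABLE user SCHEMAFULL;
DEFINE FIELD name ON user TYPE string;

DEFINE TABLE comment SCHEMAFULL;
DEFINE FIELD content ON comment TYPE string;

-- Two graph edge tables: a user writes a comment, a user likes a comment
DEFINE TABLE wrote TYPE RELATION FROM user TO comment;
DEFINE TABLE likes TYPE RELATION FROM user TO comment;
```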
We can test this out a bit by creating some users and comments...
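For example, something along these lines (the record names and content are made up for the demo):

```surql
CREATE user:one SET name = "Citizen One";
CREATE user:two SET name = "Citizen Two";

CREATE comment:one SET content = "Lovely day in Klezskavania today";

-- user:one wrote the comment, and user:two liked it
RELATE user:one->wrote->comment:one;
RELATE user:two->likes->comment:one;
```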
...followed by a graph query to see who wrote and liked what.
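A query of this sort might look like the following (assuming edge tables named wrote and likes), starting from each comment and walking the edges backwards to the users on the other end:

```surql
SELECT
    content,
    <-wrote<-user.name AS written_by,
    <-likes<-user.name AS liked_by
FROM comment;
```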
Output:
The above definitions also give us a nice schema.

So far so good. Now we want to add the intelligent behaviour mentioned above, so that the database can automatically detect suspiciously active likes the moment they are stored. To start, we'll add written_at fields to keep track of when a comment was written or liked, along with a state field on the likes table that lets a suspicious like be marked as invalid.
This state field could be a simple bool, but it can also be a literal, as shown below, in which an optional context field can be added. Using a literal makes for more interesting output, so let's go with that.
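Put together, the new fields might be defined like this. The written_at defaults and the exact shape of the literal are one possible approach:

```surql
-- Record when a comment was written, and when a like happened
DEFINE FIELD written_at ON comment TYPE datetime DEFAULT time::now();
DEFINE FIELD written_at ON likes   TYPE datetime DEFAULT time::now();

-- A like is valid by default; an invalid like can carry some context
DEFINE FIELD state ON likes TYPE
      { valid: true }
    | { valid: false, context: option<string> }
    DEFAULT { valid: true };
```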
While we are at it, let's also define an index on likes so that only one like can happen between a user and a comment. This is often done on the app level, but thanks to this DEFINE INDEX statement we can achieve the same behaviour on the database level instead.
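Since likes is a graph table, its in and out fields point at the user and the comment, so a unique index over the two of them does the job:

```surql
-- Only one like allowed per user/comment pair
DEFINE INDEX unique_like ON likes FIELDS in, out UNIQUE;
```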
Since a likes record is valid by default, we need some logic to determine when a like should be invalid. A DEFINE EVENT statement will do the trick. It watches for new likes records via the WHEN $event = "CREATE" clause. When a new likes record shows up, it compares its written_at field to the written_at field of the comment. The like is set to invalid if it happens within a short time of the comment, or within a somewhat longer time if it comes from a user with low credibility.
To test this ourselves, though, we'll set the elapsed times to 3 seconds and 10 seconds. In real life these would certainly be somewhat longer (maybe something like 30 seconds and 5 minutes).
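A sketch of such an event might look like this. The credibility field on user is an assumption for this example, as is the exact shape of the logic:

```surql
DEFINE EVENT check_like ON likes WHEN $event = "CREATE" THEN {
    -- $after.out is the comment being liked, $after.in the user liking it
    LET $elapsed = $after.written_at - $after.out.written_at;
    IF $elapsed < 3s {
        -- Liked almost instantly: suspicious no matter who the liker is
        UPDATE $after.id SET state = { valid: false, context: "Liked within 3 seconds" };
    } ELSE IF $elapsed < 10s AND $after.in.credibility < 50 {
        -- Liked soon after by a low-credibility user: also suspicious
        UPDATE $after.id SET state = { valid: false, context: "Early like by low-credibility user" };
    };
};
```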
Now let's test this behaviour out by creating two normal users, a bot farm user that leaves comments specifically designed to lower the morale of everyday Klezskavanians, and a sketchy user that is hired by the bot farm to go around clicking like on these comments as soon as they are issued. In between these likes are two calls to sleep() to simulate the passage of time in the real world.
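The test data could look something like this (names, credibility numbers, and comment content are all invented):

```surql
CREATE user:good    SET name = "Honest citizen",      credibility = 90;
CREATE user:fine    SET name = "Another citizen",     credibility = 85;
CREATE user:botfarm SET name = "Totally real person", credibility = 60;
CREATE user:sketchy SET name = "Liker for hire",      credibility = 10;

CREATE comment:one SET content = "Lovely day today";
RELATE user:good->wrote->comment:one;

CREATE comment:three SET content = "Everything is terrible, why bother voting?";
CREATE comment:four  SET content = "Klezskavania was better in the old days";
RELATE user:botfarm->wrote->[comment:three, comment:four];

-- Liked instantly: marked invalid no matter who the liker is
RELATE user:sketchy->likes->comment:three;
SLEEP 4s;
-- Over 3 seconds, but under 10 and from a low-credibility user: still invalid
RELATE user:sketchy->likes->comment:four;
SLEEP 7s;
-- Enough time has now passed for even user:sketchy's like to count as valid
RELATE user:sketchy->likes->comment:one;
```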
We can use this query to see the likes for each comment regardless of validity. This allows us to eyeball the likes to make sure that our event is working as intended.
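One way to do that is to pull in the full likes edge records for each comment:

```surql
SELECT
    content,
    <-likes.* AS likes
FROM comment;
```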
Here is one example: a like that was made within 10 seconds, but by the sketchy user, so it was marked as invalid.
And to pass the number of valid likes on to the app, just add a [WHERE state.valid] filter.
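Something like this, using a path filter so that only valid likes are counted and traversed (the exact field list is up to you):

```surql
SELECT
    content,
    count(<-likes[WHERE state.valid]) AS valid_likes,
    <-likes[WHERE state.valid]<-user AS liked_by
FROM comment;
```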
Now only a single like by user:sketchy has been allowed to get through, while comment:three and comment:four don't show any likes, keeping their visibility low.
Detect whether too many comments are in too small an area
By exact location
Sometimes the presence of a bot farm can be identified by the sudden influx of comments from a single location. SurrealDB has geometry types built in, which lets us benefit in a number of ways.
One is to simply group by location to see if any locations stand out as having a particularly large number of comments.
In the example below, we have comment records whose IDs are composed of the time and location at which they were written. This format lets us use record ranges: queries performed on just a slice of a table's records instead of the whole table.
Here are a few comments made using this ID format: three from random locations and three from a single location: a bot farm.
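For instance (every location below is invented, and the author field is just a plain record link):

```surql
-- Three comments from scattered locations
CREATE comment:[time::now(), (49.9, 23.1)] SET author = user:one,  content = "Anyone watch the game last night?";
CREATE comment:[time::now(), (50.2, 24.8)] SET author = user:two,  content = "Traffic is terrible again";
CREATE comment:[time::now(), (48.7, 22.5)] SET author = user:good, content = "Lovely weather today";

-- Three comments from the exact same location: a bot farm
CREATE comment:[time::now(), (51.5, 25.0)] SET author = user:botfarm, content = "The election is rigged";
CREATE comment:[time::now(), (51.5, 25.0)] SET author = user:botfarm, content = "Don't bother voting";
CREATE comment:[time::now(), (51.5, 25.0)] SET author = user:botfarm, content = "Nothing ever changes here";
```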
To search for abnormalities over the past day, we can use the range comment:[time::now() - 1d].., meaning any comments created from one day ago onwards.
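Grouping those comments by the location part of the ID (the second element, id[1]) then shows where the activity is concentrated. Grouping by an alias like this is one approach; you could also group on id[1] directly:

```surql
SELECT id[1] AS location, count() FROM comment:[time::now() - 1d]..
GROUP BY location;
```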
The query shows that one location in particular has a suspiciously large number of comments!
By exact distance
Functions like geo::distance() can also be used instead of exact longitudes and latitudes. Here are two points that, thanks to this function, we can see are 1321 metres away from each other:
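The function takes two points and returns the distance in metres, for example (the two points below are arbitrary, so the distance for your own points will differ):

```surql
RETURN geo::distance((51.509, 25.100), (51.500, 25.115));
```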
Let's test this out with the same comments as above, but now with the bot-created comments close to each other instead of being at the exact same location.
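That setup might look like this, with the bot farm's comments a few hundred metres apart rather than identical (coordinates again invented):

```surql
-- Same bot-created comments as before, but now merely close to each other
CREATE comment:[time::now(), (51.500, 25.000)] SET author = user:botfarm, content = "The election is rigged";
CREATE comment:[time::now(), (51.503, 25.004)] SET author = user:botfarm, content = "Don't bother voting";
CREATE comment:[time::now(), (51.506, 25.008)] SET author = user:botfarm, content = "Nothing ever changes here";
```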
The geo::distance() function can be put into a query like the one below, which uses a subquery to see which of the day's comments were written suspiciously close to one another. The query is a bit complex, so here are the main points to pay attention to when reading it:
(SELECT * FROM comment:[time::now() - 1d]..)[WHERE neighbours] is the main query, which gets all the comments over the past day, as long as there is something in the neighbours field.
The neighbours field will hold all the comments over the past day that are within 1000 metres of this comment.
The subclause to find the neighbours uses WHERE geo::distance(id[1], $parent.id[1]) < 1000 AND id != $parent.id: any comment that is less than 1000 metres away and doesn't have the same ID as the current comment.
And finally, author, geo::distance(id[1], $parent.id[1]) AS distance, content is selected inside the subquery so that the output is interesting and readable to us.
Here is the query!
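A sketch of it, following the points above. The [WHERE neighbours] filter at the end works because an empty neighbours array is falsy:

```surql
(SELECT
    *,
    (SELECT
        author,
        geo::distance(id[1], $parent.id[1]) AS distance,
        content
    FROM comment:[time::now() - 1d]..
    WHERE geo::distance(id[1], $parent.id[1]) < 1000
      AND id != $parent.id
    ) AS neighbours
FROM comment:[time::now() - 1d]..
)[WHERE neighbours];
```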
As the output shows, there are a few comments that are suspiciously written close to each other.
Use recursive queries to see quality of a user's network
Bot farms aren't just composed of users pumping out comments and others liking them. Often they are more subtle: seemingly innocuous users a few steps away like the posts of one user, who likes the posts of another, eventually leading back to the content created by the bot farm.
But how can you go that far down a network to see what's going on? In SurrealDB it's easy: just use a recursive graph query.
Before we begin the recursive queries, let's take a look at the situation in this example network. We have some good users, as well as a user that has a pretty high botness score: a number between 0 and 100 that the system uses to determine if a user exhibits bot-like behaviour.
Via the friends field we can see who has befriended the bot, but we wouldn't want to silence users just for that - after all, the point of a bot is to trick others into thinking they are real people. Plus, the networks we mentioned use a lot of indirect routes to set up a network of bot content.
Here are three more users to show how this works. The first one has had a few strikes against it but still has a fairly low botness score, the next friend has a higher score of 40, and finally one with a score of 50 is direct friends with the bot.
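Put together, the example network might be set up like this (all names and botness numbers are invented for the demo):

```surql
-- A couple of good users, and the bot itself
CREATE user:good SET name = "Honest citizen",   botness = 5,  friends = [user:fine];
CREATE user:fine SET name = "Another citizen",  botness = 10, friends = [user:good];
CREATE user:bot  SET name = "Definitely human", botness = 95, friends = [];

-- A chain of friends leading step by step towards the bot
CREATE user:suspicious SET name = "A few strikes", botness = 20,
    friends = [user:semi_suspicious];
CREATE user:semi_suspicious SET name = "Getting warmer", botness = 40,
    friends = [user:quite_suspicious];
CREATE user:quite_suspicious SET name = "Friends with the bot", botness = 50,
    friends = [user:bot];
```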
So instead of trying to determine one by one whether a user is indirectly part of the network or not, we can create an overall quality score to recalculate a user's botness. This is where recursive queries come in.
Here is a simple recursive query to get started. The @.{3+collect}.friends part means the following:
@. Starting from this record
{3+collect} Go down three levels, collecting all the record IDs as you go
friends by following the friends path.
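The full query might then read:

```surql
SELECT name, @.{3+collect}.friends AS network FROM user;
```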
Eyeballing the output is enough to see who has any connections, direct or indirect, to the bot.
The recursive syntax can also be used with a single number like {2} if you want the query to go down to an exact depth. The query below uses this with @.{2}.friends and @.{3}.friends to tell the database to go exactly that many steps down the friends path and see what's there at that point.
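For example:

```surql
SELECT
    name,
    @.{2}.friends AS two_steps_away,
    @.{3}.friends AS three_steps_away
FROM user;
```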
Here is one of the users in the output, who has a bot friend at the third level.
We can use this pattern to recalculate a user's botness so that it also includes the quality of their network. We will determine this by doing the following:
Get the average botness of a user's direct friends, multiply this by 0.5
Same for the second level, multiply this by 0.3
Again for the third level, multiply this by 0.2
Add them all together.
This allows some small consequences for a user's overall greater network, while still ensuring that a user's direct friends have the greatest effect.
The query looks like this:
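A sketch of it, combining the recursive paths above with math::mean(). This version assumes every user has at least one friend at each level; a real query would want guards for empty levels:

```surql
SELECT
    name,
    botness,
      math::mean(friends.botness)       * 0.5
    + math::mean(@.{2}.friends.botness) * 0.3
    + math::mean(@.{3}.friends.botness) * 0.2
    AS greater_botness
FROM user;
```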
As the output shows, user:suspicious (the one who pretends to be entirely trustworthy and not related to any bot farms) now has a greater_botness score of 40, twice that of the original botness score of 20. That allows us to see that this user might be working as a front for an influence network, attempting to use its relatively high trustworthiness to influence posts and comments that are indirectly tied to the influence network.
This post has only touched on a few of the ways SurrealDB can be used to combat organised influence campaigns, so another will follow it in the near future. See you in the next one!
