Knowledge Base | Keyword Cupid

## F.A.Q

• These colors represent the concept of cluster confidence/keyword co-occurrence.
This metric indicates our degree of certainty that the keywords inside each silo belong together.
Sometimes all of the keywords inside the silo are closely related to each other, which yields a high cluster confidence.
We color any cluster above 90% confidence green.
Clusters between 80% and 90% are colored orange, and clusters below 80% are colored red.
In the case of red clusters, we just want to alert the user to pay attention when interlinking the subsequent pages.
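
The color thresholds described above can be sketched as a small function. This is only an illustration: the function name is hypothetical, and treating exactly 80% and exactly 90% as orange is an assumption, since the text only says "above 90%" and "below 80%".

```python
def confidence_color(confidence: float) -> str:
    """Map a cluster confidence percentage (0-100) to a display color.

    Assumption: values of exactly 80 and exactly 90 fall in the orange band.
    """
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence > 90:
        return "green"
    if confidence >= 80:
        return "orange"
    return "red"


# Example: one cluster from each band.
for value in (95, 85, 72):
    print(value, confidence_color(value))
```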

• The clusters with lower cluster confidence are clusters that KC flags as "loosely" related.
The user needs to pay attention to that node's children and determine if any subtle underlying intents form this relationship.
The keywords might not overlap in SERPs, but they still might be related in general terms due to transitive relationships.

The page clusters should almost always be green, as the rough neural network we use heavily penalizes the estimators' weights. This way, we are looking at the most thematically relevant pages we can create.
When linking the pages together into silos, we want to be more "relaxed" and find subtle relationships that we might have missed. For that reason, we use a recurrent neural network. Keyword Cupid outputs a very "harsh" and "rigid" plan, so it doesn't have to be perfect to be effective.
Even if you follow the content plan we provide in general terms, you will still unravel more associations than organizing your keywords manually.

• In order to provide the most accurate results, we have chosen to rely only on real-time data.
We use half a dozen APIs and data providers to capture an accurate picture of the landscape of your project.
Therefore, a report with 100 keywords is bound to complete sooner than a report with 2,500 keywords.
Another contributing factor to your wait time is the computational complexity of the models we use.
A report with 2,000 keywords analyzes 2,000 keywords × 100 SERP results = 200,000 combinations. All of these combinations require examination to create accurate hierarchical structures and group relevant keywords together.

• Fear not: irrespective of how large your target report was, we automatically return the keyword credits to your balance if a report fails.
You only pay for results :)

## "Matchmaking Process"

The Manual Way

There are so many tools that approach keyword matchmaking in a variety of ways.
In order to show you what makes us different, we aim to provide basic intuition on the algorithmic complexity of the combinatorial problems Keyword Cupid solves.
This excerpt is not meant to be an in-depth study on the implementation of the underlying machine learning models we use on our platform.

Let's assume you want to include two target keywords in your content strategy: $$kw_{1} , kw_{2}$$
Also assume that, in order to derive relevancy and proximity, the user relies only on the first 3 pages of Google (30 results at 10 per page) to uncover overlapping and matching results.
The respective search engine results page (SERP) entries for these two keywords are:

$$\begin{bmatrix} serp_{1,1} & serp_{1,2} & \cdots & serp_{1,30} \\ serp_{2,1} & serp_{2,2} & \cdots & serp_{2,30} \end{bmatrix}$$
$$\forall\ serp_{i,\ j} \ ,\ i\in \{1,2\} ,\ j\in \{1,2,\ldots,30\}$$
In order to derive vector similarity, we need to measure the overlap of the underlying data points:
$$\sum\limits_{i=1}^{30}\sum\limits_{j=1}^{30} serp_{1,\ i} \equiv serp_{2,\ j} ,\quad i,j\in \{1,2,\ldots,30\}$$
In other words, for each SERP result of kw1, we compare it against each SERP result of kw2 to determine if they are the same.
We can easily derive that, for our example scenario of 30 SERP results per keyword, this operation requires 30 × 30 = 900 comparisons.
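
As a rough illustration of the comparison just described, the sketch below checks every result of one keyword against every result of the other and counts both the matches and the comparisons performed. The function name and the `example.com` URLs are made up for the example; they are not real SERP data.

```python
def count_overlap_brute_force(serps_a, serps_b):
    """Compare every result in serps_a with every result in serps_b.

    Returns (matches, comparisons) so the cost of the approach is visible.
    """
    comparisons = 0
    matches = 0
    for url_a in serps_a:
        for url_b in serps_b:
            comparisons += 1
            if url_a == url_b:
                matches += 1
    return matches, comparisons


# Hypothetical top-30 results for two keywords; pages 20..29 appear in both.
kw1_serps = [f"https://example.com/page-{i}" for i in range(30)]
kw2_serps = [f"https://example.com/page-{i}" for i in range(20, 50)]

matches, comparisons = count_overlap_brute_force(kw1_serps, kw2_serps)
print(matches, comparisons)  # 10 900
```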

Now that you understand this simple example, let's use a more practical use case.
In a real world scenario, our target keyword dataset is hopefully composed of more than 2 keywords.
Let's just take an average case of 1,000 keywords that we need to group.
We need to find the number of pair-wise combinations between all of these 1,000 elements.
The formula that will give us this number is:
$$C(n,r) = \begin{pmatrix} n\\ r \end{pmatrix} = \frac{n!}{r!\ (n-r)!} \ =\ \frac{1000!}{2!\ (1000-2)!} \ =\ 499,500$$

Thus, we have 499,500 pairs.
For each pair, there are 900 manual comparisons.
Therefore we have a total of 499,500 * 900 = 449,550,000 actions to complete.
This is for the use case of comparing the top 30 SERP results for 1,000 keywords.
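
The arithmetic for this manual scenario can be checked with a few lines of Python. This is a minimal sketch: `manual_comparisons` is a hypothetical helper name, and it assumes every keyword pair is compared result by result, as in the walkthrough above.

```python
from math import comb


def manual_comparisons(num_keywords: int, serp_depth: int) -> int:
    """Total result-to-result comparisons for a brute-force approach."""
    pairs = comb(num_keywords, 2)   # unordered keyword pairs, C(n, 2)
    per_pair = serp_depth ** 2      # every result against every result
    return pairs * per_pair


print(comb(1000, 2))                 # 499500
print(manual_comparisons(1000, 30))  # 449550000
```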

Keyword Cupid compares the first 100 SERP results, using an exponentially weighted curve, for up to 40,000 keywords.
The equivalent actions needed following a brute force solution would be:
$$C(40000,2) \ *\ \sum\limits_{i=1}^{100}\sum\limits_{j=1}^{100} serp_{1,\ i} \equiv serp_{2,\ j} \ \Longrightarrow$$
$$799,980,000\ *\ 10,000\ =\ 7,999,800,000,000$$
Roughly 8 trillion manual actions could take a while...
Maybe you can give Keyword Cupid a try...
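
This page does not describe what the exponentially weighted curve looks like. Purely to illustrate the general idea of counting a shared result near the top of the rankings for more than one near the bottom, here is a hypothetical scoring sketch. The function name, the `decay` parameter, and the averaging of the two ranks are all assumptions made for this example, not Keyword Cupid's actual model.

```python
import math


def weighted_overlap(serps_a, serps_b, decay=0.05):
    """Hypothetical overlap score between two ranked result lists.

    Each shared result contributes exp(-decay * mean_rank), so matches
    near the top weigh more. The score is divided by the best possible
    total, so identical lists score 1.0 and disjoint lists score 0.0.
    """
    # Position of each result in the second list (0 = top result).
    rank_in_b = {url: rank for rank, url in enumerate(serps_b)}
    score = 0.0
    for rank_a, url in enumerate(serps_a):
        if url in rank_in_b:
            mean_rank = (rank_a + rank_in_b[url]) / 2
            score += math.exp(-decay * mean_rank)
    # Best case: every position matches at the same rank in both lists.
    depth = min(len(serps_a), len(serps_b))
    best = sum(math.exp(-decay * rank) for rank in range(depth))
    return score / best if best else 0.0


# Made-up top-100 lists: one shares only the #1 result, one only the #100 result.
base = [f"https://example.com/a-{i}" for i in range(100)]
shares_top = [base[0]] + [f"https://example.com/b-{i}" for i in range(99)]
shares_bottom = [f"https://example.com/c-{i}" for i in range(99)] + [base[99]]

print(weighted_overlap(base, shares_top) > weighted_overlap(base, shares_bottom))  # True
```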