I have been trying to create a universal spam protection solution without using CAPTCHA and similar techniques, and the only thing I came up with is Bayesian inference. I know a lot about Bayesian methods, but not enough to take on such a project. I googled a bit and found some very interesting articles about Bayesian inference, but most of them did not address the problem from a web developer's perspective. I will try to summarize what I have learned and present a possible way to take advantage of it in day-to-day web development.
Bayesian inference techniques have been widely used in developing various types of Artificial Intelligence (AI) systems (for instance for text retrieval, classification, medical diagnosis, data mining, troubleshooting, and more), so this article series should be of interest to anyone who wants to build intelligent Web applications.
Conditional probability
Conditional probability refers to the probability of observing event A in light of what we have learned from observing event B. It is written:
P(A|B)
From a web developer's perspective, conditional probability maps naturally onto Amazon's product suggestions. If we take event A to be buying the book “JavaScript Bible” (most commonly used as an object to prevent windows from closing, see omg office events) and event B to be buying the “DOM” book, then P(A|B) reads: “the probability that a customer will buy the book “JavaScript Bible” given that they have bought the book “DOM”.”
More generally, if A and B systematically co-vary in some way, then P(A | B) will not be equal to P(A). Conversely, if A and B are independent events, then P(A | B) would be expected to equal P(A).
The need to compute a conditional probability thus arises any time you think the occurrence of some event has a bearing on the probability of another event occurring.
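To make the independent case concrete, here is a minimal sketch that is not from the original articles. It assumes two fair coin flips, with event A defined as "the second flip is heads" and event B as "the first flip is heads", and simply counts outcomes. The variable names are made up for illustration.

```php
<?php
// Enumerate every equally likely outcome of two fair coin flips.
$outcomes = [];
foreach (['H', 'T'] as $first) {
    foreach (['H', 'T'] as $second) {
        $outcomes[] = [$first, $second];
    }
}

// Assumed events: A = second flip is heads, B = first flip is heads.
$numA = 0;
$numB = 0;
$numAB = 0;
foreach ($outcomes as [$first, $second]) {
    if ($second === 'H') {
        $numA++;
    }
    if ($first === 'H') {
        $numB++;
        if ($second === 'H') {
            $numAB++;
        }
    }
}

$pA = $numA / count($outcomes); // 2 of 4 outcomes
$pAGivenB = $numAB / $numB;     // 1 of the 2 outcomes where B holds

printf("P(A) = %.2f, P(A | B) = %.2f\n", $pA, $pAGivenB);
```

Both values come out the same here, which is what independence means: knowing the first flip tells you nothing about the second.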
The most basic and intuitive method for computing P(A | B) is the set enumeration method. Using this method, P(A | B) can be computed by counting the number of times A and B occur together {A & B} and dividing by the number of times B occurs {B}:
P(A | B) = {A & B} / {B}
If you observe that 12 customers to date bought product B and of those 12, 10 also bought product A, then P(A | B) would be estimated at 10/12 or 0.833. In other words, the probability of a customer buying product A given that they have purchased product B can be estimated at 83 percent by using a method that involves enumerating relative frequencies of A and B events from the data gathered to date.
PHP: Computing conditional probability using set enumeration
/**
 * Returns conditional probability of $A given $B and $Data.
 * $Data is an indexed array. Each element of the $Data array
 * consists of an A measurement and B measurement on a sample
 * item. Returns null if $B never occurs in $Data, because
 * P(A | B) is undefined in that case.
 */
function getConditionalProbabilty($A, $B, $Data) {
    $NumAB = 0;
    $NumB = 0;
    foreach ($Data as $Item) {
        // Only items where B occurred count toward the estimate.
        if (in_array($B, $Item)) {
            $NumB++;
            if (in_array($A, $Item)) {
                $NumAB++;
            }
        }
    }
    // Guard against division by zero when B was never observed.
    if ($NumB == 0) {
        return null;
    }
    return $NumAB / $NumB;
}
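As a rough, self-contained illustration of the 10-out-of-12 example, the sketch below builds a made-up purchase history and compares P(A | B) with plain P(A). The helper countBasketsWith and the extra baskets (three A-only purchases and five purchases of an unrelated product C) are assumptions added for this illustration. Only the 12 B buyers, 10 of whom also bought A, come from the text.

```php
<?php
// Hypothetical helper (not from the article): counts the baskets that
// contain every item listed in $items.
function countBasketsWith(array $items, array $baskets): int {
    $count = 0;
    foreach ($baskets as $basket) {
        if (count(array_intersect($items, $basket)) === count($items)) {
            $count++;
        }
    }
    return $count;
}

// Made-up purchase history: 10 customers bought A and B, 2 bought only B.
// The A-only and C-only baskets are invented so that P(A) can be computed.
$baskets = array_merge(
    array_fill(0, 10, ['A', 'B']),
    array_fill(0, 2, ['B']),
    array_fill(0, 3, ['A']),
    array_fill(0, 5, ['C'])
);

$numB = countBasketsWith(['B'], $baskets);       // baskets containing B
$numAB = countBasketsWith(['A', 'B'], $baskets); // baskets containing A and B

$pAGivenB = $numAB / $numB;                                // {A & B} / {B}
$pA = countBasketsWith(['A'], $baskets) / count($baskets); // {A} / all baskets

printf("P(A | B) = %.3f\n", $pAGivenB); // prints P(A | B) = 0.833
printf("P(A)     = %.3f\n", $pA);       // prints P(A)     = 0.650
```

With this invented data P(A | B) is noticeably higher than P(A), so buying B does seem to have a bearing on buying A. With real data the gap would of course depend on what customers actually bought.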
To be continued...