Pornography Blocking Algorithm [Auto-Detection, Site Blocking, Zero-Day protection]
I personally think that detecting a porn site is incredibly easy, both for humans and computers.
I think OpenDNS should create a web crawler (en.m.wikipedia.org/wiki/Web_crawler) that counts the number of "bad" keywords (for example, words used to describe a porn video in its title) on a web page (or, if possible, the entire site). If the crawler produces a count over a particular threshold, then the site should automatically be blocked. If the count is only slightly over the threshold, the site should still be blocked, but in that case OpenDNS community members should be given the opportunity to vote and decide.
^ This process would be far quicker and would provide families with zero-day protection.
This crawler idea will not produce false positives because no site on the world wide web would contain 100 counts of the keyword ******** (for example) unless it was a porn site.
This crawler idea will work because, compared to all the other categories (games, weapons, etc.), porn would be the easiest to filter out using a keyword count algorithm.
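For anyone who wants to see the proposal concretely, here is a minimal sketch of the keyword count idea. The keyword list and both threshold values are placeholders I made up for illustration, and it works on text that has already been fetched; it is not anything OpenDNS actually runs.

```python
import re

# Hypothetical placeholder terms; a real list would have to be curated.
BAD_KEYWORDS = {"badword1", "badword2"}
BLOCK_THRESHOLD = 100  # assumed: at or above this count, block outright
VOTE_THRESHOLD = 10    # assumed: at or above this count, block and open a vote


def count_bad_keywords(text, keywords=BAD_KEYWORDS):
    """Count how many words on the page appear in the keyword list."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for word in words if word in keywords)


def classify(text):
    """Map a page's keyword count onto the three outcomes described above."""
    count = count_bad_keywords(text)
    if count >= BLOCK_THRESHOLD:
        return "block"
    if count >= VOTE_THRESHOLD:
        return "block_and_vote"
    return "allow"
```

Note that the result depends entirely on the keyword list and the two numbers chosen, which is the weak point the replies below focus on.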
-
"OpenDNS should create a web crawler"
This would take a lot of effort and money. Who should pay for it, considering that this is a free service? Would you like to become a sponsor?
"Porn would be the easiest to filter out using a keyword count algorithm."
This is what you think. I would like to disagree. There are many, many porn sites where you will not find many related keywords as plain text, and there are many kinds of anti-porn sites that use these terms as part of their documentation. Crawling is often very unreliable and leads to bad results, as can be seen with other related appliances.
-
Well, you're certainly welcome to your opinion, but that doesn't mean it's right.
The kind of web crawling you describe is useful for search engines, but it only counts instances of words and then decides whether a site is pornographic based on an arbitrary number (is 10 too many? how about 1000?). That does not mean it is actually *detecting* anything accurately. It doesn't actually analyze the *content* of a page to classify it. Words could indicate that a site is pornographic, but those same words could also appear on an anti-pornography site of some sort, or on an academic site that is trying to do an analysis of internet content. It could also be a site about roosters, and for whatever reason the content creator prefers to use the perfectly accurate word cock. What about filtering that keys in on partial words, such as "cock" in "cocky"? Or a website about canine breeding or genealogy, where the equally accurate word bitch is more than appropriate? Need I go on about the word breast, or the many other words that have multiple uses in our modern language, slang, and vernacular and have long been the bane of word-based filtering lists, because simple word filtering does not take into account content or context?
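To make the partial-word problem concrete, here is a small sketch comparing naive substring matching with whole-word matching. The function names and the sample sentence are my own, purely for illustration.

```python
import re


def substring_hits(text, term):
    """Naive match: counts the term anywhere, including inside longer words."""
    return text.lower().count(term.lower())


def whole_word_hits(text, term):
    """Stricter match: counts the term only when it stands alone as a word."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


sample = "The cocky cock crowed from the cockpit of the old plane."
print(substring_hits(sample, "cock"))   # 3
print(whole_word_hits(sample, "cock"))  # 1
```

Whole-word matching removes the hits from "cocky" and "cockpit", but the one remaining hit is still about a rooster. Fixing the matching does nothing for context.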
While this process could quickly "classify" many domains as pornographic, it would just as quickly misclassify many, many domains, which then requires even more work to identify and correct.
There's good reason that most IT and security people shy away from word-based filtering for such things as virus detection, spam filtering, or website content classification. Where it's used, it's almost always one of several inputs and has a lesser weight than all or most of them, and when it's the primary determinant of something, it almost always requires human review and approval to classify the overall content and context of those words.
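As a rough sketch of what "one of several inputs with a lesser weight" can look like: the signal names, weights, and thresholds below are all assumptions I picked for illustration, not a description of any real product.

```python
# All names and numbers here are illustrative assumptions.
WEIGHTS = {"keywords": 0.2, "images": 0.5, "links": 0.3}
FLAG_THRESHOLD = 0.6    # combined score needed to flag automatically
KEYWORD_REVIEW_MIN = 0.5  # keyword signal strength that triggers a human look


def assess(signals):
    """Combine signals (each 0.0 to 1.0) into a decision string."""
    contributions = {
        name: WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS
    }
    score = sum(contributions.values())
    if score >= FLAG_THRESHOLD:
        return "flag"
    # Keywords are the main evidence but cannot flag a site on their own,
    # so the case goes to a person instead.
    strongest = max(contributions, key=contributions.get)
    if strongest == "keywords" and signals.get("keywords", 0.0) >= KEYWORD_REVIEW_MIN:
        return "human_review"
    return "allow"
```

With these weights the keyword signal alone tops out at 0.2, so a page full of flagged words and nothing else ends up in front of a human rather than being blocked automatically.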
Most of the time when people declare that something is easy to do they have no relevant experience in the field. I'm curious, do you have any IT experience, let alone programming, database, or web programming experience? Are you participating in OpenDNS community tagging since this seems so important to you, or are you sitting back and letting others do all the work?
-
Okay, point taken, mattwilson9090. I understand that using technology to produce web crawlers or other algorithms to categorise websites is pushing it.
However, the current way in which this community works is slow and doesn't attract much attention (in terms of votes etc.). An OpenDNS community member must log in, enter the domain name, select a category for that domain, and then click submit.
What if... and this is only a suggestion... we use the OpenDNS members as crawlers (human crawlers)? Extensions for all the major web browsers could be provided, allowing members to flag a site with a simple click, and when another user visits the same site, they would be able to see all the ongoing votes for that particular site. This would motivate members to vote more frequently because it is easy and quick. It would also be a lot quicker than the current process. It is a cost-effective method that works. Human crawlers (you can't go wrong). You may bring up the argument that people may abuse this method because it's extremely easy to flag sites. I would disagree with this argument, because the only reason an OpenDNS member would choose to install this extension is to make the content filtering more comprehensive, both for personal gain (protecting their own families from inappropriate content) and for the benefit of other parents etc.
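Here is a minimal sketch of the bookkeeping such an extension would need behind it, assuming one vote per member per domain. The class and method names are hypothetical; this is not an existing OpenDNS API.

```python
from collections import Counter


class VoteBoard:
    """Hypothetical in-memory store of member votes per domain."""

    def __init__(self):
        self._votes = {}  # domain -> {member: category}

    def flag(self, member, domain, category):
        """Record a vote; a member's newer vote replaces their older one."""
        self._votes.setdefault(domain.lower(), {})[member] = category

    def tally(self, domain):
        """Return the ongoing vote counts per category for a domain."""
        votes = self._votes.get(domain.lower(), {})
        return dict(Counter(votes.values()))


board = VoteBoard()
board.flag("alice", "Example.com", "games")
board.flag("alice", "example.com", "pornography")  # alice changes her vote
board.flag("bob", "example.com", "pornography")
print(board.tally("example.com"))  # {'pornography': 2}
```

Limiting each member to one vote per domain stops simple ballot stuffing, but it does nothing about members who vote in bad faith.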
-
There is already an extension for a browser, I believe it's Firefox. Someone recently posted to another thread about it, so you could search for it.
Even assuming that the plug-in identifies domains properly, such as picking the main domain of the website you are on and not a supporting domain used by the website or the domain of an advertisement on the site, that doesn't mean people will accurately tag domains. Many people seem to tag domains almost randomly, or if they don't like a website for any reason (for instance, not agreeing with its political stance), will tag it as pornography in an attempt to get it blocked or even taken off of the internet.
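To illustrate the main-domain point, here is a small sketch that separates the host of the page being viewed from the hosts of the resources it loads. The function names and example URLs are made up, and it compares full hostnames only; grouping subdomains under one registered domain would need a public suffix list, which this sketch does not attempt.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower()


def split_hosts(page_url, resource_urls):
    """Return (main host, sorted list of other hosts loaded by the page)."""
    main = host_of(page_url)
    others = sorted({host_of(u) for u in resource_urls} - {main, ""})
    return main, others


main, others = split_hosts(
    "https://www.example.com/article",
    [
        "https://cdn.example.net/app.js",
        "https://ads.example.org/banner",
        "https://www.example.com/style.css",
    ],
)
print(main)    # www.example.com
print(others)  # ['ads.example.org', 'cdn.example.net']
```

A tagging plug-in would want to submit only the first value, since the others belong to supporting or advertising domains.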
Be aware, filtering pornography is not the only reason people use OpenDNS, and there are some users who don't block that and related categories because they have other things they are concerned about. Just as there are many reasons why someone might use OpenDNS, there are many reasons someone might properly or improperly flag a domain.
However, manually searching for a domain name is not the only way to tag a domain. If you go to the domain tagging section in the community area of the website, you can vote on domains that have already been tagged and are awaiting enough votes for approval. You can go through them randomly, or I believe you can go through them by category.