Spam filtration

Post by grey-hame » 10 Dec 2009 16:20

I had a thought regarding spam filtration.

It came when I noticed a recurring pattern in spam results: they combine your query words, but (a) they often rearrange the order, and (b) if you search on part of a title, the words you would expect from the rest of the title in a genuine hit are absent.

This suggests a couple of methods of weeding out spam. The first is a filter option to ignore results that rearrange the query words. My thought was that you could search for "X Y" Z, and if you got back X Y Z or Z X Y, fine, but Y X Z would be ignored. Quotation marks would indicate a query phrase that should not be rearranged in any legitimate result.
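To make the first idea concrete, here is a rough sketch of how such a phrase-order filter might behave. It is only an illustration of the proposal, not anything Shareaza actually does; the function names (`parse_query`, `keep_result`) and the choice to compare lowercased words are my own assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (assumed matching rule)."""
    return re.findall(r"\w+", text.lower())

def parse_query(query):
    """Split a query into quoted phrases (word lists) and loose words."""
    phrases = [tokenize(p) for p in re.findall(r'"([^"]*)"', query)]
    loose = tokenize(re.sub(r'"[^"]*"', " ", query))
    return phrases, loose

def contains_phrase(words, phrase):
    """True if the phrase words appear in the title consecutively and in order."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))

def keep_result(query, title):
    """Keep a result only if every quoted phrase survives intact
    and every loose word appears somewhere in the title."""
    phrases, loose = parse_query(query)
    words = tokenize(title)
    return (all(contains_phrase(words, p) for p in phrases if p)
            and all(w in words for w in loose))
```

With the query `"X Y" Z`, the titles `X Y Z` and `Z X Y` are kept, while `Y X Z` is dropped because the quoted phrase has been rearranged.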

The second method would add a field to the query box for a search term that wouldn't be broadcast; if that field is non-blank, results lacking that word would be filtered out. So if you're looking for "The X that Y'd in the Z", and Y is sufficiently unusual that anything with X and Y in the title that isn't spam is probably the file you're looking for, you could put X and Y in the main search box and Z in this filter box. Spammy results like "X Y the new top single", inevitably lacking Z, would then not be shown, nor counted toward the result limit at which searching stops, whereas "The X that Y'd in the Z" would show up when it was found and would not be lost in a huge volume of spammy clutter.
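A rough sketch of the second idea, again purely illustrative: the filter term is applied locally to incoming results and never sent with the query. The names `filter_results`, `local_terms` and `limit` are hypothetical, and the limit stands in for whatever result count makes the search stop.

```python
import re

def words_in(text):
    """Return the set of lowercased words in the text (assumed matching rule)."""
    return set(re.findall(r"\w+", text.lower()))

def filter_results(titles, local_terms, limit=None):
    """Apply a local-only filter to incoming result titles.

    local_terms is never broadcast; if it is blank, nothing is filtered.
    Dropped results do not count toward the limit that ends the search.
    """
    required = words_in(local_terms)
    kept = []
    for title in titles:
        if required and not required <= words_in(title):
            continue  # missing a required word: hide it and do not count it
        kept.append(title)
        if limit is not None and len(kept) >= limit:
            break  # only genuine-looking hits bring the search to a stop
    return kept
```

For example, with the incoming titles `X Y the new top single`, `The X that Y'd in the Z` and `X Y free download`, and `Z` in the filter box, only `The X that Y'd in the Z` is kept, even with a limit of one result.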

Re: Spam filtration

Post by cyko_01 » 10 Dec 2009 23:47
