more intelligent suspicious files filter

Postby cyko_01 » 25 Jul 2011 06:16

cyko_01
 
Posts: 938
Joined: 13 Jun 2009 15:51

Re: more intelligent suspicious files filter

Postby old_death » 25 Jul 2011 20:16

You're right. Such factors could be used to calculate a "spam probability rating" with a maximum of 100 points out of all available heuristic factors. There could be a setting somewhere to allow users to turn the filter threshold...
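A minimal sketch of such a rating, assuming three made-up heuristic factors with made-up weights (none of these names or numbers come from Shareaza). The weights of the factors that fired are scaled to a 0-100 rating and compared against a user-adjustable threshold:

```python
# Hypothetical heuristic factors and weights, for illustration only.
WEIGHTS = {
    "missing_metadata": 40,
    "keyword_match": 35,
    "known_bad_host": 25,
}


def spam_score(factors):
    """Return a 0-100 rating from the heuristic factors that fired."""
    total = sum(WEIGHTS.values())
    # Ignore unknown factor names and count each known factor once.
    fired = sum(WEIGHTS[name] for name in set(factors) & WEIGHTS.keys())
    return round(100 * fired / total)


def is_filtered(factors, threshold=50):
    """Hide a result once its rating reaches the user-set threshold."""
    return spam_score(factors) >= threshold
```

Lowering the threshold filters more aggressively; raising it lets more borderline results through.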
old_death
 
Posts: 1950
Joined: 13 Jun 2009 16:19

Re: more intelligent suspicious files filter

Postby cyko_01 » 28 Jul 2011 17:40

cyko_01
 
Posts: 938
Joined: 13 Jun 2009 15:51

Re: more intelligent suspicious files filter

Postby ailurophobe » 29 Jul 2011 01:05

One method would be to separate security filtering from spam filtering. Security filtering would block addresses outright, essentially acting as an IP filter on connections. The spam filter would work on search results only and be adaptive: if a keyword filter blocks a result, both the address and the hashes would be marked as probable spam. A hash block would then count against the address, and an address block against the hash. This way, starting from known spam, the filter could learn to filter new spam from its association with known spam.

For example, if an address is a source of spam matching a regular expression filter, a learning filter would remember the address and filter results from it even after the spam no longer matches the regular expression. Similarly, it would remember the hashes of the regular expression hits and filter those files even if they came from clean addresses under new names. With a smart enough threshold system it would even be possible to use these new addresses and hashes to learn still more addresses and hashes to filter.
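The association idea can be sketched roughly as follows. This is only an illustration: the class name, the method, and the example pattern are all hypothetical, and the sketch deliberately has no threshold system, so a single hit is enough to mark an address or hash. A usable filter would need the thresholds mentioned above to avoid spreading blocks too eagerly.

```python
import re


class LearningFilter:
    """Toy adaptive filter: seeded by a regexp, learns addresses and hashes."""

    def __init__(self, pattern):
        self.pattern = re.compile(pattern, re.IGNORECASE)
        self.bad_addresses = set()
        self.bad_hashes = set()

    def is_spam(self, name, address, file_hash):
        hit = (
            self.pattern.search(name) is not None
            or address in self.bad_addresses
            or file_hash in self.bad_hashes
        )
        if hit:
            # Remember both, so that either one alone flags later results.
            self.bad_addresses.add(address)
            self.bad_hashes.add(file_hash)
        return hit
```

Once a result matches the pattern, later results from the same address are filtered even under new names, and the same hash is filtered even when it arrives from an address not seen before.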
ailurophobe
 
Posts: 709
Joined: 11 Nov 2009 05:25

Re: more intelligent suspicious files filter

Postby cyko_01 » 29 Jul 2011 13:10

cyko_01
 
Posts: 938
Joined: 13 Jun 2009 15:51

Re: more intelligent suspicious files filter

Postby ailurophobe » 29 Jul 2011 17:09

I am pretty sure that a learning spam filter would count as being "more intelligent"...

Seriously, I am not a big fan of the "one-click black box" spam filter. I've recently had some cases where I needed to uncheck spam filtering to see perfectly valid results for my query. It is not a big or common issue, but if the amount of filtering is increased, the number of false positives will go up. Basically, there is a limit on how many things you can filter automatically before it becomes inconvenient for users not to have finer control over the process. I doubt we are there yet, or will be even after your proposals, but it is something to keep in mind. At some point making structural changes will become the better option.
ailurophobe
 
Posts: 709
Joined: 11 Nov 2009 05:25

Re: more intelligent suspicious files filter

Postby cyko_01 » 29 Jul 2011 18:43

cyko_01
 
Posts: 938
Joined: 13 Jun 2009 15:51

Re: more intelligent suspicious files filter

Postby ailurophobe » 30 Jul 2011 22:53

Yes, obviously my description was nowhere near sufficient to get a filter that is actually usable...

Scenario A is simple enough to fix by simply not filtering files in the library. Your library would essentially work as a white list for files.

In general I was thinking along the lines of the Bayesian filters used for email filtering. So the filter would also need to count instances of "not associated with spam" for hashes and addresses, which would solve most of the problems. The reason I didn't mention this is because I honestly have no idea how to do this efficiently. I doubt it is impossible or even particularly difficult, I just happen to not know a good method right now.
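One simple way to keep both kinds of counts is a naive Bayes tally, as in the rough sketch below. It is not claimed to be efficient enough for a real client, and everything in it is an assumption: the class name, the add-one smoothing, and the idea of passing addresses and hashes as plain strings such as "addr:..." and "hash:...".

```python
import math
from collections import Counter


class BayesCounts:
    """Counts how often each feature appears with spam and with non-spam."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def observe(self, features, is_spam):
        if is_spam:
            self.spam.update(set(features))
            self.n_spam += 1
        else:
            self.ham.update(set(features))
            self.n_ham += 1

    def spam_probability(self, features):
        # Work in log-odds; add-one smoothing keeps unseen features neutral.
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for f in set(features):
            p_spam = (self.spam[f] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[f] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))
```

Because clean sightings are counted too, an address that shares one spam hash but mostly sends clean results drifts back toward a low probability instead of staying blocked.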
ailurophobe
 
Posts: 709
Joined: 11 Nov 2009 05:25

Re: more intelligent suspicious files filter

Postby borsti67 » 31 Jul 2011 12:16

Regarding not-yet-deleted spam files: you could count the number of "known bad files" you've received in response to search requests during each session. Each one would add a penalty for that host, increasing the spam likelihood of search results from it and eventually blocking it after a certain count.
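A per-session counter along these lines might look like the sketch below. The class name, the `block_after` parameter, and its default of 3 are assumptions for illustration only.

```python
from collections import Counter


class SessionPenalties:
    """Per-session count of known bad files returned by each host."""

    def __init__(self, known_bad_hashes, block_after=3):
        self.known_bad = set(known_bad_hashes)
        self.block_after = block_after
        self.penalties = Counter()

    def record_hit(self, host, file_hash):
        """Record one search hit; return True if the host is now blocked."""
        if file_hash in self.known_bad:
            self.penalties[host] += 1
        return self.is_blocked(host)

    def is_blocked(self, host):
        return self.penalties[host] >= self.block_after
```

Creating a fresh object at the start of each session resets the penalties, so a host is not punished forever for files it has since removed.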
borsti67
 
Posts: 8
Joined: 29 Dec 2009 22:23

Re: more intelligent suspicious files filter

Postby cyko_01 » 01 Aug 2011 02:56

cyko_01
 
Posts: 938
Joined: 13 Jun 2009 15:51

Re: more intelligent suspicious files filter

Postby smokex » 24 Aug 2011 22:42

if (result is type that should have metadata)
{
    if (result does not have metadata)
    {
        drop search hit;
    }
}

Not complicated for this one, and it would be damn simple to implement and take care of 90% of the spam out there.
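A runnable version of this rule could look like the sketch below. The list of extensions that "should have metadata" is purely an assumption, as is representing metadata as a dict that is empty or None when absent.

```python
import os

# Assumption: file types that are expected to carry metadata.
EXPECTS_METADATA = {".mp3", ".mp4", ".avi", ".wmv"}


def keep_hit(filename, metadata):
    """Return False for hits that should have metadata but have none."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXPECTS_METADATA and not metadata:
        return False  # drop search hit
    return True
```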
smokex
 
Posts: 46
Joined: 13 Jun 2009 19:17

Re: more intelligent suspicious files filter

Postby cyko_01 » 25 Aug 2011 01:25

cyko_01
 
Posts: 938
Joined: 13 Jun 2009 15:51

Re: more intelligent suspicious files filter

Postby old_death » 03 Sep 2011 12:10

Are G2 clients supposed to provide metadata? I mean, do the specs say this is necessary, or is it only optional?
old_death
 
Posts: 1950
Joined: 13 Jun 2009 16:19

Re: more intelligent suspicious files filter

Postby brov » 04 Sep 2011 12:39

Optional.
brov
 
Posts: 87
Joined: 05 Jul 2009 12:15

Re: more intelligent suspicious files filter

Postby cyko_01 » 05 Sep 2011 15:26

cyko_01
 
Posts: 938
Joined: 13 Jun 2009 15:51

Re: more intelligent suspicious files filter

Postby biggestnoob » 05 Sep 2011 16:16

I actually find very little spam with G2, DC++ and mule files. I mean 1 spam file out of 100.

The most spam I find is with G1, but even that doesn't bother me, as it's usually .mov, .qt or .wmv files, and very few are actually .mp3 or .mp4.

So I don't see a big reason for additional filters or stuff. I'd just like DC++ to function better and return more results.
biggestnoob
 
Posts: 8
Joined: 26 Mar 2011 18:55

Re: more intelligent suspicious files filter

Postby old_death » 09 Sep 2011 07:18

Well, then you are a lucky exception... Spam is a much bigger problem for most other G2 network users...
old_death
 
Posts: 1950
Joined: 13 Jun 2009 16:19

