more intelligent suspicious files filter

Posted:
25 Jul 2011 06:16
by cyko_01
Re: more intelligent suspicious files filter

Posted:
25 Jul 2011 20:16
by old_death
You're right. Such factors could be used to calculate a "spam probability rating" with a maximum of 100 points out of all available heuristic factors. There could be a setting somewhere to allow users to turn the filter threshold...
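A rough sketch of how such a rating could be computed, assuming each heuristic carries a fixed weight and the weights add up to 100. The heuristic names, the weights and the default threshold below are made up for illustration and are not taken from any client:

```python
# Hypothetical heuristics and weights; they sum to 100 so the rating tops out at 100.
WEIGHTS = {
    "suspicious_extension": 30,
    "missing_metadata": 25,
    "size_mismatch": 25,
    "keyword_hit": 20,
}


def spam_rating(triggered):
    """Sum the weights of the heuristics that fired, capped at 100 points."""
    return min(100, sum(WEIGHTS[name] for name in triggered))


def is_filtered(triggered, threshold=50):
    """Hide a result once its rating reaches the configured threshold (assumed setting)."""
    return spam_rating(triggered) >= threshold
```

With these example weights, a hit that trips "missing_metadata" and "keyword_hit" scores 45 points, so it passes at a threshold of 50 but is hidden at a threshold of 40.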
Re: more intelligent suspicious files filter

Posted:
28 Jul 2011 17:40
by cyko_01
Re: more intelligent suspicious files filter

Posted:
29 Jul 2011 01:05
by ailurophobe
One method would be to separate security filtering and spam filtering. Security filtering would block addresses outright and essentially be an IP filter on connections. The spam filter would work on search results only and be adaptive: if a keyword filter blocks a result, both the address and the hashes would be marked as probable spam. A hash block would then count against the address, and an address block against the hash. This way, starting from known spam, the filter could learn to filter new spam from its association with known spam.
For example, if an address is a source of spam matching a reg exp filter, a learning filter would remember the address and filter results from it even after the spam no longer matches the reg exp filter. Similarly, it would remember the hashes for the reg exp hits and filter those files even if they came from clean addresses with new names. With a smart enough threshold system, it would even be possible to use these new addresses and hashes to learn even more addresses and hashes to filter.
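To make the association idea concrete, here is a minimal sketch. The class name, the method name and the threshold of 1 are assumptions chosen to keep the demo short; a real filter would want a much more careful threshold, since this one spreads blocks very aggressively:

```python
import re


class LearningFilter:
    """Toy adaptive filter: keyword hits teach it addresses and hashes."""

    def __init__(self, pattern, threshold=1):
        self.pattern = re.compile(pattern, re.IGNORECASE)
        self.threshold = threshold   # hits needed before an address/hash is blocked
        self.address_hits = {}       # address -> number of spam associations
        self.hash_hits = {}          # file hash -> number of spam associations

    @staticmethod
    def _bump(table, key):
        table[key] = table.get(key, 0) + 1

    def is_spam(self, name, address, file_hash):
        by_keyword = bool(self.pattern.search(name))
        by_address = self.address_hits.get(address, 0) >= self.threshold
        by_hash = self.hash_hits.get(file_hash, 0) >= self.threshold
        # A keyword or hash block counts against the address.
        if by_keyword or by_hash:
            self._bump(self.address_hits, address)
        # A keyword or address block counts against the hash.
        if by_keyword or by_address:
            self._bump(self.hash_hits, file_hash)
        return by_keyword or by_address or by_hash
```

Usage, with invented names and addresses: after `f = LearningFilter(r"free_crack")` sees `("free_crack.exe", "10.0.0.1", "h1")`, a later result `("holiday.avi", "10.0.0.1", "h2")` is filtered by address, and `("renamed.avi", "10.0.0.2", "h1")` is filtered by hash, even though neither name matches the reg exp.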
Re: more intelligent suspicious files filter

Posted:
29 Jul 2011 13:10
by cyko_01
Re: more intelligent suspicious files filter

Posted:
29 Jul 2011 17:09
by ailurophobe
I am pretty sure that a learning spam filter would count as being "more intelligent"...
Seriously, I am not a big fan of the "one-click black box" spam filter. I've recently had some cases where I've needed to uncheck spam filtering to see the perfectly valid results for my query. Not a big or common issue, but if the amount of filtering is increased, the number of false positives will go up. Basically, there is a limit on how many things you can filter automatically before it becomes inconvenient for users not to have finer control over the process. I doubt we are there yet, or would be even after your proposals, but it is something to keep in mind. At some point, making structural changes will become a better option.
Re: more intelligent suspicious files filter

Posted:
29 Jul 2011 18:43
by cyko_01
Re: more intelligent suspicious files filter

Posted:
30 Jul 2011 22:53
by ailurophobe
Yes, obviously my description was nowhere near sufficient to get a filter that is actually usable...
Scenario A is simple enough to fix by not filtering files in the library. Your library would essentially work as a whitelist for files.
In general I was thinking along the lines of the Bayesian filters used for email filtering. So the filter would also need to count instances of "not associated with spam" for hashes and addresses, which would solve most of the problems. I didn't mention this because I honestly have no idea how to do it efficiently. I doubt it is impossible or even particularly difficult, I just happen to not know a good method right now.
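For what it's worth, the counting part could look something like the sketch below: a plain naive Bayes estimate with add-one smoothing over "seen with spam" and "seen with clean results" counts. All names are hypothetical, and it does nothing about the efficiency question (the tables simply grow without bound):

```python
import math


class BayesCounter:
    """Counts spam / not-spam sightings per feature (an address or a hash)."""

    def __init__(self):
        self.spam = {}     # feature -> times seen in spam results
        self.clean = {}    # feature -> times seen in clean results
        self.n_spam = 0    # number of spam results seen
        self.n_clean = 0   # number of clean results seen

    def train(self, features, is_spam):
        counts = self.spam if is_spam else self.clean
        for feature in features:
            counts[feature] = counts.get(feature, 0) + 1
        if is_spam:
            self.n_spam += 1
        else:
            self.n_clean += 1

    def spam_probability(self, features):
        # Work in log-odds; add-one smoothing keeps unseen features neutral.
        log_odds = math.log((self.n_spam + 1) / (self.n_clean + 1))
        for feature in features:
            p_spam = (self.spam.get(feature, 0) + 1) / (self.n_spam + 2)
            p_clean = (self.clean.get(feature, 0) + 1) / (self.n_clean + 2)
            log_odds += math.log(p_spam / p_clean)
        return 1 / (1 + math.exp(-log_odds))
```

Because clean sightings are counted as well, an address that mostly serves valid files pulls the probability back down instead of staying blocked after a single bad hit.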
Re: more intelligent suspicious files filter

Posted:
31 Jul 2011 12:16
by borsti67
Regarding not-yet-deleted spam files: you could count the number of "known bad files" you've received in response to search requests in each session. This would give that host a penalty, increasing the spam likelihood of search results from there and eventually blocking it after a certain count.
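A small sketch of that per-session counter, assuming "known bad files" are identified by hash and that a host is blocked after a fixed number of them. The class name and the limit of 3 are invented for the example:

```python
class SessionPenalty:
    """Per-session tally of known bad files returned by each host."""

    def __init__(self, known_bad_hashes, block_after=3):
        self.known_bad = set(known_bad_hashes)
        self.block_after = block_after   # assumed limit before a host is blocked
        self.bad_counts = {}             # host -> known bad files seen this session

    def record(self, host, file_hash):
        """Call once per search result received from a host."""
        if file_hash in self.known_bad:
            self.bad_counts[host] = self.bad_counts.get(host, 0) + 1

    def penalty(self, host):
        """Value in 0.0..1.0 that could be added to a result's spam likelihood."""
        return min(1.0, self.bad_counts.get(host, 0) / self.block_after)

    def is_blocked(self, host):
        return self.bad_counts.get(host, 0) >= self.block_after
```

Creating a fresh `SessionPenalty` at startup gives the per-session behaviour; persisting `bad_counts` would turn it into a longer-term blocklist.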
Re: more intelligent suspicious files filter

Posted:
01 Aug 2011 02:56
by cyko_01
Re: more intelligent suspicious files filter

Posted:
24 Aug 2011 22:42
by smokex
if (result is type that should have metadata)
{
    if (result does not have metadata)
    {
        drop search hit;
    }
}
Not complicated for this one, and it would be damn simple to implement and take care of 90% of the spam out there.
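The same check as a runnable sketch. Which file types "should have metadata" is not specified above, so the extension list here is purely an assumption, as is representing metadata as a dict:

```python
import os

# Hypothetical set of extensions for which metadata is expected.
METADATA_EXPECTED = {".mp3", ".mp4", ".avi", ".wmv", ".mov"}


def keep_hit(filename, metadata):
    """Return False (drop the search hit) when expected metadata is missing."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in METADATA_EXPECTED and not metadata:
        return False
    return True
```

For example, `keep_hit("song.mp3", {})` drops the hit, while `keep_hit("notes.txt", None)` keeps it because no metadata is expected for that type.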
Re: more intelligent suspicious files filter

Posted:
25 Aug 2011 01:25
by cyko_01
Re: more intelligent suspicious files filter

Posted:
03 Sep 2011 12:10
by old_death
Are G2 clients supposed to provide metadata? I mean, do the specs say this is necessary or only optional?
Re: more intelligent suspicious files filter

Posted:
04 Sep 2011 12:39
by brov
Optional.
Re: more intelligent suspicious files filter

Posted:
05 Sep 2011 15:26
by cyko_01
Re: more intelligent suspicious files filter

Posted:
05 Sep 2011 16:16
by biggestnoob
I actually find very little spam with G2, DC++ and mule files. I mean 1 spam file out of 100.
The most spam I find is with G1, but even that doesn't bother me as it's usually .mov, .qt or .wmv files, and very few are actually .mp3 or .mp4.
So I don't see a big reason for additional filters or stuff. I'd just like DC++ to function better and return more results.
Re: more intelligent suspicious files filter

Posted:
09 Sep 2011 07:18
by old_death
Well, then you are a lucky exception... Spam is a much bigger problem for most other G2 network users...