by ailurophobe » 30 Jul 2011 22:53
Yes, obviously my description was nowhere near sufficient to get a filter that is actually usable...
Scenario A is easy enough to fix by not filtering files that are already in your library. Your library would essentially work as a whitelist for files.
In general I was thinking along the lines of the Bayesian filters used for email filtering. So the filter would also need to count instances of "not associated with spam" for hashes and addresses, which would solve most of the problems. I didn't mention this because I honestly have no idea how to do it efficiently. I doubt it is impossible or even particularly difficult; I just don't happen to know a good method right now.
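
For what it's worth, here is a rough sketch of what counting both "spam" and "not spam" instances per hash and address could look like, in the style of a naive Bayes email filter. Everything in it is an assumption for illustration only: the names `TokenFilter`, `train` and `spam_probability`, the `"hash:..."` / `"addr:..."` token strings, and the add-one smoothing are all made up here, and it only scores tokens that are present in a result. Two hash-table counters per token is one simple way to keep lookups cheap; I'm not claiming it is the best one.

```python
import math
from collections import Counter


class TokenFilter:
    """Toy naive-Bayes-style filter over opaque tokens (file hashes, peer addresses)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam results containing it
        self.ham = Counter()   # token -> number of non-spam results containing it
        self.n_spam = 0        # total results marked as spam
        self.n_ham = 0         # total results marked as not spam

    def train(self, tokens, is_spam):
        """Record one result that the user marked as spam or not spam."""
        unique = set(tokens)  # count each token at most once per result
        if is_spam:
            self.spam.update(unique)
            self.n_spam += 1
        else:
            self.ham.update(unique)
            self.n_ham += 1

    def spam_probability(self, tokens):
        """Estimate P(spam | tokens), treating tokens as independent."""
        # Prior odds from the overall spam / not-spam totals (add-one smoothed).
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for t in set(tokens):
            # Smoothed per-class frequency, so unseen tokens stay neutral.
            p_spam = (self.spam[t] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[t] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on extreme evidence.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Usage would be something like calling `train(["hash:aaa", "addr:1.2.3.4"], True)` whenever a result is flagged as spam, `train(..., False)` for results the user keeps, and hiding results whose `spam_probability(...)` is above some threshold. A hash or address that has never been seen contributes nothing either way, which is the point of also counting the "not associated with spam" cases.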