This section outlines a basic test of the Mailshell SDK's
throughput, accuracy, and message latency. For suggestions
and details on more comprehensive tests, please contact
your Mailshell account manager.

Before you begin testing, we strongly recommend
reading the SDK FAQ.
- Create a directory of a wide variety of
spam messages (e.g. spam) and another directory of a wide
variety of legitimate messages (e.g. legit). Save each
message, with full headers, as an individual file. Be careful
not to alter the messages, headers or body, in any way.
Example spam messages are provided in spamcatchersdk/examples/msgs/spam/
Here are some ways to gather messages:
| Source | Description |
| Mailshell Corpus | http://www.mailshell.com/publiccorpus |
| SpamAssassin Corpus | http://spamassassin.org/publiccorpus |
| SpamArchive Corpus | http://www.spamarchive.org |
| Other Corpora | http://www.paulgraham.com/spamarchives.html |
| Your corporate mail system | Copy message files from your mail store. |
| Use fetchmail to extract messages from an IMAP/POP account | http://catb.org/~esr/fetchmail/ with the --mda option |
| Set up a mail server to save messages to files | http://emailrelay.sourceforge.net |
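Before scoring, it can help to confirm that the saved files still carry their headers. The following is a minimal Python sketch, not part of the SDK: the function names and the choice of Received, From, and Date as the headers to look for are assumptions made for illustration.

```python
from email import policy
from email.parser import BytesHeaderParser
from pathlib import Path

# Assumption: a message saved "with full headers" should carry at least these.
REQUIRED = ("Received", "From", "Date")


def missing_headers(path):
    """Return the names in REQUIRED that are absent from the file's header block."""
    with open(path, "rb") as fh:
        headers = BytesHeaderParser(policy=policy.compat32).parse(fh)
    return [name for name in REQUIRED if headers[name] is None]


def check_corpus(directory):
    """Map each file name in `directory` to its missing headers (problem files only)."""
    problems = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            missing = missing_headers(path)
            if missing:
                problems[path.name] = missing
    return problems
```

Calling check_corpus("spam") and check_corpus("legit") returns an empty dictionary when every file contains all three headers. The script only reads the files, so the messages themselves stay unaltered.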
- Install the Mailshell SDK. Installation
instructions are available here.
- Obtain a Mailshell SDK license from your
Mailshell account manager and enter your license key
in the spamcatcher.conf configuration file.
- Accuracy:
Process the messages with the threaded example program
(threadtest) using the default settings. Each message receives
a spam score from 0 to 100, indicating the probability
that the message is spam.
examples/threadtest/threadtest -t 1 -D conf examples/msgs/spam/
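Once scores are collected for both the spam and legit directories, accuracy can be summarized as a spam catch rate and a false-positive rate. The sketch below assumes the scores have already been extracted into two lists (the output format of threadtest is not described here, so parsing is left out). The function name and the threshold of 50 are illustrative assumptions, not SDK defaults.

```python
def accuracy_summary(spam_scores, legit_scores, threshold=50):
    """Summarize accuracy from 0-100 spam scores.

    threshold is a hypothetical cut-off: scores at or above it count as
    "classified as spam". Pick the value your deployment would use.
    """
    if not spam_scores or not legit_scores:
        raise ValueError("both score lists must be non-empty")
    caught = sum(score >= threshold for score in spam_scores)
    false_positives = sum(score >= threshold for score in legit_scores)
    return {
        "spam_catch_rate": caught / len(spam_scores),
        "false_positive_rate": false_positives / len(legit_scores),
    }
```

For example, accuracy_summary([90, 80, 40, 99], [5, 10, 60, 0, 20]) reports a catch rate of 0.75 and a false-positive rate of 0.2. Rerunning with different thresholds shows how the two rates trade off against each other.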
- Performance:
You can approximate real-time performance
by collecting a sample of messages (e.g. 10,000 messages,
1 hour of messages, every 100th message for a week, or
just a random sample) from your live mail server over a typical
period. To estimate the worst-case scenario, run the
threadtest program with the number of threads set to one,
which assumes that all of the messages are delivered
serially. To estimate the best-case scenario, run
threadtest with the number of threads set to the number
of processors on your machine, which assumes that your
processors are at full utilization.
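The two runs described above can be timed with a small wrapper. This is a hedged Python sketch, not an SDK tool: it assumes the threadtest path and the -t and -D options shown in the Accuracy step, that -t sets the thread count, and that wall-clock time for the whole process (including startup and SDK initialization) is an acceptable approximation. The helper names are hypothetical.

```python
import os
import subprocess
import time


def threadtest_command(threads, msg_dir,
                       binary="examples/threadtest/threadtest",
                       conf_dir="conf"):
    """Build the command line; -t is assumed to set the thread count."""
    return [binary, "-t", str(threads), "-D", conf_dir, str(msg_dir)]


def time_run(command, message_count):
    """Run `command` once and derive throughput from wall-clock time."""
    if message_count <= 0:
        raise ValueError("message_count must be positive")
    start = time.perf_counter()
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    return {
        "seconds": elapsed,
        "messages_per_second": message_count / elapsed,
        # Wall time divided by message count; with several threads this
        # is not the latency of any individual message.
        "seconds_per_message": elapsed / message_count,
    }


def processor_count():
    """Thread count for the best-case run (falls back to 1 if unknown)."""
    return os.cpu_count() or 1
```

A worst-case estimate would come from time_run(threadtest_command(1, msg_dir), n) and a best-case estimate from time_run(threadtest_command(processor_count(), msg_dir), n), where msg_dir is your sample directory and n is the number of files in it. With one thread, seconds_per_message approximates per-message latency.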