---------------------------------------------------------------------------
SVMM - Support Vector Machine Multilevel Classifier for AmigaOS
===============================================================
DESCRIPTION
SVMM is a command-line machine learning classifier based on the Support
Vector Machine algorithm. It uses the Sequential Minimal Optimisation
(SMO) training algorithm with a linear kernel and a precomputed kernel
matrix cache for significantly faster training than a naive implementation.
Both binary (two-class) and multi-class (one-vs-rest) classification are
supported. A validation mode allows a labelled dataset to be scored against
a trained model, producing per-class accuracy figures and a full confusion
matrix.
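As a rough illustration of one-vs-rest prediction with a linear kernel, the
sketch below scores a sample against one weight vector per class and picks
the class with the highest score. The function names and the flat array
layout are assumptions made for this example and are not taken from the
svmm source.

```c
#include <stddef.h>

/* Linear decision value w.x + b for one binary classifier. */
static double linear_score(const double *w, double b,
                           const double *x, size_t n_features)
{
    double s = b;
    size_t k;

    for (k = 0; k < n_features; k++)
        s += w[k] * x[k];
    return s;
}

/*
 * Hypothetical layout: weights holds n_classes rows of n_features
 * values, bias holds one value per class. Returns the index of the
 * class whose classifier gives the largest decision value.
 */
int ovr_predict(const double *weights, const double *bias,
                size_t n_classes, size_t n_features, const double *x)
{
    size_t c;
    int best = 0;
    double best_score, s;

    best_score = linear_score(weights, bias[0], x, n_features);
    for (c = 1; c < n_classes; c++) {
        s = linear_score(weights + c * n_features, bias[c],
                         x, n_features);
        if (s > best_score) {
            best_score = s;
            best = (int)c;
        }
    }
    return best;
}
```

Taking the largest raw decision value is the usual one-vs-rest rule; ties
go to the lowest class index in this sketch.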
The code is written in strict C89 and compiles cleanly with the Storm C
compiler using IEEE floating point.
Three example datasets (two training sets and one validation set) are
included, covering two domains:
Viking Age artefact classification (4 classes, 200 samples):
Classify archaeological metal artefacts as weapons, jewellery,
tools, or coins/hacksilver based on metal composition and physical
measurements. Features grounded in published XRF analysis data from
the Huxley Hoard and Ribe workshop excavations.
Disease symptom classification (10 classes, 400 training +
200 validation samples):
Classify patient symptoms into one of ten common diseases
based on ten clinical symptom features. Features are pre-normalised
to [0,1] and grounded in WHO clinical guidelines and CDC
disease profiles. Included for demonstration purposes only -- this
dataset must not be used for actual medical diagnosis.
IMPORTANT MEDICAL DISCLAIMER
The disease dataset and any model trained from it are provided
for the sole purpose of demonstrating multi-class SVM classification.
They must not be used to diagnose, treat, or inform medical decisions
of any kind. Always consult a qualified medical professional.
FEATURES
- Binary classification (two-class problems)
- Multi-class classification via one-vs-rest (up to MAX_CLASSES classes)
- Validation mode with per-class accuracy and confusion matrix output
- Precomputed kernel matrix cache for fast training
- Incremental error cache to avoid redundant inner-loop recomputation
- Plain-text progress output during training (every 10 iterations)
- Reads training data from comma-separated (CSV) files
- Saves and loads trained models to/from plain text files
- Classifies a new sample from a comma-separated argument string
- Accepts labels as 0/1 or -1/+1 interchangeably for binary models
- Skips comment lines (starting with #) and blank lines in CSV files
- Configurable regularisation, tolerance, and convergence parameters
- No external dependencies beyond standard AmigaOS libraries
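To show what an incremental error cache can look like, here is a minimal
sketch of the standard SMO update. It assumes the decision function
f(x) = sum_m alpha[m]*y[m]*K(m,x) + b, cached errors E[k] = f(x_k) - y[k],
and a row-major n*n kernel matrix K. The function name and parameters are
hypothetical and may differ from what svmm does internally.

```c
#include <stddef.h>

/*
 * After one SMO step has changed alpha[i], alpha[j] and the bias,
 * bring every cached error up to date in O(n) rather than
 * recomputing each f(x_k) from scratch.
 *
 * d_alpha_i, d_alpha_j and d_b are new-minus-old differences.
 */
void update_error_cache(double *E, const double *K, size_t n,
                        size_t i, size_t j,
                        double d_alpha_i, double d_alpha_j,
                        double y_i, double y_j, double d_b)
{
    size_t k;

    for (k = 0; k < n; k++)
        E[k] += d_alpha_i * y_i * K[i * n + k]
              + d_alpha_j * y_j * K[j * n + k]
              + d_b;
}
```

Because f is linear in the multipliers and the bias, the updated cache
equals a full recomputation up to floating point rounding.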
REQUIREMENTS
- AmigaOS 2.0 or higher
- 68000 processor or better
- mathieeedoubbas.library (included with AmigaOS)
- At least 1MB fast RAM (for kernel matrix cache during training)
Memory requirements during training (kernel cache):
200 samples -> 320 KB (fits on any expanded Amiga)
400 samples -> 1280 KB (requires 2MB+ fast RAM)
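These figures are consistent with an n*n cache of 8-byte IEEE doubles,
with 1 KB taken as 1000 bytes: 200 * 200 * 8 = 320000 bytes. A small
helper for estimating the cache size for other dataset sizes might look
like this (the function name is illustrative, not part of svmm):

```c
/*
 * Bytes needed for an n-by-n kernel cache of 8-byte doubles.
 * unsigned long is used because int may be only 16 bits wide on
 * some 68000 compiler configurations.
 */
unsigned long kernel_cache_bytes(unsigned long n_samples)
{
    return n_samples * n_samples * 8UL;
}
```

Memory grows with the square of the sample count, so doubling the dataset
quadruples the cache.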
USAGE
Binary classification:
svmm train <data.csv> <model.svm>
svmm classify <model.svm> <x1,x2,...,xn>
svmm validate <model.svm> <data.csv>
Multi-class (one-vs-rest):
svmm multitrain <data.csv> <model.svm>
svmm multiclassify <model.svm> <x1,x2,...,xn>
svmm multivalidate <model.svm> <data.csv>
Examples (Viking artefacts):
svmm multitrain viking_types_norm.csv viking.svm
svmm multivalidate viking.svm viking_types_norm.csv
svmm multiclassify viking.svm 0.015,0.898,0.086,0.016,0.152
Examples (diseases):
svmm multitrain diseases_400.csv diseases.svm
svmm multivalidate diseases.svm diseases_val.csv
; High fever, severe headache, stiff neck -> expect Meningitis (5)
svmm multiclassify diseases.svm 0.9,0.95,0.1,0.1,1.0,0.5,0.6,0.0,0.3,1.0
; Extreme diarrhoea, vomiting, no fever -> expect Cholera (7)
svmm multiclassify diseases.svm 0.1,0.1,0.0,0.99,1.0,0.0,0.7,0.0,0.5,0.0
CSV TRAINING FILE FORMAT
Each line contains one sample. The last column is the class label
(integer). All other columns are numeric feature values. Columns are
separated by commas. Whitespace around values is ignored. Lines
beginning with # are treated as comments and skipped.
Binary example (label is 0 or 1; -1 and +1 are also accepted):
# temp_c, humidity_pct, wind_kmh, precip_mm, cloud_pct, label
12.0, 74.0, 16.0, 0.0, 35.0, 1
3.5, 87.0, 22.0, 5.2, 98.0, 0
Multi-class example (label is integer 0..N-1):
# silver_pct, iron_pct, copper_pct, weight_g, length_mm, class
0.015, 0.898, 0.086, 0.016, 0.152, 0
0.861, 0.018, 0.105, 0.071, 0.140, 1
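The format rules above can be sketched as a single-line parser. This is an
illustration written for this readme, assuming only the rules stated here
(comma-separated numbers, label last, # comments, blank lines skipped). The
function name, signature, and return codes are hypothetical and are not
svmm's actual reader.

```c
#include <stdlib.h>
#include <ctype.h>

/*
 * Parse one CSV line into features[] and *label.
 * Returns the number of features read, 0 for a comment or blank
 * line, or -1 if the line is malformed or has too many columns.
 */
int parse_csv_line(const char *line, double *features,
                   int max_features, int *label)
{
    const char *p = line;
    char *end;
    double v, last = 0.0;
    int count = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0' || *p == '#')
        return 0;                       /* blank or comment line */

    for (;;) {
        v = strtod(p, &end);            /* skips leading whitespace */
        if (end == p)
            return -1;                  /* not a number */
        if (count > 0) {
            if (count - 1 >= max_features)
                return -1;              /* too many columns */
            features[count - 1] = last; /* previous value is a feature */
        }
        last = v;
        count++;
        p = end;
        while (isspace((unsigned char)*p))
            p++;
        if (*p == ',') {
            p++;
            continue;
        }
        if (*p == '\0')
            break;
        return -1;                      /* unexpected character */
    }
    if (count < 2)
        return -1;                      /* need a feature and a label */
    *label = (int)last;                 /* last column is the label */
    return count - 1;
}
```

Holding back the most recent value until the next one arrives is what lets
the parser treat the final column as the label without knowing the column
count in advance.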
Important: features with very different numerical ranges must be
normalised to the same scale before training. The linear kernel
computes dot products, so large-valued features dominate small-valued
ones and can slow or prevent convergence. The included datasets are all
pre-normalised to [0,1].
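One common way to bring features onto a [0,1] scale is per-column min-max
scaling. The sketch below is an assumption about how you might prepare your
own data; this readme does not state how the included datasets were
normalised, and the function name is illustrative.

```c
#include <stddef.h>

/*
 * Rescale each feature column of a row-major n_samples x n_features
 * array to [0,1] in place. A column whose values are all equal has
 * no range to scale by and is set to 0.0.
 */
void minmax_normalise(double *data, size_t n_samples, size_t n_features)
{
    size_t i, k;
    double lo, hi, v, range;

    if (n_samples == 0)
        return;
    for (k = 0; k < n_features; k++) {
        lo = hi = data[k];
        for (i = 1; i < n_samples; i++) {
            v = data[i * n_features + k];
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
        range = hi - lo;
        for (i = 0; i < n_samples; i++) {
            v = data[i * n_features + k];
            data[i * n_features + k] =
                (range > 0.0) ? (v - lo) / range : 0.0;
        }
    }
}
```

If you normalise this way, samples passed to classify or multiclassify
need the same per-column minimum and range that were used for the training
data.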
VALIDATION OUTPUT
The multivalidate command prints overall accuracy, per-class accuracy,
and a full confusion matrix. Example output:
Validation results
------------------
Samples tested : 200
Correct : 178
Incorrect : 22
Accuracy : 89.0%
Per-class accuracy:
Class 0 : 19 / 20 (95.0%)
Class 5 : 20 / 20 (100.0%)
Class 7 : 20 / 20 (100.0%)
Confusion matrix (rows=actual, cols=predicted):
Pred0 Pred1 Pred2 ...
Act0 19 0 1 ...
Act1 0 17 1 ...
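For readers who want to reproduce these figures from their own predictions,
the following sketch tallies a confusion matrix and derives per-class
accuracy from it. The function names and the flat row-major matrix layout
are assumptions for illustration only.

```c
#include <stddef.h>

/*
 * Fill a row-major n_classes x n_classes matrix where
 * matrix[a * n_classes + p] counts samples of actual class a
 * predicted as class p. Labels must lie in 0..n_classes-1.
 */
void confusion_matrix(const int *actual, const int *predicted,
                      size_t n, size_t n_classes, long *matrix)
{
    size_t i;

    for (i = 0; i < n_classes * n_classes; i++)
        matrix[i] = 0;
    for (i = 0; i < n; i++)
        matrix[(size_t)actual[i] * n_classes
               + (size_t)predicted[i]]++;
}

/* Diagonal entry divided by the row total, as a percentage. */
double class_accuracy_pct(const long *matrix, size_t n_classes, size_t c)
{
    long total = 0;
    size_t p;

    for (p = 0; p < n_classes; p++)
        total += matrix[c * n_classes + p];
    if (total == 0)
        return 0.0;                 /* class absent from the data */
    return 100.0 * (double)matrix[c * n_classes + c] / (double)total;
}
```

With 19 of 20 samples on the diagonal of a row, class_accuracy_pct gives
95.0, matching the Class 0 line in the example output.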
PERFORMANCE
Classifying a single sample takes under 1 second on all supported hardware.
The kernel matrix cache precomputes all n*n dot products once before
training begins, trading memory for a large reduction in floating point
operations per iteration. This is the dominant optimisation.
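A minimal sketch of such a cache for the linear kernel is shown below. It
assumes row-major sample storage and uses malloc for simplicity; the
function name is hypothetical and svmm's own allocation strategy may
differ.

```c
#include <stdlib.h>

/*
 * Build an n*n cache with K[i*n + j] = dot(x_i, x_j), where X holds
 * n samples of d features each in row-major order. The matrix is
 * symmetric, so each dot product is computed once and stored twice.
 * Returns NULL if memory cannot be allocated; the caller frees it.
 */
double *build_linear_kernel_cache(const double *X, size_t n, size_t d)
{
    double *K;
    double s;
    size_t i, j, k;

    K = (double *)malloc(n * n * sizeof(double));
    if (K == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        for (j = i; j < n; j++) {
            s = 0.0;
            for (k = 0; k < d; k++)
                s += X[i * d + k] * X[j * d + k];
            K[i * n + j] = s;
            K[j * n + i] = s;
        }
    }
    return K;
}
```

During training, every kernel evaluation then becomes a single array
lookup instead of a loop over all features.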
INCLUDED FILES
svmm                   Main code
viking_types_norm.csv  Viking artefact training data (normalised)
diseases_400.csv       Disease training data (400 rows)
diseases_val.csv       Disease validation data (200 rows)
svmm.readme            This file
DISEASE DATASET FEATURE REFERENCE
All ten features are normalised 0.0 to 1.0.
Column           Type        Description
---------------  ----------  ------------------------------------------
fever_severity   Continuous  0.0=none    0.6=moderate  1.0=high
headache_sev     Continuous  0.0=none    0.5=mild      1.0=severe
cough_severity   Continuous  0.0=none    0.5=mild      1.0=severe/chronic
diarrhea_sev     Continuous  0.0=none    0.5=mild      1.0=severe/watery
vomiting         Binary      0.0=absent  1.0=present
rash             Binary      0.0=absent  1.0=present
fatigue_sev      Continuous  0.0=none    0.5=mild      1.0=severe/chronic
night_sweats     Binary      0.0=absent  1.0=present
muscle_pain      Continuous  0.0=none    0.5=mild      1.0=severe
stiff_neck       Binary      0.0=absent  1.0=present
Classes and key distinguishing features:
0 Malaria      : High fever + muscle pain + night sweats, no cough
1 Pneumonia    : Severe cough + moderate fever, no diarrhoea
2 HIV/AIDS     : Chronic fatigue + night sweats, mild diarrhoea
3 Tuberculosis : Chronic cough + night sweats, low-grade fever
4 Diarrhoeal   : Severe diarrhoea + vomiting, little or no fever
5 Meningitis   : Stiff neck + severe headache + high fever
6 Typhoid      : Sustained high fever + headache + rose-spot rash
7 Cholera      : Extreme diarrhoea + vomiting + near-zero fever
8 Measles      : Rash + cough + high fever
9 Dengue       : Severe muscle pain + headache + sudden high fever
VERSION HISTORY
1.0 Multi-class one-vs-rest support added
Validation mode with confusion matrix
Kernel matrix cache (major training speedup)
Incremental error cache
Disease and Viking artefact example datasets
Fixed 68000/68020 struct offset overflow for large MAX_SAMPLES
DISCLAIMER
This software is provided as-is without warranty of any kind. The author
accepts no responsibility for any loss or damage arising from its use.
Use entirely at your own risk.
The disease dataset is synthetic, generated for software
demonstration purposes only, and must not be used for medical diagnosis
or any clinical application whatsoever.
---------------------------------------------------------------------------