Short:        Multi-class SVM classifier with SMO
Author:       info@maskinemand.dk (Jesper Andersen)
Uploader:     info maskinemand dk (Jesper Andersen)
Type:         misc/sci
Version:      1.0
Requires:     68000+, mathieeedoubbas.library, 1MB fast RAM
Architecture: m68k-amigaos >= 2.0.0

---------------------------------------------------------------------------

SVMM - Support Vector Machine Multilevel Classifier for AmigaOS
===============================================================

DESCRIPTION

  SVMM is a command-line machine learning classifier based on the Support
  Vector Machine algorithm. It uses the Sequential Minimal Optimisation
  (SMO) training algorithm with a linear kernel and a precomputed kernel
  matrix cache for significantly faster training than a naive
  implementation.

  Both binary (two-class) and multi-class (one-vs-rest) classification are
  supported. A validation mode allows a labelled dataset to be scored
  against a trained model, producing per-class accuracy figures and a full
  confusion matrix.

  The code is written in strict C89 and compiles cleanly with the Storm C
  compiler using IEEE floating point.

  Three example datasets are included, covering two domains:

  Viking Age artefact classification (4 classes, 200 samples):
    Classify archaeological metal artefacts as weapons, jewellery, tools,
    or coins/hacksilver based on metal composition and physical
    measurements. Features grounded in published XRF analysis data from
    the Huxley Hoard and Ribe workshop excavations.

  Disease symptom classification (10 classes, 400 training + 200
  validation samples):
    Classify patient symptoms into one of ten common diseases based on
    ten clinical symptom features. Features are pre-normalised to [0,1]
    and grounded in WHO clinical guidelines and CDC disease profiles.
    Included for demonstration purposes only -- this dataset must not be
    used for actual medical diagnosis.
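
  As an illustration of the one-vs-rest scheme named in the description,
  the sketch below picks the class whose linear decision value w.x + b is
  largest. It is a minimal C89 sketch, not the SVMM source: the function
  name ovr_predict, the row-major weight layout, and the "+ b" sign
  convention for the bias are all assumptions made for this example.

```c
/* Hypothetical one-vs-rest prediction with a linear kernel.
 * ASSUMPTIONS (not taken from the SVMM source):
 *   - w holds num_classes rows of dim weights, stored row-major
 *   - b holds one bias per class, and the decision value is w.x + b
 *   - the class with the largest decision value wins
 */
int ovr_predict(const double *w, const double *b,
                int num_classes, int dim, const double *x)
{
    int c, k;
    int best = 0;
    double best_score = 0.0;

    for (c = 0; c < num_classes; c++) {
        double score = b[c];

        /* Linear decision value for the "class c versus the rest" model. */
        for (k = 0; k < dim; k++)
            score += w[c * dim + k] * x[k];

        /* Keep the first class unconditionally, then any strictly better one. */
        if (c == 0 || score > best_score) {
            best_score = score;
            best = c;
        }
    }
    return best;
}
```

  Ties go to the lowest class index in this sketch. Whether SVMM breaks
  ties the same way is not stated in this readme.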

IMPORTANT MEDICAL DISCLAIMER

  The disease dataset and any model trained from it are provided for the
  sole purpose of demonstrating multi-class SVM classification. They must
  not be used to diagnose, treat, or inform medical decisions of any kind.
  Always consult a qualified medical professional.

FEATURES

  - Binary classification (two-class problems)
  - Multi-class classification via one-vs-rest (up to MAX_CLASSES classes)
  - Validation mode with per-class accuracy and confusion matrix output
  - Precomputed kernel matrix cache for fast training
  - Incremental error cache to avoid redundant inner-loop recomputation
  - Plain-text progress output during training (every 10 iterations)
  - Reads training data from comma-separated (CSV) files
  - Saves and loads trained models to/from plain text files
  - Classifies a new sample from a comma-separated argument string
  - Accepts labels as 0/1 or -1/+1 interchangeably for binary models
  - Skips comment lines (starting with #) and blank lines in CSV files
  - Configurable regularisation, tolerance, and convergence parameters
  - No external dependencies beyond standard AmigaOS libraries

REQUIREMENTS

  - AmigaOS 2.0 or higher
  - 68000 processor or better
  - mathieeedoubbas.library (included with AmigaOS)
  - At least 1MB fast RAM (for kernel matrix cache during training)

  Memory requirements during training (kernel cache):
    200 samples ->  320 KB  (fits on any expanded Amiga)
    400 samples -> 1280 KB  (requires 2MB+ fast RAM)

USAGE

  Binary classification:
    svmm train
    svmm classify
    svmm validate

  Multi-class (one-vs-rest):
    svmm multitrain
    svmm multiclassify
    svmm multivalidate

  Examples (Viking artefacts):
    svmm multitrain viking_types_norm.csv viking.svm
    svmm multivalidate viking.svm viking_types_norm.csv
    svmm multiclassify viking.svm 0.015,0.898,0.086,0.016,0.152

  Examples (diseases):
    svmm multitrain diseases_400.csv diseases.svm
    svmm multivalidate diseases.svm diseases_val.csv

    ; High fever, severe headache, stiff neck -> expect Meningitis (5)
    svmm multiclassify diseases.svm 0.9,0.95,0.1,0.1,1.0,0.5,0.6,0.0,0.3,1.0

    ; Extreme diarrhoea, vomiting, no fever -> expect Cholera (7)
    svmm multiclassify diseases.svm 0.1,0.1,0.0,0.99,1.0,0.0,0.7,0.0,0.5,0.0

CSV TRAINING FILE FORMAT

  Each line contains one sample. The last column is the class label
  (integer). All other columns are numeric feature values. Columns are
  separated by commas. Whitespace around values is ignored. Lines
  beginning with # are treated as comments and skipped.

  Binary example (label is 0 or 1; -1/+1 is also accepted):
    # temp_c, humidity_pct, wind_kmh, precip_mm, cloud_pct, label
    12.0, 74.0, 16.0, 0.0, 35.0, 1
    3.5, 87.0, 22.0, 5.2, 98.0, 0

  Multi-class example (label is integer 0..N-1):
    # silver_pct, iron_pct, copper_pct, weight_g, length_mm, class
    0.015, 0.898, 0.086, 0.016, 0.152, 0
    0.861, 0.018, 0.105, 0.071, 0.140, 1

  Important: features with very different numerical ranges must be
  normalised to the same scale before training. The linear kernel
  computes dot products, so large-valued features dominate small ones
  and can prevent convergence. The included datasets are all
  pre-normalised to [0,1].

VALIDATION OUTPUT

  The multivalidate command prints overall accuracy, per-class accuracy,
  and a full confusion matrix. Example output:

    Validation results
    ------------------
    Samples tested : 200
    Correct        : 178
    Incorrect      : 22
    Accuracy       : 89.0%

    Per-class accuracy:
      Class 0 : 19 / 20 (95.0%)
      Class 5 : 20 / 20 (100.0%)
      Class 7 : 20 / 20 (100.0%)

    Confusion matrix (rows=actual, cols=predicted):
             Pred0  Pred1  Pred2  ...
      Act0      19      0      1  ...
      Act1       0     17      1  ...

PERFORMANCE

  Classifying a single sample takes under 1 second on all hardware.

  The kernel matrix cache precomputes all n*n dot products once before
  training begins, trading memory for a large reduction in floating point
  operations per iteration. This is the dominant optimisation.
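
  The kernel cache idea can be sketched in a few lines of C89. This is an
  illustration only: the names kernel_cache_bytes, linear_kernel and
  build_kernel_cache are hypothetical, and the real SVMM code may lay out
  or fill its cache differently. With 8-byte IEEE doubles a full n*n cache
  needs n*n*8 bytes, which agrees with the 320 KB and 1280 KB figures
  under REQUIREMENTS (taking 1 KB as 1000 bytes).

```c
#include <stdlib.h>

/* Bytes needed for a full n*n cache of doubles (8 bytes each with IEEE
 * double precision). Function name is hypothetical. */
unsigned long kernel_cache_bytes(unsigned long n)
{
    return n * n * (unsigned long)sizeof(double);
}

/* Linear kernel: the plain dot product of two feature vectors. */
double linear_kernel(const double *a, const double *b, int dim)
{
    double sum = 0.0;
    int k;

    for (k = 0; k < dim; k++)
        sum += a[k] * b[k];
    return sum;
}

/* Precompute K[i*n + j] = x_i . x_j for every pair of samples.
 * x is assumed to be row-major: n samples of dim features each.
 * Returns NULL if the cache does not fit in memory; caller frees. */
double *build_kernel_cache(const double *x, int n, int dim)
{
    double *K;
    long i, j;

    K = (double *)malloc(kernel_cache_bytes((unsigned long)n));
    if (K == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        /* The matrix is symmetric, so compute each pair only once. */
        for (j = i; j < n; j++) {
            double v = linear_kernel(x + i * dim, x + j * dim, dim);
            K[i * n + j] = v;
            K[j * n + i] = v;
        }
    }
    return K;
}
```

  During training, a lookup such as K[i*n + j] then replaces a fresh dot
  product over all features, which is where the saving in floating point
  work comes from.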

INCLUDED FILES

  svmm                    Main code
  viking_types_norm.csv   Viking artefact training data (normalised)
  diseases_400.csv        Disease training data (400 rows)
  diseases_val.csv        Disease validation data (200 rows)
  svmm.readme             This file

DISEASE DATASET FEATURE REFERENCE

  All ten features are normalised 0.0 to 1.0.

  Column           Type        Description
  ---------------  ----------  ------------------------------------------
  fever_severity   Continuous  0.0=none  0.6=moderate  1.0=high
  headache_sev     Continuous  0.0=none  0.5=mild      1.0=severe
  cough_severity   Continuous  0.0=none  0.5=mild      1.0=severe/chronic
  diarrhea_sev     Continuous  0.0=none  0.5=mild      1.0=severe/watery
  vomiting         Binary      0.0=absent  1.0=present
  rash             Binary      0.0=absent  1.0=present
  fatigue_sev      Continuous  0.0=none  0.5=mild      1.0=severe/chronic
  night_sweats     Binary      0.0=absent  1.0=present
  muscle_pain      Continuous  0.0=none  0.5=mild      1.0=severe
  stiff_neck       Binary      0.0=absent  1.0=present

  Classes and key distinguishing features:
    0  Malaria      : High fever + muscle pain + night sweats, no cough
    1  Pneumonia    : Severe cough + moderate fever, no diarrhoea
    2  HIV/AIDS     : Chronic fatigue + night sweats, mild diarrhoea
    3  Tuberculosis : Chronic cough + night sweats, low-grade fever
    4  Diarrhoeal   : Severe diarrhoea + vomiting, little or no fever
    5  Meningitis   : Stiff neck + severe headache + high fever
    6  Typhoid      : Sustained high fever + headache + rose-spot rash
    7  Cholera      : Extreme diarrhoea + vomiting + near-zero fever
    8  Measles      : Rash + cough + high fever
    9  Dengue       : Severe muscle pain + headache + sudden high fever

VERSION HISTORY

  1.0  Multi-class one-vs-rest support added
       Validation mode with confusion matrix
       Kernel matrix cache (major training speedup)
       Incremental error cache
       Disease and Viking artefact example datasets
       Fixed 68000/68020 struct offset overflow for large MAX_SAMPLES

DISCLAIMER

  This software is provided as-is without warranty of any kind. The
  author accepts no responsibility for any loss or damage arising from
  its use. Use entirely at your own risk.

  The disease dataset is synthetic, generated for software demonstration
  purposes only, and must not be used for medical diagnosis or any
  clinical application whatsoever.

---------------------------------------------------------------------------