Exam Candidate Verification Through Handwritten Artifacts
Updated on 21st March 2025
A hierarchical voting system for verifying exam candidates using signatures and handwriting.
Our Team

Team Hikari Research, From left: Haritha Hasathcharu, Supervisor: Dr. Upeksha Ganegoda, Supervisor: Ms. Shalini Upeksha, Rashmi Sandamini, and Sagini Navaratnam
Introduction

Exam Candidate Verification System Landing Page
This research project was carried out by our team, Hikari Research, as part of our final year in the BSc. (Hons) in Information Technology and Management degree at the University of Moratuwa. This article briefly outlines the project.
We use biometrics to verify the identity of individuals in many applications. There are numerous physiological biometric systems based on fingerprints, faces, irises, and so on. Another category, behavioral biometrics, relies on the behavior of individuals: keystroke dynamics, gait, and, more interestingly, handwriting can all be used to identify humans. We were fascinated by the idea of verifying an exam candidate using their handwriting, since handwriting is unique to a person and is available in abundance in an exam setting. The idea that the answer script itself can verify the candidate is compelling: if any other verification method is compromised, the answer script can still be used to verify the candidate, even after the exam. This was the main motivation for this research.
Our literature survey showed that much remarkable work has been done in this area, but existing systems face challenges when it comes to practical implementation.
The problems that we identified were:
- Signatures are a very small sample of handwriting, so cross-checking a signature against a known sample is not very effective, as signatures are subject to intra-personal variation. Especially in an examination scenario, candidates are not focused on producing a perfect signature, so it can differ considerably from the known sample.
- Existing CNN-based handwriting verification systems verify the identity of the writer using only the handwriting of a single word or a single sentence, which might not capture the writer's full handwriting style.
- Even though CNN-based systems are highly accurate, their decisions are not meaningfully interpretable. Even with visualization techniques such as Grad-CAM, it remains unclear how the system arrived at a decision, as handwriting quirks are very subtle.
To address these issues, we proposed a hierarchical voting system consisting of three modules for verifying exam candidates using signatures and handwriting.
- Module 1 - Signature forgery detection using vision and text embeddings from CLIP
- Module 2 - Quick handwriting verification using a vision transformer based Siamese network
- Module 3 - Personalized handwriting verification using writer-specific handcrafted features
Let's take a look at our approach, and then we will discuss each module in detail.
Our Approach
Architecture Diagram of the Overall System
The system consists of three modules, each responsible for a specific task in the verification process. The voting takes place in two stages.
First, the signature forgery detection module checks whether the signature is forged. Its main advantage is that it does not rely on a reference sample of the signature: it can detect a forgery without any knowledge of the genuine signature.
Then, the quick handwriting verification module verifies the candidate using a minimal number of samples. Here, a novel texture-based approach is used to verify the candidate with a vision transformer based Siamese network. This module can also handle the intra-personal variations of handwriting in a special two-speed verification mode.
If both systems agree on the identity of the candidate, we already have a majority vote and the candidate is verified. If not, the third, personalized handwriting verification module is used. In our experiments, the first two modules agreed, with high accuracy, more than 67% of the time. This first stage takes only a few seconds.
The personalized handwriting verification module is used only when the first two modules disagree, which is a rare case. This module uses a set of handcrafted features that are specific to the writer, so not only can it verify the candidate, but its decisions are also interpretable.
The third module is able to break the tie in the voting system, maintaining consistent accuracy in the overall system.
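The two-stage vote can be sketched in a few lines of Python. The three module callables below are hypothetical placeholders; only the voting logic mirrors the system described above.

```python
def verify_candidate(signature, sample, module1, module2, module3):
    """Hierarchical two-stage vote over the three verification modules.

    module1/module2/module3 are hypothetical callables returning True
    when the artifact is judged genuine or matching the claimed writer.
    """
    # Stage 1: reference-free signature check and quick texture check.
    vote1 = module1(signature)  # True -> signature judged genuine
    vote2 = module2(sample)     # True -> handwriting matches claimed writer

    if vote1 == vote2:
        # Two agreeing votes already form a majority of three.
        return vote1

    # Stage 2: the slower, interpretable module breaks the tie.
    return module3(sample)


# Usage with stub modules: modules 1 and 2 agree, so module 3 never runs.
result = verify_candidate(
    signature=None, sample=None,
    module1=lambda s: True,
    module2=lambda s: True,
    module3=lambda s: False,
)
print(result)  # True
```

Because the expensive third module runs only on disagreements, most verifications finish after stage 1.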
Demonstration
Now that we have discussed the flow, let's take a look at the system in action.
Now let's take a look at each module in detail.
Module 1 - Signature Forgery Detection Module
This module was primarily developed by our team member Sagini, inspired by the concept-based approach of the CLIP model. CLIP learns semantic relationships between images and text by projecting them into a shared embedding space. Drawing from this idea, she aimed to teach the model to semantically distinguish between genuine and forged signatures.
Preprocessing pipeline of Module 1
Initially, she attempted to replace CLIP's image embeddings with handcrafted features, but this did not yield satisfactory performance. She then used CLIP directly to extract image embeddings and crafted her own set of text prompts to describe the semantic meaning of genuine and forged signatures. Additionally, she applied graph embedding techniques to further compress and structure the extracted features.
Signature forgery detection pipeline of Module 1
The key advantage of this approach is that it does not require a reference sample of the signature to determine whether it is forged. In other words, the model can identify a forgery without having seen the actual genuine version of the signature, relying instead on general conceptual differences between forgeries and genuine signatures.
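A minimal sketch of the reference-free idea: score the signature's embedding against text embeddings of the "genuine" and "forged" concepts rather than against a stored reference signature. The random vectors and the prompt wording in the comments are illustrative stand-ins, not the module's actual CLIP embeddings or prompts.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_signature(img_emb, genuine_emb, forged_emb):
    """Reference-free decision: compare the signature's embedding with
    text embeddings of the 'genuine' and 'forged' concepts, not with a
    stored reference signature."""
    if cosine(img_emb, genuine_emb) >= cosine(img_emb, forged_emb):
        return "genuine"
    return "forged"

# Stand-in embeddings; in the real module these come from CLIP's image
# and text encoders for prompts along the lines of "a fluently written
# genuine signature" vs. "a hesitantly traced forged signature".
rng = np.random.default_rng(0)
genuine_emb = rng.normal(size=512)
forged_emb = rng.normal(size=512)
img_emb = genuine_emb + 0.1 * rng.normal(size=512)  # near the 'genuine' concept

print(classify_signature(img_emb, genuine_emb, forged_emb))  # genuine
```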
The source code for this module will be made available on GitHub soon.
Module 2 - Quick Handwriting Verification Module
This module was mainly done by our team member Rashmi, who used a vision transformer based Siamese network to verify the handwriting of the candidate. The main idea is to extract compact texture representations from handwriting samples for candidate verification. Instead of using a single word or sentence, she generates a texture from an entire paragraph, providing a richer representation of handwriting for verification.
Texture creation process of Module 2
She first experimented with CNN feature extractors such as VGG16 and ResNet18, but found that the vision transformer (ViT384) based Siamese network performed better.
Verification model architecture of Module 2
This module can operate in two modes:
- Standard Mode: Uses one sample per writer.
- Two-Speed Mode: Uses a pair of normal and fast handwriting samples from each writer to handle intra-writer variability. Distances between same-writer samples capture intra-personal variations, while distances between four different writer pairs capture inter-personal variations. A feedforward neural network uses all six distances to make the final decision.
The two-speed verification mode of Module 2
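With four embeddings per comparison (normal and fast samples from each of the two writers) there are exactly six pairwise distances. A sketch of that distance computation, using toy vectors in place of the ViT texture embeddings:

```python
from itertools import combinations
import numpy as np

def six_distances(a_normal, a_fast, b_normal, b_fast):
    """All pairwise Euclidean distances between the four embeddings.

    The same-writer pairs (a_normal, a_fast) and (b_normal, b_fast)
    capture intra-personal variation; the four cross-writer pairs
    capture inter-personal variation. The resulting 6-vector is what
    the feedforward decision network consumes.
    """
    embs = [a_normal, a_fast, b_normal, b_fast]
    return np.array([np.linalg.norm(x - y) for x, y in combinations(embs, 2)])

# Toy 3-D embeddings standing in for ViT texture features.
d = six_distances(np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.0, 0.0]),
                  np.array([1.0, 1.0, 1.0]), np.array([1.1, 1.0, 1.0]))
print(d.shape)  # (6,)
```

With `itertools.combinations`, indices 0 and 5 of the output are the two same-writer distances, and the middle four are the cross-writer distances.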
This approach is fast and accurate, though less interpretable than manual methods. The source code for this module is available on GitHub.
Module 3 - Personalized Handwriting Verification Module
I was responsible for this module, which delivers personalized handwriting verification using handcrafted features. The goal was to verify the handwriting while providing interpretable decisions. In a high-stakes scenario like an exam, it is crucial to understand why a decision was made, especially if it is contested. Moreover, the system breaks the tie in the voting system when the first two modules disagree.
I was able to introduce a few novelties in this module:
- A pipeline that integrates both global and local features of handwriting at the line level and the character level.
- An autoencoder-based, writer-dependent training pipeline that learns the writer's handwriting style and uses it to verify the candidate.
- Integration of SHAP-based, LLM-assisted explanations to provide feature-level interpretability.
The overall architecture of my module is shown below.
Overall architecture of the personalized handwriting verification module
For preprocessing, I first cropped and resized the handwriting samples to a fixed size. The most challenging part was removing the rules from the answer scripts, as they are not part of the handwriting. I used 2 × 50 pixel line kernels to identify the rules in the image, exploiting the fact that the rules are printed lightly and that blue ink was used for the handwriting.

Rule removal process of the cropped handwriting sample
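A simplified numpy sketch of the intensity cue behind rule removal: printed rules are light while the ink is dark, so pixels in a light-gray band can be whitened. The real pipeline additionally applies the 2 × 50 line kernels to confirm that candidate pixels lie on long horizontal lines; the thresholds below are illustrative assumptions.

```python
import numpy as np

def remove_light_rules(gray, rule_lo=170, rule_hi=230):
    """Whiten pixels in the light-gray intensity band occupied by rules.

    Simplified sketch: real rule removal also checks, via horizontal
    line kernels, that a pixel sits on a long horizontal structure.
    The band limits here are illustrative, not the pipeline's values.
    """
    out = gray.copy()
    out[(out >= rule_lo) & (out <= rule_hi)] = 255  # rule pixels -> paper white
    return out

# Synthetic page: white paper, one lightly printed rule, dark ink stroke.
page = np.full((10, 20), 255, dtype=np.uint8)
page[5, :] = 200   # printed rule (light gray)
page[2, 3:8] = 40  # handwriting (dark ink)
clean = remove_light_rules(page)
print(clean[5].max(), clean[2, 3])  # 255 40
```

The dark ink survives untouched while the rule row is erased, which is exactly why the light-print and blue-ink facts matter.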
I treated line-level features as global features, using horizontal projection profiles and a custom line-cleaning algorithm to remove overlapping contours from adjacent lines. For local features, I used the letter 'e', the most frequent letter in English, detected with a fine-tuned YOLOv8 Small model.
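The horizontal projection profile step can be sketched as follows; the custom line-cleaning algorithm for overlapping contours is omitted here.

```python
import numpy as np

def line_bands(binary, min_ink=1):
    """Find text-line row bands from a horizontal projection profile.

    binary: 2-D array with 1 for ink pixels, 0 for background.
    Returns (start, end) row ranges where each text line sits.
    """
    profile = binary.sum(axis=1)  # ink-pixel count per row
    rows = profile >= min_ink     # rows that contain ink
    bands, start = [], None
    for i, has_ink in enumerate(rows):
        if has_ink and start is None:
            start = i                       # a text line begins
        elif not has_ink and start is not None:
            bands.append((start, i))        # the line ends
            start = None
    if start is not None:
        bands.append((start, len(rows)))    # line touching bottom edge
    return bands

# Two 'text lines' in a toy binary image.
img = np.zeros((12, 30), dtype=int)
img[2:4, 5:25] = 1
img[7:10, 3:28] = 1
print(line_bands(img))  # [(2, 4), (7, 10)]
```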
I extracted the following features from the handwriting samples.
Feature extraction pipeline
After fusing the global and local features, I trained a writer-dependent autoencoder to reconstruct the fused feature vectors, and used the reconstruction error to verify the candidate. These reconstruction errors were interpreted using SHAP values to provide feature-level interpretability. However, since SHAP values themselves are not well understood by end users, I used OpenAI's o3 reasoning model to generate explanations that guide the user on what to look for in the handwriting.
Feature fusion, model training, and explanation pipeline
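The verification idea, reconstructing a writer's fused features and scoring by reconstruction error, can be sketched with PCA acting as a deterministic, linear stand-in for the writer-dependent autoencoder. The feature vectors below are toy stand-ins for the real line- and character-level features.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Toy fused feature vectors for one writer: the writer's style occupies
# a low-dimensional manifold (here, only 2 of 4 feature dimensions vary).
mean = np.array([1.0, 2.0, 3.0, 4.0])
coeffs = rng.normal(size=(200, 2))
X_train = mean + np.c_[coeffs, np.zeros((200, 2))] \
    + rng.normal(scale=0.01, size=(200, 4))

# PCA here stands in for the writer-dependent autoencoder: encode to
# 2 dimensions, decode back, and score samples by reconstruction error.
pca = PCA(n_components=2).fit(X_train)

def recon_error(x):
    """Reconstruction MSE; a high value suggests a different writer."""
    x = np.atleast_2d(x)
    return float(np.mean((pca.inverse_transform(pca.transform(x)) - x) ** 2))

same_writer = mean + np.array([0.5, -0.3, 0.0, 0.0])   # on the manifold
other_writer = mean + np.array([0.0, 0.0, 2.0, -1.0])  # off the manifold
print(recon_error(same_writer) < recon_error(other_writer))  # True
```

A nonlinear autoencoder generalizes this to curved style manifolds, but the decision rule is the same: small reconstruction error means the sample fits the writer's learned style.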
Evaluation
Data Collection
While there are well-known benchmark datasets in this field, such as CVL and IAM, we noticed several limitations.
- They are written on plain white, un-ruled paper, which doesn't reflect real-world writing conditions.
- They don't capture variations in writing speed, such as slow or fast handwriting.
- They often use different pens across samples, introducing unwanted variation.
To address these issues, we prepared a private dataset with samples from 100 writers. Each writer contributed three pairs of 50-word English paragraphs — one written at a leisurely pace and the other as fast as possible. We selected a common pen model based on a survey among participants and gave them a short warm-up period before collecting samples. The participants, aged between 16 and 30, were evenly balanced in gender. Writing was done on ruled paper to better mimic real-world conditions, and the samples were scanned using a smartphone for accessibility. This approach was inspired by previous work.
Evaluation Results
In testing the system, we worked with data from 15 writers in our private dataset. In Module 2's standard sample comparison mode, we could make 450 comparisons, while the two-speed sample comparison mode allowed for 364 comparisons — the maximum possible in each case.
For Module 3, we used a leave-one-out approach. Each time, we trained the model on the known sample, leaving out the specific sample from the test pair, and in Standard mode we left out both the fast and normal versions of the same sample. For example, if we were comparing W001_S01_F with W002_S02_N, we trained on everything except W001_S02_N, and also W001_S02_F in Standard Mode. In the two-speed mode, we followed the same method but averaged the reconstruction errors from both test samples.
For every writer, we had 10 forged and 10 genuine samples. When comparing different writers, we took the Cartesian product of the 10 forged samples with the test row, ending up with 10 rows. For same-writer comparisons, we did the same with the 10 genuine samples.
By the end, the standard sample mode was tested on 4,500 comparisons, and the two-speed sample mode on 3,640 comparisons.
The results of the evaluation are shown below.
Result snapshot of the exam candidate verification system
Conclusion
We built a layered framework for verifying exam candidates from handwritten artifacts, combining three modules — signature forgery detection with vision-language embeddings, quick writer verification with automatic features, and personalized writer verification with manual features.
In Module 1, we aimed for reference-free forgery detection using the CLIP model. Early trials with manual graph embeddings showed promise on internal data but failed to generalize, so we switched to CLIP's vision encoder. This achieved 86.38% AUC / 79.33% accuracy on our private dataset and 83.91% AUC / 72.80% accuracy on CEDAR, with future work focusing on better handling of random and unskilled forgeries.
Module 2 used texture-based features in a Siamese framework, with standard and two-speed modes to address intra-writer variability. ViT384 performed best, and switching from texture-wise to sample-wise verification boosted accuracy from 80.21% to 86.89% (standard) and 89.33% to 95.33% (two-speed). Future work targets more efficient training and hybrid feature architectures.
Module 3 brought explainability through handcrafted forensic features and anomaly detection via denoising autoencoders, achieving ~90% accuracy and 98.6% AUC on the private dataset. Interpretability was enhanced using SHAP-based LLM explanations, with plans to explore concept-based methods like TCAV on CNNs.
Integrated via hierarchical voting, the system reached 91.8% accuracy in standard mode and 96.0% in two-speed mode, resolving most inter-module disagreements. Overall, it balances accuracy, robustness, and transparency, making it well-suited for high-variability, high-stakes verification such as exams.
We extend our heartfelt gratitude to our supervisors, Dr. Upeksha Ganegoda and Ms. Shalini Upeksha, for their invaluable guidance, encouragement, and support throughout this project. We are also grateful to the Faculty of Information Technology, University of Moratuwa for providing the resources and facilities that made this research possible.
This work wouldn't have been possible without the generosity of our participants, who took the time to provide their handwriting samples, or without the availability of public handwriting datasets like CEDAR, CVL, and IAM, which helped us broaden our experiments. And of course, a big thanks to the GitHub Student Developer Pack and Digital Ocean for the computing power that kept our models training late into the night.