AI Unleashes
the Power of Data

Unlock insights from your sensor data with AI, generative AI, and machine learning to get accurate, actionable responses to user prompts and make smarter decisions.

MEMS Microphone

Skin Cancer Detection

Computer Vision – Health History

Gen AI Travel

Full Stack & Grok Integration

Time Series ML & Gen AI

Cognitive & Musculoskeletal

MEMS Microphone

Gen AI Biometric Eval

Improved Accuracy Blood Pressure

Grok Generative AI Prompt / Fine Tuning

Travel Recommendations

Graph Neural Network (GNN)

Coming Soon

Transformers

Large Language Model

HuggingFace – Full Model

Note: Only the Skin Cancer AI, Gen AI Travel, Time Series ML & Gen AI, and Large Language Model case studies are documented on the website. The remaining case studies will be posted soon.

Skin Cancer Diagnosis – AI Computer Vision

Goal: Develop a rapid home test to accurately classify benign vs. malignant skin lesions and convince my father to have a suspect skin lesion inspected by a dermatologist.

  1. This method uses machine learning to classify images (benign or malignant) and incorporates medical history to enhance classification. It is hypothesized that history may be more useful than simply optimizing model accuracy and will save development time.
  2. Additional methods, skin masking (image segmentation using DeepLabV3) and an ensemble approach, were used to improve accuracy.
  3. Input requirements: A) a way to capture a picture (e.g., a smartphone) plus internet access, and B) a short survey on the history of the skin lesion.
  4. Output: Malignant vs. Benign

Methods:

  1. Random images from the ISIC 2020 Dataset (The International Skin Imaging Collaboration) with varied participants (sex, age, nationality – white skin tone)
  2. Training set of 457 benign and 457 malignant (total: 914) images and test set of 125 benign and 125 malignant (total: 250) images
  3. Images in JPG format were processed from various landscape widths and heights to square RGB images of 299 x 299 pixels, with one model using 254 x 254 and another 224 x 224.
  4. The most relevant portion of the image was maintained during cropping
  5. Hair was not removed and in some cases scale measurement marks were left
  6. Various supervised, convolutional neural network (CNN) computer vision models were evaluated: Custom Encoder / Decoder Model, Inception V3, Xception, ResNet18, ResNet50, ResNet152, VGG16, VGG19, EfficientNet_B2, and DenseNet121
  7. Hyperparameters were tuned, data normalized, and training data was augmented (randomly horizontally flipped, partially rotated, size / contrast changed, etc.)

A short survey was given to learn the history of the skin lesion (case studies):

  1. How long has the skin lesion been present?
    < 3 months, < 6 months, < 1 year, < 5 years, lifetime
  2. Is the skin lesion larger than a single standard No. 2 pencil eraser (6 mm)?
    Yes or No
  3. Has the skin lesion increased in size over time? Yes or No
  4. How much has the skin lesion increased in size?
    < Doubled in size, Doubled in size, > Doubled in size
  5. How long ago did it start growing? 3 months, 6 months, 1 year, 5 years, slowly over lifetime
  6. Does it bleed on its own (without picking or squeezing)? Yes or No
  7. Has it significantly changed in appearance? Yes or No
  8. Have you had a previous malignant skin lesion (skin cancer) removed? Yes or No
  9. Does the lesion itch, burn, or cause pain? Yes or No
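One way to put the survey to work is to turn each answer into a number that can sit alongside the image model's prediction. A minimal sketch (field names and ordinal encodings are hypothetical, not from the study):

```python
# Hypothetical encodings for the nine survey questions (yes/no -> 1/0,
# ordinal scales for duration and growth).
DURATION = {"<3 months": 0, "<6 months": 1, "<1 year": 2, "<5 years": 3, "lifetime": 4}
GROWTH = {"<doubled": 0, "doubled": 1, ">doubled": 2}

def encode_history(answers):
    """Map the nine survey answers to a numeric feature vector."""
    return [
        DURATION[answers["duration_present"]],
        int(answers["over_6mm"]),
        int(answers["increased_in_size"]),
        GROWTH[answers["growth_amount"]],
        DURATION[answers["growth_onset"]],  # same ordinal scale reused
        int(answers["bleeds_on_own"]),
        int(answers["changed_appearance"]),
        int(answers["prior_malignancy"]),
        int(answers["itch_burn_pain"]),
    ]
```

Such a vector could be appended to the image model's output (or used as a separate rule-based check, as in the case studies below).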

This information is not available from ISIC database and is only available for case studies.

First Step in AI is Understanding the Data

Morphological Characteristics of a Malignant Skin Lesion

Asymmetry

Border Irregularity

Color / Raised

> 6 mm Diameter

Ring or Halo of Redness

Scales

Ulcerations / Crust

Evolution Over Time

Examples of Malignant Skin Lesions

Source: National Cancer Institute

Melanoma

Melanoma Skin Cancer

Melanoma

Asymmetrical Melanoma

Asymmetrical Melanoma

Melanoma

Melanoma Advanced Stage

Melanoma Skin Cancer

Melanoma

Asymmetrical Melanoma

Basal Cell Carcinoma Ulcerated

Skin Cancer, Basal Cell Carcinoma, Superficial

Basal Cell Carcinoma Superficial

Basal Cell Carcinoma

Melanoma Skin Cancer

Basal Cell Carcinoma

Asymmetrical Melanoma

Basal Cell Carcinoma

Skin Cancer, Basal Cell Carcinoma, Superficial

Kaposi’s Sarcoma

Kaposi’s Sarcoma

Melanoma Skin Cancer

Squamous Cell Carcinoma

Asymmetrical Melanoma

Squamous Cell Carcinoma

Skin Cancer, Basal Cell Carcinoma, Superficial

Squamous Cell Carcinoma

Examples of Masked Images (Image Segmentation with DeepLabV3)

Using binary segmentation, the image was divided into two distinct regions: 1) the skin lesion to be classified as malignant or benign, and 2) the surrounding healthy skin. The surrounding healthy skin was then masked so only the skin lesion was visible.

There are a number of methods to accomplish this, including Meta’s Segment Anything Model (DICE of 0.93), nnU-Net (DICE of 0.96), and TransUNet (DICE of 0.95), but each of these would have taken many hours to fine tune and required significant GPU resources. BASNet (Boundary-Aware Segmentation Network) was also considered (2–4 hours of training, DICE of 0.98), but I preferred DeepLabV3, which was readily available in PyTorch and could be trained in minutes without another software download.

DeepLabV3 is an atrous spatial pyramid pooling (ASPP) model designed for semantic segmentation and is widely used in medical imaging. ASPP applies several atrous convolutions in parallel, each with a different dilation rate (e.g., 6, 12, 18), to capture features at multiple scales: 1) low dilation rates (e.g., 6) focus on fine details, like lesion edges or small lesions, and 2) high dilation rates (e.g., 18) capture broader context, like the overall shape or surrounding skin texture. The DeepLabV3 model was trained with ground truth masks (DICE 1.0) from the ISIC 2018 dataset. The Dice coefficient is a statistical metric used to quantify the accuracy of image segmentation.

Benign

Melanoma Skin Cancer

Benign

Asymmetrical Melanoma

Malignant

Skin Cancer, Basal Cell Carcinoma, Superficial

Malignant

Benign Masked

Melanoma Skin Cancer

Benign Masked

Asymmetrical Melanoma

Malignant Masked

Skin Cancer, Basal Cell Carcinoma, Superficial

Malignant Masked

Example Python Code for EfficientNet_B2 in PyTorch

This is example code to train a convolutional neural network (EfficientNet_B2). Several other models were considered, including ensembles. This provides the structure of the code without being verbose with everything that was experimented with or used. The choice of models was limited to lower-complexity models, such as EfficientNet_B2 (9.1 million parameters, image size 260 x 260 pixels) vs. EfficientNet_B6 (43 million parameters, image sizes up to 456 x 456 pixels), for run-time efficiency.

import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import torchvision.transforms as transforms
from torchvision.models import efficientnet_b2
from torchvision.models import densenet121  # retained for the DenseNet121 variant (unused below)
import numpy as np
import random

random.seed(42) 
np.random.seed(42)
torch.manual_seed(42) 
torch.cuda.manual_seed_all(42)

# Focal Loss (experimented with; the final runs use label-smoothed cross-entropy below)
class FocalLoss(nn.Module):
    def __init__(self, gamma=2.0, weight=None):
        super().__init__()
        self.gamma = gamma
        self.weight = weight
    
    def forward(self, inputs, targets):
        ce_loss = nn.functional.cross_entropy(inputs, targets, weight=self.weight, reduction='none')
        pt = torch.exp(-ce_loss)
        focal_loss = ((1 - pt) ** self.gamma * ce_loss).mean()
        return focal_loss

# Custom Dataset
class SkinLesionDataset(Dataset):
    def __init__(self, benign_dir, malignant_dir, transform=None):
        self.benign_dir = benign_dir
        self.malignant_dir = malignant_dir
        self.transform = transform
        self.images = []
        self.labels = []
        
        # Load benign images (label 0)
        for img_name in os.listdir(benign_dir):
            if img_name.endswith(".jpg"):
                self.images.append(os.path.join(benign_dir, img_name))
                self.labels.append(0)
        
        # Load malignant images (label 1)
        for img_name in os.listdir(malignant_dir):
            if img_name.endswith(".jpg"):
                self.images.append(os.path.join(malignant_dir, img_name))
                self.labels.append(1)
    
    def __len__(self):
        return len(self.images)
    
    def __getitem__(self, idx):
        img_path = self.images[idx]
        label = self.labels[idx]
        image = Image.open(img_path).convert("RGB")
        if self.transform:
            image = self.transform(image)
        return image, label

# Transforms (resize to 256, random/center crop to 224)
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomResizedCrop(224, scale=(0.9, 1.0)),
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Datasets

train_dataset = SkinLesionDataset(
    benign_dir="E:/skin/Train_With_Mask/benign",
    malignant_dir="E:/skin/Train_With_Mask/malignant",
    transform=train_transform
)
test_dataset = SkinLesionDataset(
    benign_dir="E:/skin/Test_With_Mask/benign",
    malignant_dir="E:/skin/Test_With_Mask/malignant",
    transform=test_transform
)

# DataLoaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Model: EfficientNet-B2
model = efficientnet_b2(weights='IMAGENET1K_V1') 
model.classifier = nn.Sequential(
    nn.Dropout(p=0.5, inplace=True),  # p=0.3 also tried; p=0.5 gave 85.6%
    nn.Linear(model.classifier[1].in_features, 2)  # Binary classification (benign/malignant)
).cuda()

# Freeze the backbone, then unfreeze the last two MBConv stages (6 and 7)
for param in model.features.parameters():
    param.requires_grad = False
for param in model.features[6:].parameters():
    param.requires_grad = True
model = model.cuda()

# Optimizer: AdamW
optimizer = optim.AdamW(
    model.parameters(),
    lr=2e-4, 
    weight_decay=1e-3, 
    betas=(0.9, 0.999) 
)

scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.7, patience=3)  # 'min' because we step on validation loss

criterion = nn.CrossEntropyLoss(label_smoothing=0.1) 

# Training loop
num_epochs = 6
for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    train_correct = 0
    train_total = 0
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        train_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        train_total += labels.size(0)
        train_correct += (predicted == labels).sum().item()
    
    train_loss /= len(train_loader)
    train_acc = train_correct / train_total
    
    # Validation
    model.eval()
    my_list = []  # reset each epoch; the final printout reflects the last epoch
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        #for images, labels in test_loader:
        for batch_idx, (images, labels) in enumerate(test_loader):
            images, labels = images.cuda(), labels.cuda()
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            val_total += labels.size(0)
            val_correct += (predicted == labels).sum().item()
            # Identify misclassified images
            mask = predicted != labels  # Creates a boolean tensor where True indicates a misclassification
            for idx in range(len(mask)):
                if mask[idx]:
                    # Calculate the global index in the dataset
                    global_idx = batch_idx * test_loader.batch_size + idx
                    # Get the filename from the dataset's images attribute
                    filename = test_loader.dataset.images[global_idx]  # Use images attribute
                    my_list.append(filename)
    
    val_loss /= len(test_loader)
    val_acc = val_correct / val_total
    
    print(f"Epoch {epoch+1}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, "
          f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
    
    scheduler.step(val_loss)
    
# Print the filenames of misclassified images
print("Misclassified image filenames:")
for filename in my_list:
    print(os.path.basename(filename))
print(f"Number of misclassified images: {len(my_list)}")

# Save model
torch.save(model.state_dict(), "E:/skin/models/efficientnet_b2_finetuned.pth")
print("Model saved to E:/skin/models/efficientnet_b2_finetuned.pth")

EfficientNet_B2: Loss and Accuracy Graphs

Computer Vision Results

  • It is noted that the vast majority of malignant skin lesion images were melanoma.
  • Three individual models were chosen for their training and evaluation speed and accuracy: 1) ResNet18 (86.8% accuracy), 2) EfficientNet_B2 (88.0% accuracy) and 3) DenseNet121 (87.6% accuracy). EfficientNet_B2 and DenseNet121 models used masked images and ResNet18 used the full image with no masking.
  • An ensemble of DenseNet121, ResNet18, and EfficientNet_B2 (90.0% accuracy) was also evaluated. The ensemble used the prediction of the highest-accuracy EfficientNet_B2 model and only changed its result if ResNet18 and DenseNet121 agreed on an alternative prediction.
  • Sample size was limited due to computing resources and hence accuracy was not optimized beyond this point. Various other ensemble approaches (softmax averaging, majority voting, weighted voting, etc.) were considered but the one highlighted above was the most accurate.
  • The malignant accuracy was better than the benign accuracy for each of the models. For example, with EfficientNet_B2 benign accuracy was 83.2%
    and malignant accuracy was 92.8%.
  • False positives from analysis of images alone: 7.2% (cautious approach, high likelihood of biopsy). See sampling (not all) below.
  • False negatives from analysis of images alone: 3.2% (risk of not being treated resulting in high potential for adverse outcome). See sampling (not all) below.
  • Most false positives were very difficult to judge from the image alone; hence, examining and including health history has merit.
  • 87.5% of the false negatives I would have had biopsied from the image alone; hence, improving model accuracy with more training images is warranted.
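The ensemble's override rule described above is simple enough to state directly in code. A sketch (0 = benign, 1 = malignant):

```python
def ensemble_predict(efficientnet_pred, resnet_pred, densenet_pred):
    """Trust EfficientNet_B2 unless ResNet18 and DenseNet121 both
    agree on the alternative class."""
    if resnet_pred == densenet_pred and resnet_pred != efficientnet_pred:
        return resnet_pred
    return efficientnet_pred
```

For binary labels this reduces to majority voting, but stating it as an override rule makes the role of the highest-accuracy model explicit.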

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Positive

Benign

False Negative

Malignant

False Negative

Malignant

False Negative

Malignant

False Negative

Malignant

False Negative

Malignant

False Negative

Malignant

Case Studies

Mole from Birth

Model: Benign

Health History: Benign

Birthmark from Birth

Model: Malignant

Health History: Benign

New Growth Face

Model: Malignant

H. History: Malignant

Large Mole Many Years

Model: Benign

Health History: Benign

New Large Lesion

Model: Malignant

H. History: Malignant

New Large Lesion

Model: Malignant

H. History: Malignant

The AI model was correct for 3 of 6 case studies and possibly correct but unconfirmed for 1 of 6. It made two mistakes: an extremely challenging birthmark and a new growth on my face. The new growth on my face was sebaceous hyperplasia, an oil-producing sebaceous gland that became enlarged or clogged (in layman’s terms), and it resembled basal cell carcinoma, which I had previously at a younger age. The models had a higher tendency for false positives than false negatives, which was good. EfficientNet_B2 accurately diagnosed this lesion as benign, but since DenseNet121 and ResNet18 both diagnosed it as malignant, the EfficientNet result was overturned. Hence, it was prudent that I got it checked. The birthmark resembled melanoma, but health history proved to be very important in this case. Given the subject’s age, no history of skin cancer, and the lesion being present since birth, it was not suspect, and this was confirmed by a dermatologist. The first new large lesion was a confirmed (biopsy) cancerous lesion on the arm. The last case study is a new enlarged lesion on my dad’s chest. Despite my efforts with this analysis, the malignant diagnosis of the model, and his history of skin cancer, he will not get it checked. He is at end of life and doesn’t have an interest, so the case is not confirmed by biopsy.

Overall, health history definitely helped in diagnosis but also contributed to a false positive. The case studies also confirmed what was already known: a much larger sample size is needed to improve the models from 90% to the 95–96% accuracy that is best in the literature. It would be very interesting to test health history in conjunction with model performance on 5,000 samples. I would also like to test more complex diagnosis and masking models with more computing resources.

Generative AI Travel

This generative AI case study integrated Grok (xAI) into an AWS environment and was deployed as an Amplify mobile app built with ReactJS, so it works on any platform. The API for the Grok integration was built in the backend with Lambda, API Gateway, and Identity and Access Management (IAM). The data was stored in an S3 data lake (images) and a DynamoDB database. The prototype was built for demonstration to Spirit Airlines prior to their bankruptcy. Their intention was to revolutionize their frequent flyer program, and AI was a piece they were exploring.

Grok integration provided access to a capable large language model for user prompt engagement and provided enrichment with images and travel features to enable quick decision making.

The video shows a brief demonstration of user login, the mobile app (first showing some of the health app I created), and how the travel planning process could be enhanced with generative AI. Ultimately this can be improved by aiding the user in developing a travel itinerary and expanding into many elements of the travel planning process. Spirit specifically requested a demonstration for hotel planning.

Current travel sites leave much to be desired, with planning taking hours or days, and this process can be significantly enhanced with AI as a helper in the background without dominating the experience. Too often, AI completely takes over, leaving the user frustrated. Allowing the user to maintain control is extremely important.

Time Series Machine Learning and Generative AI

Problem Statement: In medicine, there is no clinical means to accurately diagnose musculoskeletal conditions such as a rotator cuff tear without advanced radiology even by a highly trained orthopedic surgeon. Furthermore, there is no uniform testing program that exists for traumatic brain injury (TBI) / cognitive conditions that includes both a means of immediate assessment on the field and comprehensive clinical management using a validated instrument.

Solution: To address this, machine learning was applied to motion signatures for the purpose of medical triage, diagnosis, and monitoring to facilitate expedient, cost-efficient, and effective patient care and health management for musculoskeletal and cognitive conditions. The motion signatures are generated from data collected by inertial sensors monitoring dynamic motion of the human body. Furthermore, generative AI was introduced in combination with machine learning classification and regression of motion signatures for treatment planning and injury prediction. The generative AI is a fine-tuned LLM, pretrained on a broad corpus of medical knowledge and further refined using domain-specific datasets including AI-driven motion signature profiling, biometric profiles, and health records. This multimodal approach provides a novel data-driven framework for comprehensive patient assessment and treatment planning, using diverse inputs through its generative architecture to produce actionable outputs. For example, during triage, the LLM evaluates the combined data (a high anomaly score, elevated pain, and a history of prior injury) to assign a priority level (e.g., urgent or routine), guiding clinical resource allocation.
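The triage example above involves assembling the multimodal inputs into a structured request for the fine-tuned LLM. A hedged sketch (function and field names are hypothetical; the actual model call is out of scope here):

```python
def build_triage_prompt(anomaly_score, pain_level, prior_injury, biometrics):
    """Package motion-signature, pain, history, and biometric inputs
    into a triage prompt for the LLM (illustrative format only)."""
    history = "prior injury reported" if prior_injury else "no prior injury"
    return (
        "Assign a triage priority (urgent or routine) given:\n"
        f"- Motion-signature anomaly score: {anomaly_score:.2f} (0 to 1)\n"
        f"- Patient-reported pain: {pain_level}/10\n"
        f"- Injury history: {history}\n"
        f"- Biometrics: {biometrics}\n"
    )
```

The LLM's response would then carry the priority level back into the clinical workflow, with the motion-signature classifier's output feeding the anomaly score.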

To validate this AI-based system, a 35-person clinical study was conducted for rotator cuff injury, achieving 96% accuracy in classifying rotator cuff tear vs. inflammation without advanced radiology, although MRIs were used to establish proper labeling of the training and test data. In addition, another groundbreaking study for diagnosing TBI and degenerative cognitive conditions was conducted leveraging cognitive impairment data (sober vs. intoxicated), with an initial model accuracy of 80%, limited by sample size and an interesting finding: careful review of the data showed that men lost more motor control than women when intoxicated. Hence, the machine learning correctly classified all sober subjects and intoxicated men, but not intoxicated women. The specifics of the algorithms for this patent-pending AI invention, owned by David, cannot be discussed in detail, but the data and general approach are shared as presented at national conferences.

Test Methods Musculoskeletal Injury

  1. Random subjects (men and women) with varied ages participated. This resulted in a sample size of 35 divided between train and test groups for the machine learning algorithm. There were 18 normal (inflammation only) and 17 abnormal conditions (rotator cuff tear).
  2. Two tests were conducted: Forward extension and external rotation.
  3. Forward extension: Arm at side and elevated in the Scapular plane (45 degrees between front and side position, fully extended) 180 degrees overhead and back down at side.
  4. External rotation: Arm positioned in an “L” orientation with the elbow touching the hip and then rotated front to back and back to front.
  5. The tests were conducted with no weight and weight in the hand.
  6. Sensor was located in the sulcus of the inner biceps and secured with a strap. 

Test Methods Cognitive Impairment

  1. Random subjects (60% men; 40% women) with varied ages participated. This resulted in a sample size of 20 divided between train and test groups for the machine learning algorithm.
  2. Participants were asked to perform two tactile edge orientation processing movements, sober and under the influence of alcohol. This included a lower extremity leg movement (LELM) test of the heel sliding along the tibia of the opposite leg from bottom of shin to knee and back (in seated position) and the “what time is it?” test with movement of arm from side, reading time of wristwatch, and returning to the side.
  3. Cognitive conditions were simulated with BAC (blood alcohol content) levels of 0.00 (sober), 0.02, 0.04, 0.06, 0.08, and 0.09%. The BAC was measured using a breathalyzer. Each motion test was repeated three times.
  4. Sensors were located either on the wrist (next to watch) for the “what time is it?” test or on the inside ankle for the LELM test (strap made transparent so you can see sensor). 

Sensor Locations

External Rotation / Forward Extension

What Time is It?

Lower Extremity Leg Movement

Data from Musculoskeletal Evaluations

Data was collected in the x, y, and z axes for acceleration and rotation of the arm in forward extension. The data below shows 18 normals tightly banded around the average and +/- 3 sigma limits (dark points). Only the acceleration data in the x and y axes is shown for brevity.

This is an example of abnormal acceleration data for the x and y axes. Clearly the data falls outside of the +/- 3 sigma limits.

Data was collected in the x, y, and z axes for acceleration and rotation of the arm in external rotation. The data below shows 18 normals tightly banded within the dark lines (not +/- 3 sigma limits but critical thresholds). Only the acceleration data in the x and y axes is shown for brevity.

This is an example of abnormal acceleration data for the x and y axes. Clearly the data falls outside of the threshold limits.
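The ±3 sigma screen shown in these plots can be sketched in a few lines. This assumes the normal signatures have been resampled to a common length; a signature is flagged if it leaves the band at any sample:

```python
import numpy as np

def sigma_band(normals):
    """normals: (n_subjects, n_samples) array of normal signatures.
    Returns the lower and upper +/- 3 sigma limits per sample."""
    mu = normals.mean(axis=0)
    sd = normals.std(axis=0, ddof=1)
    return mu - 3 * sd, mu + 3 * sd

def is_abnormal(signal, lower, upper):
    """Flag a test signature that falls outside the band anywhere."""
    return bool(np.any((signal < lower) | (signal > upper)))
```

As the later section shows, this basic screen is not sufficient on its own: some abnormal signatures stay inside the band, which is where the shapelet-based classification comes in.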

Data from Cognitive Evaluations

Data was collected in the x, y, and z axes for acceleration and rotation of the arm during the “What time is it?” test. The data below shows a clear difference between sober and intoxicated. Only a portion of the acceleration data shown.

Sober BAC 0.00%

Intoxicated BAC 0.09%

Data was collected in the x, y, and z axes for acceleration and rotation of the leg during the lower extremity leg movement test. The data below shows a clear difference between sober and intoxicated. Only a portion of the acceleration data shown.

Sober BAC 0.00%

Intoxicated BAC 0.09%

Further Exploration of the Data

It may appear that normal (inflammation only) and abnormal (torn) rotator cuffs, or sober and intoxicated subjects, are easy to distinguish with even basic statistics. However, the following data shows that a robust solution is far from trivial. The ultimate goal is to distinguish between minor partial tears (25% or less) that may not need surgery and partial tears greater than 50% or full tears that may need surgery. The first two graphs show clear delineation between a partial and a full tear, as expected, but in the third graph the full tear appears normal. However, in external rotation there is a distinguishable difference between partial and full tears that traditional statistics cannot properly classify but where AI excels, with methods such as random shapelet transform with random forest classifiers and convolutional neural networks. If the pending patent is accepted, this subtle difference can be shared.

Partial Tear

Full Tear

Full Tear Indistinguishable from Normal

Cognitive data for sober and intoxicated subjects showed a similar trend. In some cases, sober versus intoxicated was extremely obvious, as shown above. In other cases, only subtle features were distinguishable via time series classification using random shapelet transform with random forest.

Sober BAC 0.00%

Intoxicated BAC 0.08%

Sober BAC 0.00%

Intoxicated BAC 0.08%

Sober BAC 0.00%

Intoxicated BAC 0.08%

Random Shapelet Transform (RST) is a technique used in time series classification that leverages subsequences called shapelets to extract meaningful, discriminative features from time series data. A shapelet is a short subsequence (a contiguous segment) of a time series that represents a distinctive pattern or feature relevant to the classification task. RST is particularly effective at identifying local patterns or shapes within a time series that are discriminative for classification, such as distinguishing between normal and abnormal motion signatures. The motion signature is randomly sampled, instead of exhaustively searching all possible subsequences (which is computationally expensive), to make the method faster and more scalable. RST then picks the best shapelets based on their ability to separate the classes (normal vs. abnormal).

For example, with forward extension for rotator cuff evaluation in the y-axis, a shapelet capturing a cosine-like dip may be prevalent in normal signatures, while a shapelet capturing a central leveling may be indicative of abnormal signatures. These selected shapelets are not averages of the respective classes but individual subsequences chosen for their discriminative power.

For each selected shapelet, the method computes a distance metric, such as the Euclidean distance, between the shapelet and every subsequence of the same length within Tᵢ (representation of time series). This is performed using a sliding window approach, where the shapelet is compared to all possible contiguous segments of Tᵢ, regardless of position. The minimum distance across all subsequences is retained for each shapelet, representing the closest match between the shapelet’s pattern and any portion of Tᵢ. This process generates a feature vector (transform) for Tᵢ, where each element corresponds to the minimum distance to one of the selected shapelets.

The feature vectors from the RST are classified using a random forest, an ensemble of decision trees predicting whether Tᵢ is normal or abnormal. The forest aggregates these predictions across all trees via majority voting. This ensemble approach leverages the collective insight of diverse trees built around adapting decision boundaries during training, offering a novel, data-driven alternative for detecting physiological deviations in motion signatures. Normal and abnormal classification are used for simplicity but abnormal may span multiple classes representing different levels of injury or disease progression. Other approaches such as convolutional neural networks (CNN) can also be used. 
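The sampling and min-distance steps described above can be sketched in numpy. This is a minimal illustration of the transform only; in practice the resulting feature vectors would feed a random forest (e.g., sklearn's RandomForestClassifier), and shapelet lengths, counts, and selection criteria are assumptions here:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_shapelets(series_list, n_shapelets=10, length=20):
    """Randomly sample candidate shapelets (contiguous subsequences)
    from the training series."""
    shapelets = []
    for _ in range(n_shapelets):
        series = series_list[rng.integers(len(series_list))]
        start = rng.integers(len(series) - length + 1)
        shapelets.append(series[start:start + length])
    return shapelets

def min_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    sliding window of the series (position-independent match)."""
    L = len(shapelet)
    return min(np.linalg.norm(series[i:i + L] - shapelet)
               for i in range(len(series) - L + 1))

def shapelet_transform(series_list, shapelets):
    """One row per series; one min-distance feature per shapelet."""
    return np.array([[min_distance(s, sh) for sh in shapelets]
                     for s in series_list])
```

A full RST would additionally score candidate shapelets (e.g., by information gain) and keep only the most discriminative ones before training the forest.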

Conclusions

These results are promising and warrant larger studies to confirm and refine the models and expand their scope. For example, this method could be applied to concussions for triage, diagnosis, monitoring, treatment planning, and injury prediction. However, there may be differences in motor control and resulting movements for women and men as seen in the intoxication study that may require different test methods. Furthermore, learned tolerances occurring when impaired need to be accounted for in the models. 

Let’s examine another example to pull this all together. Consider a young female athlete who wants to play competitive soccer, with ACL tears a common risk for this athlete population. Before the season, she is evaluated for injury risk using a repeated jump test with sensors placed on the lower, inner thigh of each leg. The sensors capture the motion signature data during the prescribed repetitive jump test. The AI-driven motion signature processing, trained on normal and abnormal data, initially classifies the injury risk using one or a combination of classification, regression, and unsupervised learning. This injury risk is then refined by the fine-tuned AI LLM considering the motion signature processing results, a broad range of health knowledge, health history, and biometrics. It then provides a comprehensive and quantifiable evaluation of the injury risk and a treatment plan to minimize this risk. The treatment plan’s effectiveness can then be monitored and adjusted as needed based on a feedback loop of follow-up evaluations. The fine-tuned AI LLM can also predict the athlete’s performance in related jumping or sprinting activities depending on whether the treatment protocol is followed. This has the potential to revolutionize youth sports as we know it, as the rate of injury could drop significantly while simultaneously improving performance and enjoyment.

HuggingFace LLM – Text Classification

This text-classification solution is relatively straightforward but concisely illustrates the process of solving a large language model problem. The pretrained full model is fine-tuned by adjusting hyperparameters, elements of transformers such as tokenization are used, and the model is evaluated on a separate group of test data after training and validation. Evaluation of the model concludes with real examples created by Grok and sentence pairs the model got correct and incorrect from the test data (three each).

Problem Statement: Do two sentences share the same meaning (is one a paraphrase of the other)? This is a classical problem that demonstrates how text can be classified and compared.

For example: Do these two sentences share the same meaning (is one a paraphrase of the other)?

Sentence 1: Amrozi accused his brother , whom he called " the witness ", of deliberately distorting his evidence.

Sentence 2: Referring to him as only " the witness ", Amrozi accused his brother of deliberately distorting his evidence.

Solution: This is a fine-tuned version of google-bert/bert-base-uncased optimized for text classification (do two sentences share the same meaning?) using the GLUE MRPC dataset. The model was saved after epoch 3 of 4 to capture peak performance, balancing efficiency and accuracy for real-world text analysis. See the full code on HuggingFace.

Key Features:

  1. Efficient Fine-Tuning: Trained with a low-resource setup (single GPU: NVIDIA GeForce RTX 3070 with 8 GB VRAM; 4 epochs).
  2. Strong Performance: Achieves 88.0% accuracy and an F1 of 91.7% on validation, rivaling the best models for this task on this small dataset.

Training Details:

Even though the BERT model already incorporates dropout, adding additional regularization improved performance. In addition, adding a 10% warm-up to the learning rate schedule also helped. I found that training for 4 epochs but using the weights from epoch 3 improved performance.

Hyperparameters:

  1. Batch Size: 8
  2. Learning Rate: 3e-5
  3. Learning Rate Schedule: lr_scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=int(0.1 * num_training_steps), num_training_steps=num_training_steps)
  4. Weight Decay: 0.03
  5. Dropout: hidden_dropout_prob=0.3, attention_probs_dropout_prob=0.2, classifier_dropout=0.2
  6. Epochs: 4 (saved after 3)
  7. Optimizer: AdamW
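
As a sketch, the hyperparameters above map onto the Hugging Face Transformers API roughly as follows. The model here is built from a randomly initialized BertConfig so the snippet runs offline; real fine-tuning would load the pretrained weights with BertForSequenceClassification.from_pretrained("bert-base-uncased", ...). The 459 steps per epoch is my assumption based on MRPC's 3,668 training pairs at batch size 8.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification, get_scheduler

# Randomly initialized stand-in for offline illustration; swap in
# from_pretrained("bert-base-uncased", ...) for actual fine-tuning.
config = BertConfig(
    num_labels=2,
    hidden_dropout_prob=0.3,           # extra dropout beyond BERT's 0.1 default
    attention_probs_dropout_prob=0.2,
    classifier_dropout=0.2,
)
model = BertForSequenceClassification(config)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.03)

num_epochs = 4
steps_per_epoch = 459                  # assumes MRPC: 3,668 pairs / batch size 8
num_training_steps = num_epochs * steps_per_epoch
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # 10% warm-up
    num_training_steps=num_training_steps,
)
```

The training loop itself would step the optimizer and then the scheduler after each batch, saving a checkpoint at the end of epoch 3.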

Results Illustrated

Test Metrics Summary:

    • Test Accuracy: 0.8296
    • Test F1 Score: 0.8793

Results Highlighted Through Examples

Correct and Incorrect Sentence Pairs from the Test Data:

Correctly Classified Test Sentence Pairs:

Example 1:

    • Sentence 1: PCCW 's chief operating officer , Mike Butcher , and Alex Arena , the chief financial officer , will report directly to Mr So .
    • Sentence 2: Current Chief Operating Officer Mike Butcher and Group Chief Financial Officer Alex Arena will report to So .
    • Prediction: Paraphrase
    • True Label: Paraphrase
    • Logits: [-2.2242250442504883, 2.586000680923462]
    • Idx: 1

Example 2:

    • Sentence 1: The world 's two largest automakers said their U.S. sales declined more than predicted last month as a late summer sales frenzy caused more of an industry backlash than expected .
    • Sentence 2: Domestic sales at both GM and No. 2 Ford Motor Co. declined more than predicted as a late summer sales frenzy prompted a larger-than-expected industry backlash .
    • Prediction: Paraphrase
    • True Label: Paraphrase
    • Logits: [-0.5495507717132568, 1.2153574228286743]
    • Idx: 2

Example 3:

    • Sentence 1: According to the federal Centers for Disease Control and Prevention ( news – web sites ) , there were 19 reported cases of measles in the United States in 2002 .
    • Sentence 2: The Centers for Disease Control and Prevention said there were 19 reported cases of measles in the United States in 2002 .
    • Prediction: Paraphrase
    • True Label: Paraphrase
    • Logits: [-2.498159170150757, 2.9100100994110107]

Incorrectly Classified Test Sentence Pairs:

Example 1:

    • Sentence 1: A tropical storm rapidly developed in the Gulf of Mexico Sunday and was expected to hit somewhere along the Texas or Louisiana coasts by Monday night .
    • Sentence 2: A tropical storm rapidly developed in the Gulf of Mexico on Sunday and could have hurricane-force winds when it hits land somewhere along the Louisiana coast Monday night .
    • Prediction: Paraphrase
    • True Label: Not a paraphrase
    • Logits: [-1.446120023727417, 2.0181071758270264]
    • Idx: 13

Example 2:

    • Sentence 1: Hong Kong was flat , Australia , Singapore and South Korea lost 0.2-0.4 percent .
    • Sentence 2: Australia was flat , Singapore was down 0.3 percent by midday and South Korea added 0.2 percent .
    • Prediction: Paraphrase
    • True Label: Not a paraphrase
    • Logits: [-0.11220673471689224, 0.7146934866905212]
    • Idx: 15

Example 3:

    • Sentence 1: Ballmer has been vocal in the past warning that Linux is a threat to Microsoft .
    • Sentence 2: In the memo , Ballmer reiterated the open-source threat to Microsoft .
    • Prediction: Paraphrase
    • True Label: Not a paraphrase
    • Correct: False
    • Logits: [-1.2966821193695068, 1.964686393737793]
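
Each prediction above is simply the larger of the two logits (ordered [not-paraphrase, paraphrase]), and a softmax turns the gap between them into a confidence. A small sketch in plain Python, using the logits reported for correct Example 1 and incorrect Example 2 above, shows how confident each call was:

```python
import math

def classify(logits, labels=("Not a paraphrase", "Paraphrase")):
    """Return the predicted label and its softmax confidence from raw logits."""
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = probs.index(max(probs))
    return labels[best], probs[best]

# Correct Example 1: wide logit gap -> a confident (and right) call.
print(classify([-2.2242250442504883, 2.586000680923462]))
# Incorrect Example 2: narrow logit gap -> a borderline (and wrong) call.
print(classify([-0.11220673471689224, 0.7146934866905212]))
```

The contrast is telling: the model's errors above tend to sit near the decision boundary, where the two logits are close together.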

This highlights some limitations:

    • Example 3 (Ballmer) – “Been vocal” could include a memo, but it could also mean the threat was only discussed in meetings and never put in writing. Likewise, Linux is open-source, but open-source is not necessarily Linux.
    • Idx 15 – Singapore was down 0.3 percent, which falls within the 0.2–0.4 range, but Singapore could also have fluctuated between 0.2 and 0.4 percent rather than holding steady; the second sentence also does not account for the 0.4 percent end of the range.
    • Idx 13 – The second sentence omits Texas and hence could not be considered a paraphrase.

Four Sentence Pairs Created by Grok to Test the Model:

Sentence Pairs Generated by Grok:

Example 1:

    • {'sentence1': 'The company announced a major merger with its biggest competitor last week.',
    • 'sentence2': 'Last week, the firm revealed plans to merge with its primary rival.',
    • 'label': 1,
    • 'idx': 102500}

Example 2:

    • {'sentence1': 'The new policy aims to reduce emissions by 20% over the next five years through incentives for green technology.',
    • 'sentence2': 'Over the coming five years, this initiative seeks to cut pollution levels by a fifth by promoting eco-friendly innovations',
    • 'label': 1,
    • 'idx': 102501}

Example 3:

    • {'sentence1': 'The team won the championship after a dramatic comeback in the final quarter.',
    • 'sentence2': 'The players celebrated their victory following an intense practice session before the big game.',
    • 'label': 0,
    • 'idx': 102502}

Example 4:

    • {'sentence1': 'The legislation seeks to curb carbon emissions by 25% within a decade through tax incentives for renewable energy.',
    • 'sentence2': 'This bill aims to reduce CO2 output by a quarter over ten years by offering tax breaks for green energy solutions.',
    • 'label': 1,
    • 'idx': 102503}

Here are the results (all correct):

    • Predicted class for pair 1: 1, true value is 1, and idx is 102500
    • Predicted class for pair 2: 1, true value is 1, and idx is 102501
    • Predicted class for pair 3: 0, true value is 0, and idx is 102502
    • Predicted class for pair 4: 1, true value is 1, and idx is 102503

Gen AI Biometric Eval – Improved Accuracy Blood Pressure

The capability of LLMs, as of October 2025, has increased dramatically over the last 18 months. I wanted to really challenge this capability with the analysis of multiple graphs to find the systolic and diastolic blood pressures. This was an extremely challenging task because the only fine-tuning was a PDF provided to the LLM and, by its own admission, the model had not encountered these types of graphs in medical analysis. The model also confessed it had difficulty following axes and grid lines with precision. I chose Grok expert mode because Google did not offer the capability in its free tier and Grok failed at attempts in fast mode. With prompt iteration, Grok got extremely close to the actual values.

What is the basis for this analysis? Automated blood pressure machines are grossly inaccurate; readings are often off by +/- 15 mmHg. This is improved by taking 3 readings and averaging them for systolic and diastolic, and by using higher-accuracy arm-cuff devices. Auscultatory blood pressure measurement with a stethoscope and a manual arm cuff is the closest in accuracy to invasive intra-arterial blood pressure measurement.

Automated blood pressure machines have seen little improvement over the last 15 years. Today, a miniature stethoscope can be incorporated into the device to provide additional information. However, stray sounds often make it challenging to rely on this as the only data source. Multiple data sources can bridge the gap and improve the accuracy of this proposed device. Furthermore, AI has not yet been incorporated into this analysis. With the cost of edge computing becoming affordable, this is now a possibility, though we are still likely 1–2 years away.

Here are the graphs I asked Grok to interpret:

  1. Chart 1: Auscultatory blood pressure measurement – Korotkoff sounds (Korotkoff sound amplitude versus time (seconds))
  2. Chart 2: Cuff Pressure (mmHg) Vs. Time (seconds) – With Oscillatory Peaks
  3. Chart 3: Oscillatory Wave Envelope (OWE) – Lower and Upper Envelope (Pressure (mmHg) versus Time (seconds))
  4. Chart 4: Derivative of Oscillatory Wave Envelope with Respect to Time Vs Time
  5. Chart 5: Cuff Pressure Vs. Time – Without Oscillatory Peaks
    (Pressure (mmHg) versus Time (seconds))
  6. Chart 6: Derivative of Oscillatory Wave Envelope with Respect to Cuff Pressure Vs Cuff Pressure
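
Charts 2, 3, and 6 tie together through the standard oscillometric analysis: the envelope of the cuff-pressure oscillations peaks near mean arterial pressure (MAP), and systolic/diastolic pressures are commonly estimated where the envelope crosses fixed fractions of that peak. The sketch below uses synthetic data, and the fixed-ratio thresholds (0.55 systolic, 0.85 diastolic) are illustrative textbook values of my choosing, not values taken from the guide referenced here:

```python
import numpy as np

# Illustrative oscillometric fixed-ratio analysis on synthetic data.
# The 0.55/0.85 ratios are common textbook assumptions, not from the guide.
cuff = np.linspace(180, 40, 500)     # deflating cuff pressure, mmHg
# Synthetic oscillatory wave envelope: Gaussian peaking at 95 mmHg.
envelope = np.exp(-((cuff - 95.0) ** 2) / (2 * 20.0 ** 2))

i_max = int(np.argmax(envelope))     # peak amplitude ~ mean arterial pressure
map_mmhg = cuff[i_max]

# Systolic: highest pressure (above MAP) where envelope reaches 55% of peak.
sys_idx = int(np.argmax(envelope[:i_max] >= 0.55 * envelope[i_max]))
# Diastolic: first pressure below MAP where envelope falls to 85% of peak.
dia_idx = i_max + int(np.argmax(envelope[i_max:] <= 0.85 * envelope[i_max]))

print(f"MAP ~ {map_mmhg:.0f} mmHg, SBP ~ {cuff[sys_idx]:.0f} mmHg, "
      f"DBP ~ {cuff[dia_idx]:.0f} mmHg")
```

A real implementation would first extract the envelope from chart 2's raw oscillations (charts 3 and 6 in this project) before applying the ratio step.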

The PDF guide used to fine-tune the model is provided here. I encourage you to review this guide, as it explains the methodology for interpreting the graphs.

Graphs of the Data

This data was collected with a MEMS microphone attached to a stethoscope and a MEMS pressure sensor attached to a manual arm blood pressure cuff. This homemade prototype can be refined.

Notice the number of stray sounds making the actual Korotkoff sounds more challenging to decipher. This is why correlation with charts 2 and 3 is so important. The model could be fine-tuned by calculating the pulse spacing from chart 3 and feeding that to the model; however, this was not done. I wanted to see if the model would do this on its own.

Prompt and Grok’s Response

My analysis (human brain) of the graphs resulted in a diastolic blood pressure of 73 mmHg and a systolic blood pressure of 130 mmHg. Grok came extremely close. Imagine if we provided 5,000 data points of fine-tuning and optimized the data collection process instead of using a homemade prototype.

Note: Grok was more accurate when I called out its previous mistakes in fast mode. It worked harder to find the best solution.

The wording of the fine-tuning document was an iterative process. I had to learn how Grok interpreted what was written and modify the wording to emphasize specific criteria. I also had to stress the definition of slope change to Grok in the prompt; it took slope change to mean first going negative and then positive (like an inflection point). With more refinement over days, I believe even this can be improved.


All photos and videos except the following are copyrighted and owned by David DiPaola:

NYC Photo by Sai De Silva on Unsplash.