Over the course of Project Converse, we needed powerful and project-specific tools to solve our problems. Some of those we were able to find, like Pupil Labs eye-tracking glasses or ELAN, while others we had to invent. Here is a partial list of the tools we employed. The tools are not yet publicly available, but we are working to provide links as soon as possible. Please contact Jeff Higginbotham at cdsjeff@buffalo.edu for additional information.
Transcription and Analysis Tools
Data Capture and Session Tools
Tasks and Training
Prometheus and Grafana: System Monitoring
As DEAN became more distributed across OS-DPI, ASR, the LLM, speech synthesis, and server-side routing, we added Prometheus and Grafana to monitor system behavior during live testing. Prometheus collected metrics at the system and service level: API status, Raspberry Pi connection state, server activity, request timing, response-generation latency, connection health, and processing behavior across sessions. Grafana rendered these metrics in a dashboard we monitored in real time during testing.
This infrastructure was necessary because most of DEAN’s early failures did not involve a single component breaking outright. Instead, they arose from small delays, brief interruptions, and misattributed errors distributed across components. Prometheus captured these signals continuously. Grafana let us identify where delays accumulated, whether the server and web interface were communicating correctly, whether the Raspberry Pi remained connected, and whether the system was stable enough for participant testing. The result was a shift from post-session guesswork to real-time diagnosis, which became especially important as DEAN transitioned from a Raspberry Pi-based prototype to a web-based system accessed by participants on their own devices.
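As an illustration of the kind of signal described above, here is a minimal sketch of a gauge rendered in the Prometheus text exposition format, which is the plain-text format a Prometheus server scrapes. It uses only the Python standard library. The metric names, label names, and the `Gauge` helper are hypothetical and are not taken from DEAN's actual instrumentation; a real deployment would more likely use an official Prometheus client library.

```python
from dataclasses import dataclass, field


@dataclass
class Gauge:
    """A tiny stand-in for a Prometheus gauge (hypothetical helper)."""
    name: str
    help_text: str
    samples: dict = field(default_factory=dict)  # sorted label tuple -> value

    def set(self, value: float, **labels: str) -> None:
        # Labels distinguish components, e.g. stage="asr" vs stage="llm".
        self.samples[tuple(sorted(labels.items()))] = float(value)

    def render(self) -> str:
        # Text exposition format: HELP line, TYPE line, then one line per sample.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} gauge",
        ]
        for labels, value in self.samples.items():
            label_str = ",".join(f'{key}="{val}"' for key, val in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


# Assumed example metrics: connection state and per-stage latency.
pi_connected = Gauge("dean_pi_connected", "1 if the Raspberry Pi is connected")
pi_connected.set(1)

stage_latency = Gauge("dean_stage_latency_seconds", "Latest latency per stage")
stage_latency.set(0.42, stage="asr")
stage_latency.set(1.85, stage="llm")

print(pi_connected.render() + stage_latency.render())
```

Labeling latency by stage is what makes it possible for a dashboard to show where delay accumulates rather than only that the overall response was slow.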
Test Interfaces
We developed a variety of test interfaces to evaluate how different LLMs perform in controlled AAC-style communication tasks. These interfaces let us test whether a model maintains a target persona, follows prompt constraints, and generates responses that fit specific conversational action sequences. The goal was to move beyond general impressions of response quality and create a more structured way to compare models. Through these interfaces, we tested how models handled persona consistency, response length, tone, JSON formatting, second-pair-part generation, and action-sequence selection across different prompt designs.
We also added memory and knowledge-base testing so we could evaluate whether the LLM was using stored information correctly instead of giving generic responses. This helped us check whether the model retrieved relevant details, stayed consistent with the knowledge base, and adapted its responses when the available memory changed. Overall, these interfaces gave us a practical evaluation environment for comparing LLM persona behavior, promptability, action-sequence performance, and grounded use of memory or KB content.
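A structured check of this kind can be sketched as follows. This is only an illustration under stated assumptions: the response schema (an `action` field and a `response` field), the action names, and the word limit are hypothetical and do not reflect the actual prompts or test interfaces used in the project.

```python
import json


def check_response(raw: str, allowed_actions: set[str], max_words: int) -> list[str]:
    """Return a list of problems found in one model response (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Action-sequence selection: the model must pick one of the allowed actions.
    action = data.get("action")
    if not isinstance(action, str) or action not in allowed_actions:
        problems.append(f"unexpected action: {action!r}")

    # Response length: AAC-style replies are expected to stay short.
    text = data.get("response")
    if not isinstance(text, str) or not text.strip():
        problems.append("missing response text")
    elif len(text.split()) > max_words:
        problems.append(f"response longer than {max_words} words")
    return problems


# Hypothetical second-pair-part actions for an invitation.
actions = {"accept", "decline", "ask_clarification"}
good = '{"action": "accept", "response": "Sure, Tuesday works for me."}'
bad = '{"action": "joke", "response": ""}'
print(check_response(good, actions, max_words=12))
print(check_response(bad, actions, max_words=12))
```

Running the same checks over many prompts and models gives counts that can be compared directly, instead of relying on general impressions of response quality.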
ELAN
ELAN (EUDICO Linguistic Annotator) is a professional video-analysis and annotation platform widely used in linguistics, communication, and interaction research. Throughout Project Converse, ELAN served as our primary tool for synchronizing video, transcript, and coding data, allowing us to analyze AAC-mediated interactions at a fine-grained temporal level. We used it to transcribe more than 150 videos and to examine composition delays, repair sequences, engagement practices, turn transitions, and other interactional phenomena that informed the design of MOVE, ENGAGE, ENCHRONY, and DEAN. We also developed an import routine that brings the time-coded device selections logged through OS-DPI into ELAN.
Gravar
Gravar, Portuguese for “record,” is an open-source, browser-based video analysis and transcription environment developed by Jeff Higginbotham to support detailed study of video-recorded interaction, with a focus on fine-grained micro- and conversation analyses. Unlike general-purpose video players or transcription tools, Gravar integrates synchronized video playback, transcript editing, time-stamped coding, comments, waveform support, and import/export functions for formats including ELAN, TSV, and XLS, all within a single workspace.
Its design emphasizes close analysis of interactional timing. Users mark start and end points for transcript lines, play transcript-linked video segments, compare or synchronize two video sources, create frame captures, and export frame clips and video clips. Gravar also supports reusable code and symbol sets, video and transcript pop-out viewing, project-based saving, and structured exports for analysis and reporting. These features make it useful for examining the fine-grained coordination of talk, action, device use, and timing in video data.
Snippet Maker
We developed a locally hosted, browser-based video processing application to support research workflows requiring de-identified video clips. The tool uses structured JSON clip definitions and source video files to batch-export precisely trimmed video snippets while preserving participant confidentiality. It includes automated face detection and masking, manual region-of-interest censoring, adjustable blur and pixelation options, eye-strip masking, and shape annotations such as rectangles, circles, arrows, and lines.
The application also supports visual enhancement and presentation features, including grayscale conversion, brightness and contrast adjustment, slow-motion processing, title cards, fade transitions, pencil-sketch rendering, and full-frame pixelation with exclusion regions. A live preview system allows researchers to inspect masking, annotations, and visual effects before export. Because the application runs entirely on the user’s local machine with no cloud dependencies, it supports secure video de-identification while reducing the time required to prepare research clips for analysis, presentation, and dissemination.
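To make the idea of structured JSON clip definitions concrete, here is a minimal sketch of how such definitions could be parsed and validated before export. The field names (`source`, `start`, `end`) and the `HH:MM:SS.fff` timestamp format are assumptions made for illustration; the actual Snippet Maker schema may differ.

```python
import json


def to_seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS.fff' timestamp to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def load_clips(text: str) -> list[dict]:
    """Parse a JSON list of clip definitions and compute each clip's duration."""
    clips = []
    for entry in json.loads(text):
        start = to_seconds(entry["start"])
        end = to_seconds(entry["end"])
        if end <= start:
            raise ValueError(f"clip in {entry['source']} ends before it starts")
        clips.append({
            "source": entry["source"],
            "start": start,
            "end": end,
            "duration": end - start,
        })
    return clips


# Hypothetical definition file with two clips from one recording.
definitions = """
[
  {"source": "session01.mp4", "start": "00:01:05.5", "end": "00:01:12.0"},
  {"source": "session01.mp4", "start": "00:14:30.0", "end": "00:14:41.25"}
]
"""
for clip in load_clips(definitions):
    print(clip["source"], clip["start"], clip["duration"])
```

Validating every definition up front means a batch export fails early on a mistyped timestamp instead of partway through a long run.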
WhisperX
We use WhisperX together with our transcription software, including ELAN and Gravar, to make audio and video recordings easier to work with after a study session. Instead of requiring someone to work through long recordings manually, the system turns speech into text, adds precise timestamps, and helps separate speakers. That gives us a cleaner transcript that we can use for review, coding, analysis, and documentation.
For research sessions, interviews, and conversation-based studies, this saves time and keeps the data organized while still letting the research team check and edit the transcript when needed.
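The step from timestamped, speaker-labeled segments to an editable transcript can be sketched as below. The sketch does not call WhisperX itself; it assumes segments shaped like dictionaries with `start`, `end`, `text`, and `speaker` keys, which is an assumption about the output after speaker assignment, and the sample data is invented.

```python
def segments_to_turns(segments: list[dict]) -> list[dict]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")  # diarization may leave gaps
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker continues: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + text
        else:
            turns.append({
                "speaker": speaker,
                "start": seg["start"],
                "end": seg["end"],
                "text": text,
            })
    return turns


# Invented example segments.
segments = [
    {"start": 0.0, "end": 1.8, "text": " How was the trip?", "speaker": "SPEAKER_00"},
    {"start": 4.2, "end": 5.0, "text": " Good.", "speaker": "SPEAKER_01"},
    {"start": 5.1, "end": 6.4, "text": " Very long.", "speaker": "SPEAKER_01"},
]
for turn in segments_to_turns(segments):
    print(f'{turn["start"]:.1f}-{turn["end"]:.1f} {turn["speaker"]}: {turn["text"]}')
```

Turn-level rows like these are easier for a researcher to check and correct than raw segments, and they keep the timestamps needed for later alignment with video.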
Session-Based Interface Mirroring in OS-DPI
Video: Session-Based Interface Mirroring in OS-DPI
We developed a session-based iframe mirroring system for OS-DPI to support live research sessions with separate participant-facing and researcher-facing interfaces. The system used session-specific links to connect both views to a shared session, allowing the participant interface to load through an iframe while remaining synchronized with the researcher view. This approach reduced setup burden, minimized mismatches across devices, and supported more controlled testing of AAC and conversational-AI interfaces.
Through this work, OS-DPI became a coordinated research platform rather than a single-user interface. The shared-session structure synchronized interface state, cue changes, keyboard and pointer activity, and design updates across connected views, allowing researchers to monitor and adjust live sessions while participants interacted with the experimental interface.
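The shared-session idea can be illustrated with a small in-memory sketch: one session object issues role-specific links and pushes every state change to all connected views. This is not the OS-DPI implementation. The class, the URL, the query parameter names, and the callback mechanism are all hypothetical, and a real system would send updates over a network connection rather than call functions directly.

```python
import secrets
from urllib.parse import urlencode


class SharedSession:
    """One research session shared by several connected views (hypothetical)."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self.session_id = secrets.token_urlsafe(8)  # hard-to-guess session key
        self.state = {}      # latest interface state, e.g. the active cue
        self.listeners = {}  # role -> callback invoked on every update

    def link(self, role: str) -> str:
        """Build the session-specific link handed to one view."""
        query = urlencode({"session": self.session_id, "role": role})
        return f"{self.base_url}?{query}"

    def join(self, role: str, on_update) -> None:
        self.listeners[role] = on_update
        on_update(dict(self.state))  # bring a late joiner up to date

    def update(self, **changes) -> None:
        """Apply a change and mirror the new state to every view."""
        self.state.update(changes)
        for on_update in self.listeners.values():
            on_update(dict(self.state))


session = SharedSession("https://example.org/osdpi")  # placeholder URL
received = {"participant": [], "researcher": []}
session.join("participant", received["participant"].append)
session.join("researcher", received["researcher"].append)
session.update(cue="highlight-next")
print(session.link("participant"))
print(received["researcher"][-1])
```

Because both views read from the same session state, a cue change made by the researcher cannot drift out of step with what the participant sees.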
Persona Development Tool
Developed by Jenna Bizovi, our Google Form persona tool allows research participants to independently create a detailed persona for use with DEAN.
Video Recording
In the ENGAGE, MOVE, and DEAN studies, a primary data acquisition tool was video. Our video setup always included a social camera focused on the head and torso of participants to capture gaze and embodied communication. We also placed a camera directly over the communication device being used by the augmented communicator.
Eye-Tracking Glasses
Because mutual gaze is a major indicator of engagement, we wanted to track and code gaze carefully as part of our ELAN data stream in the ENGAGE studies. We tested various eye-tracking glasses and settled on Pupil Labs Invisible Glasses. We also used the lens kit to provide vision correction for participants who required it.
These glasses track and record where the user is looking. The video is sent to a smartphone and uploaded to Pupil Cloud. We downloaded this video and synchronized it with our other video feeds, including the device camera and social camera, into ELAN for analysis.
OS-DPI Logging
A major technical milestone established during Project ENGAGE and maintained throughout Project Converse was the implementation of specialized logging within the OS-DPI platform, created in partnership with Project Open. These capabilities enabled us to capture user-interface activity and merge device selection logs directly into ELAN annotation files.
This infrastructure was critical for examining AAC-mediated communication as a unified event. It enabled the research team to analyze the fine-grained relationship between system performance and interpersonal interaction rather than evaluating device operation and social talk in isolation.
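One way to picture the merge step is to convert time-coded selections into a tab-delimited table, a format ELAN can import as annotations. The sketch below is an illustration only: the log fields (`time_ms`, `label`), the tier name, and the fixed annotation duration are assumptions, not the actual OS-DPI log format or the project's import code.

```python
import csv
import io


def selections_to_tsv(selections: list[dict], tier: str = "Device",
                      duration_ms: int = 200) -> str:
    """Turn time-coded device selections into tab-delimited annotation rows."""
    ordered = sorted(selections, key=lambda sel: sel["time_ms"])
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Tier", "Begin", "End", "Annotation"])
    for index, sel in enumerate(ordered):
        begin = sel["time_ms"]
        end = begin + duration_ms
        # A selection is an instant, so give it a short span and clip that
        # span at the next selection to avoid overlaps within the tier.
        if index + 1 < len(ordered):
            end = min(end, ordered[index + 1]["time_ms"])
        writer.writerow([tier, begin, end, sel["label"]])
    return buffer.getvalue()


# Invented log entries, deliberately out of order.
log = [
    {"time_ms": 1500, "label": "yes"},
    {"time_ms": 1000, "label": "hello"},
    {"time_ms": 1600, "label": "please"},
]
print(selections_to_tsv(log))
```

Placing selections on their own tier next to the transcript tiers is what allows device operation and talk to be read against the same timeline.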
Conversational Tasks
We used four conversation tasks developed by CADL for Project Open.
Map Task
In the Map Task, the augmented communicator and speaking partner each have a map, but only the augmented communicator’s map shows the route. The augmented communicator gives directions, and the partner marks the route on their own map.
The task is asymmetric because the participants have different information, and some landmarks differ across the maps. Since they are not told which landmarks differ, they must identify and resolve these mismatches through interaction as they work toward the destination.
Shared Experience Task
In the Shared Experience task, participants take turns recounting a memory of an event they both experienced. Each participant receives separate directions: one is asked to begin telling the story, while the other is told to share their own recollection of the same event.
Unlike the Map Task, this task does not create a strong division of roles because both participants have access to the shared experience. This creates a more symmetric interaction in which both partners contribute to the telling, correction, and elaboration of the narrative.
Scenario Tasks
In Projects ENGAGE, MOVE, and DEAN, we used problem-solving and negotiation tasks to examine how augmented communicators use experimental AAC systems in more challenging interactional contexts. These tasks required participants to advocate for different ideas, negotiate a course of action, or resolve competing agendas.
To develop them, Katrina Fulcher-Rood and Pamela Mathy drew on published resources; input from Kailie Horowitz, a Buffalo-based actress with experience teaching improvisation workshops; and project brainstorming. This work produced a repository of more than 40 adaptable scenario tasks.
The task repository was coded for key features, including number of participants, participant motivations, sources of conflict, and other interactional demands. This allowed us to select or adapt tasks for specific studies.
For the DEAN field test, we created 20 scenarios organized as 10 pairs of parallel tasks for the 10-session trial. In each session, the augmented communicator used either their home AAC system or the DEAN probe to negotiate a course of action with a partner, with each participant assigned a different agenda.
Example Parallel Scenarios from the DEAN Field Test Study
Groomer: urgent appointment request
AS: You have an important event this weekend, such as family photos, a wedding, or a job interview, and your dog needs grooming beforehand. Your dog’s coat is getting matted and you are worried it will get worse if you wait. You want an appointment before the weekend and you want the groomer to make it work. Your dog is part of the family and must be in the family photos, period.
TS: You are the assistant manager at a dog grooming salon. You are fully booked, short-staffed, and anxious about overbooking because it creates complaints and affects safety and quality. Your goal is to make a clear decision, yes or no. If the answer is no, offer a realistic alternative, such as a waitlist, partial service like nails or brush-out, a different day or time, or a referral.
Vet clinic: urgent appointment request
TS: It is late on a Friday and your dog is sick with terrible diarrhea. You need a last-minute vet appointment so you can get a prescription for treatment before the weekend. You dread the idea of cleaning up after your dog all weekend. You need help with getting an appointment and finding a solution.
AS: You are the receptionist. You are booked and must triage. Your goal is to reach a workable plan, such as an appointment, referral, or alternative.
Training Protocols and Materials
Before participating in pilot, bench, or field studies, participants received training matched to the complexity of the experimental AAC system they would use. This training was especially important because one of our major dependent measures was turn-transition time: the time between the end of the speaking partner’s turn and the start of the augmented speaker’s response.
To interpret this measure meaningfully, participants needed sufficient familiarity with the device to locate vocabulary, make selections, and operate the system efficiently. This applied to both experienced SGD users and participants taking on the role of an augmented communicator in experimental studies. Training, therefore, focused on interface layout, vocabulary location, selection accuracy, and visual-motor proficiency in both drill-based and contextual tasks. The amount of training varied depending on the complexity of the system.
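The turn-transition measure defined above can be computed from a time-stamped turn list as in the following sketch. The speaker labels and the data layout are assumptions for illustration; in practice the turn boundaries would come from coded annotations.

```python
def turn_transition_times(turns: list[dict], partner: str = "partner",
                          augmented: str = "augmented") -> list[float]:
    """Gap in seconds between the end of each partner turn and the start of
    the augmented speaker's next turn."""
    gaps = []
    for previous, following in zip(turns, turns[1:]):
        if previous["speaker"] == partner and following["speaker"] == augmented:
            gaps.append(round(following["start"] - previous["end"], 3))
    return gaps


# Invented turns, with times in seconds from the start of the recording.
turns = [
    {"speaker": "partner", "start": 0.0, "end": 2.0},
    {"speaker": "augmented", "start": 6.5, "end": 9.0},
    {"speaker": "partner", "start": 9.5, "end": 11.0},
    {"speaker": "augmented", "start": 12.25, "end": 13.0},
]
print(turn_transition_times(turns))
```

Only partner-to-augmented transitions are counted, so a slow device operator and a slow conversational decision look identical in this number, which is why training to a stable level of device proficiency matters before the measure is interpreted.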
ENGAGE Study Training
For the ENGAGE Pilot Study, we developed an Automated Communication Board Trainer that presented lists of increasingly long phrases for participants to enter as quickly as possible on the experimental device's user display, which was implemented in OS-DPI. The trainer automatically recorded reaction times as participants practiced locating words, letters, and control elements on the interface.
MOVE Training
In the MOVE pilot study, the participant taking the augmented-communicator role completed a multi-session training protocol before using the experimental MOVE vocabulary in conversation tasks. Initial training focused on learning the location of vocabulary items and building selection speed and motor familiarity, using the same Communication Board Trainer developed for the ENGAGE pilot study.
The participant then received instruction on the MOVE vocabulary layout and its communicative functions, including core words, agreement and disagreement, floor management, topic change, repair and attention, and QWERTY access. Later sessions used video examples, discussion, role-play, and targeted interface practice to build interactional use of MOVE.
Training emphasized making timely contributions, managing the floor, shifting topics, repairing breakdowns, gaining attention, disagreeing, questioning, and flexibly combining vocabulary areas. After each phase, the participant completed conversation tasks with a speaking partner to examine how these strategies were used in problem-solving and negotiation interactions.
DEAN Field Testing Training
For the DEAN field study, training was conducted remotely because DEAN could be installed and run on each participant’s own AAC device. Linda and Todd completed training sessions from their homes. During training, participants shared their screens so that their selections could be observed in real time.
Training focused on helping participants learn the MOVE vocabulary as it was implemented within DEAN. The training was organized into two phases. Phase 1 focused on visual-motor learning and location memory. Participants practiced locating 45 top-level MOVE targets, with performance tracked using a spreadsheet that recorded item accuracy and response time across trials.
Phase 2 shifted from simple item location to functional use. In this phase, participants completed probes that required them to select MOVE items in response to brief communicative contexts. These data were also recorded in a spreadsheet to enable review of changes in accuracy, response time, and functional use of the vocabulary across training sessions.
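The per-session review of accuracy and response time described above can be sketched as a small summary over trial records. The column names (`session`, `item`, `correct`, `rt_s`) and the sample values are hypothetical; the project recorded these data in a spreadsheet, and this sketch only shows one way such rows could be summarized.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_session(trials: list[dict]) -> dict:
    """Compute accuracy and mean response time for each training session."""
    by_session = defaultdict(list)
    for trial in trials:
        by_session[trial["session"]].append(trial)

    summary = {}
    for session, rows in sorted(by_session.items()):
        summary[session] = {
            "accuracy": mean([1.0 if row["correct"] else 0.0 for row in rows]),
            "mean_rt_s": mean([row["rt_s"] for row in rows]),
        }
    return summary


# Invented trial records for two sessions.
trials = [
    {"session": 1, "item": "agree", "correct": True, "rt_s": 4.0},
    {"session": 1, "item": "topic change", "correct": False, "rt_s": 6.0},
    {"session": 2, "item": "agree", "correct": True, "rt_s": 2.5},
    {"session": 2, "item": "topic change", "correct": True, "rt_s": 3.5},
]
for session, stats in summarize_by_session(trials).items():
    print(session, stats)
```

Tracking both measures together guards against reading faster responses as progress when accuracy is falling at the same time.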
