Every year after the hype from CSUN starts to die down, my esteemed colleague from U. Bristol in the UK, Will Pearson, writes an article that goes well beyond any of the existing technologies and dives deeply into the research side of vision and vision-related technology. Recently, Will has focused his efforts on the psychology of attention and the associated neuromechanics.
For quite a while now, I've written about multiple data streams and simultaneous information delivery as the next-generation interfaces for people with vision impairments. I've used a lot of examples from audio and video games and compared them to existing screen readers and other products designed for blind people. I look at these advances from a fairly practical usability and engineering point of view. Will studies the underlying science: cognitive psychology, learning theory, and other aspects of how the human brain actually works that make my implementation ideas possible and practical.
Will Pearson and I often collaborate via email and in telephone calls discussing our latest ideas. I have taken a lot of his psychological concepts and tried to fit them into an engineering model that I can use to build software. I hope that some of my ideas that we’ve discussed have had value to his work as well.
People who read Blind Confidential with some regularity have probably noticed that Will makes frequent comments here. His writing style falls into the category of dense academic prose that most often appears in scientific journals. Thus, those of you who don't like reading serious scientific articles should probably hang onto your seats, as you're in for a pretty rough ride. If, however, like me, you fall into the category of uber-geek, please enjoy the following article.
An Antidote to CSUN
By: Will Pearson
Now that all the hype of CSUN is behind us, I thought it time to begin exploring the more serious questions, the sort that are rarely touched on at CSUN. The first question I felt worthy of an attempt at an answer is whether using a screen reader can ever be as efficient as using sight. There's been plenty of speculation on the topic, usually resulting in the answer that if someone waved a magic wand, using a screen reader would be as efficient as sight. However, after spending several years considering this and other human-computer interaction issues related to screen reader use, I take a different view. My justification, whilst not exhaustive, is below.
The first area where screen readers appear to fall short is in their ability to communicate semantics. Communication is all about conveying thoughts, concepts, states, etc., and communication between a piece of software's interface and a user is no different in this respect. The main problem is that screen readers, through their use of speech and Braille, both of which are serialised forms of communication, use fewer physical variables to encode semantic content than sight does. There are roughly six variables that can be used to encode semantic content, and these are:

* The position of something on the X, Y and Z axes
* The position of something in time
* The frequency of the physical wave, represented by things like color, pitch, etc.
* The amplitude of the physical wave, or how strong it is

Using a computer with sight typically takes advantage of five of these variables, whilst screen readers typically use only two. So, it will take longer to communicate the same semantic content using a screen reader than it will using sight. To some extent this has supporting evidence from psychological studies in which the listening and reading speeds of the same person were compared. These studies found that the same individual could read something faster than they could listen to it. There are differences between individuals, which can account for why some screen reader users can listen to things faster than some people can read things, but within the same individual the evidence seems to indicate that listening is slower.
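To make the channel-counting argument concrete, here is a minimal Python sketch of the idea. It is purely illustrative: only the counts of six, five and two come from the argument above, and the assignment of particular variables to each set is an assumption made for the example.

    # Illustrative sketch of the "fewer encoding variables" argument.
    # The six physical variables listed above, modelled as a set of channels.
    ALL_CHANNELS = {"x", "y", "z", "time", "frequency", "amplitude"}

    # Which channels each modality exploits is assumed here for illustration;
    # the argument above gives only the counts: sight uses roughly five,
    # a speech/Braille screen reader roughly two.
    SIGHT_CHANNELS = {"x", "y", "time", "frequency", "amplitude"}
    SCREEN_READER_CHANNELS = {"time", "frequency"}

    def channel_fraction(used, available=ALL_CHANNELS):
        """Crude proxy for encoding capacity: fraction of the variables in use."""
        return len(used & available) / len(available)

    print(f"sight:         {channel_fraction(SIGHT_CHANNELS):.2f}")          # 0.83
    print(f"screen reader: {channel_fraction(SCREEN_READER_CHANNELS):.2f}")  # 0.33

The fraction is only a rough proxy, but it captures the shape of the argument: with fewer variables available, the same semantic content must be spread out over more time.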
This serialisation of semantic content, brought about by the smaller capacity of speech, also has implications for memory utilisation and cognitive workload. Studies involving Functional Magnetic Resonance Imaging of the cortex have shown greater activity in the cortical regions of the brain when listening to speech than when reading. Not only is there activity on the left side of the cortex, in regions such as Broca's Area and Wernicke's Area, which is present for both reading and listening, but listening to speech also produces activity in the right side of the cortex, which is thought to be related to contextual priming. In addition to the extra neurological activity associated with language processing, there is also a higher demand on short-term working memory. As speech is transient, there one moment and gone the next, someone listening to speech has to remember more than someone reading. It is not as easy to move back to a previously heard word or sentence as it is to move back to a previously read one. Navigating by listening often involves listening to words, deciding whether they are the ones being sought, and, if not, navigating some more and repeating the process.
Another consideration is the distinction between programmatic focus, the mechanism used to shift attention with a screen reader, and visual attention. Screen readers use programmatic focus to shift the user's attention between user interface elements. This means that a user's attention is only focused on a single point at a time, something further compounded by a screen reader's use of serialised output. Whilst visual attention is usually focused on a single object, it can shrink and grow, similar to a zoom lens, to encompass more or less of an object. This ability to shift attention from a word to a paragraph and then onto the entire document provides a number of benefits for people reading documents. The most obvious benefit is the ability not only to navigate by word or line, but to navigate around the document based on coarser-grained objects, such as paragraphs, tables, images, etc. Whilst similar functionality is available in some screen readers for a limited set of scenarios, it is not as flexible as the visual mechanism used to shift attention. The visual mechanism can group objects together, such as a table preceded by a diagram, and can jump to them with very little processing required. In addition to this coarser-grained navigation, attention can also be shifted based on physical features, such as color or location, which requires only the elements with those physical features to be searched, as suggested by Treisman's Feature Integration Theory. As far as I am aware, no equivalent functionality exists in a screen reader. One key difference between programmatic and visual attention is that programmatic attention can only be moved to fixed points, whilst visual attention can be moved to any point or object. The final difference worth mentioning is that attention is not limited to a single point in the visual field. Whilst there are overt, endogenous mechanisms for controlling visual attention by moving the point of fixation, attention can also be focused on the periphery of the visual field through covert, endogenous mechanisms. This is a useful point, as it means that sighted people can detect changes in the state of something away from their current point of fixation without the cognitive work involved in moving the point of fixation.
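The contrast between serial programmatic focus and feature-based visual selection can also be sketched in code. The following Python fragment is illustrative only and is not how any particular screen reader works: the element names and features are invented, and the dictionary index is just a rough stand-in for the pre-attentive feature maps that Feature Integration Theory describes.

    from dataclasses import dataclass

    @dataclass
    class Element:
        name: str
        color: str  # a physical feature a sighted reader could select on

    def serial_focus_search(elements, wanted_color):
        """Visit elements one after another, as a serial focus order must;
        returns the matches and how many elements had to be visited."""
        matches, visited = [], 0
        for el in elements:
            visited += 1
            if el.color == wanted_color:
                matches.append(el)
        return matches, visited

    def build_feature_index(elements):
        """Rough stand-in for a pre-attentive feature map: group elements
        by a physical feature before any deliberate search takes place."""
        index = {}
        for el in elements:
            index.setdefault(el.color, []).append(el)
        return index

    # A page of two hundred ordinary elements and one red alert.
    page = [Element(f"item{i}", "black") for i in range(200)]
    page.append(Element("warning", "red"))

    index = build_feature_index(page)
    print(serial_focus_search(page, "red")[1])  # 201 elements visited serially
    print(len(index.get("red", [])))            # 1 element inspected via the feature

The point of the toy example is the difference in what has to be examined deliberately: the serial walk touches every element before finding the one that matters, whilst selection by a physical feature narrows the candidates before conscious inspection begins.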
So, I, for one, am beginning to form the opinion that screen readers are not physically capable of delivering the same levels of efficiency as sight can. This isn't to say that blind people cannot attain the same level of efficiency, just that it looks likely that they are unable to do so using a screen reader. What is more, this is not the fault of a particular application or platform vendor, as is often claimed, but a problem with the core concept of a screen reader, a concept that requires everything to be serialised.