Although Apple, based on its internal research, claims that the Apple Watch (AW) ECG has a 98% sensitivity and a 99% specificity for detection of atrial fibrillation, doubts have been raised about its accuracy in the real world.
I have recently reported on the Apple Watch’s inability to diagnose atrial fibrillation (AF) when the heart rate is >120 beats per minute. This inherent limitation means the AW has a built-in reduction in sensitivity (a limitation that was not present in the testing group).
In a Research Letter published online Feb. 24th in Circulation, Dr. Marc Gillinov reports on the accuracy of the Apple Watch in a population of patients who were post cardiac surgery and therefore on cardiac telemetry, with a high risk of going in and out of AF.
Rhythm assessments using the Apple Watch ECG were performed 3 times per day over 2 days on 50 patients. Comparison was made between the watch reading (Sinus rhythm, AF, or inconclusive) and an expert human interpretation of the PDF from the watch and simultaneously obtained telemetry rhythm strip.
The results were disappointing for the AW.
The AW4 notification correctly identified AF in 34 of 90 instances, yielding a sensitivity of 41%. Of 25 patients with at least 1 episode of AF, AF was identified in 19. Among patients in SR, none was designated as AF (ie, no false positives); however, rhythm was deemed inconclusive in 31% of patients, and there was no additional attempt to assess rhythm. Overall agreement between AW4 notification and telemetry was 61% (κ statistic = 0.33 [95% CI, 0.24–0.41]).
This confirms my prediction that AW would identify less than half of AF cases.
I have to believe that the 29 cases diagnosed as “inconclusive” were due to the AW’s inherent blind spot for AF at rapid heart rates. If we presume these would all have been correctly identified as AF (had the AW not been hamstrung), then the sensitivity increases to 70%.
The authors of this article don’t seem to understand the difference between unreadable (meaning too much artifact to make a diagnosis) and inconclusive (which Apple only uses when the heart rate in AF is >120 BPM). They conclude by saying:
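The arithmetic behind that 70% figure can be checked with a short Python sketch. The counts are the ones quoted in this post; the assumption that every inconclusive reading would have been called AF is mine, not the study authors'. Note that 34 of 90 on its own works out to about 38% rather than the quoted 41%, so the letter's denominator may differ slightly from the figure quoted here.

```python
# Counts as quoted in this post.
af_instances = 90   # recordings taken while the patient was in AF
detected = 34       # recordings the watch labeled as AF
inconclusive = 29   # recordings labeled "inconclusive"


def sensitivity(true_positives: int, total_positives: int) -> float:
    """Fraction of true AF recordings that were labeled AF."""
    return true_positives / total_positives


observed = sensitivity(detected, af_instances)

# Assumption: every inconclusive reading would have been called AF
# if the watch did not refuse to classify rapid heart rates.
adjusted = sensitivity(detected + inconclusive, af_instances)

print(f"observed sensitivity: {observed:.0%}")
print(f"adjusted sensitivity: {adjusted:.0%}")
```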
The unreadable (ie, inconclusive) rate reported in that study was 6% compared with 31% in this pilot study.
They have muddled together unreadable and inconclusive.
I do strongly agree with their final conclusions:
Variations in sensitivity between these 2 studies suggest the need for further validation before this technology is adopted by the public for AF detection. Physicians should exercise caution before undertaking action based on electrocardiographic diagnoses generated by this wrist-worn monitor.
Indeed, any diagnosis from the Apple Watch itself should be confirmed by a cardiologist who is an expert at interpreting these single-lead ECG recordings.
One reason the skeptical cardiologist has been so enthusiastic about coronary calcium (CAC) scans is that I have found them to be highly reproducible and highly accurate.
Unlike most imaging tests in cardiology, if we perform a CAC scan on an individual in the CT scanner of hospital A and then repeat it within a few days in the CT scanner of hospital B, we expect the scores to be nearly identical.
Also, unlike most other imaging tests, we don’t expect false negatives or false positives. If the CAC score is zero, there is no coronary calcification (high sensitivity). If the score is nonzero, there is definitively calcium, and therefore atherosclerotic plaque, in the coronaries (high specificity).
This is because calcium as defined in the Agatston score is literally black and white: a pixel is either above or below the cut-off. Computer software automatically identifies these pixels on the scan. A reasonably trained CT tech should be able to identify the calcium that resides in the coronary arteries based on his or her knowledge of the coronary anatomy as registered on CT slices. Using software, the total Agatston score is calculated.
A physician reader (either a cardiologist or a radiologist), who should have a very good understanding of cardiac and coronary anatomy, should review the CT tech’s work and verify its accuracy.
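To make the black-and-white logic concrete, here is a minimal Python sketch of how an Agatston-style score is conventionally assembled: pixels at or above 130 Hounsfield units count as calcium, and each lesion contributes its area multiplied by a density weight from 1 to 4. The lesion values and the in_coronary flag are hypothetical, and the sketch leaves out the minimum-lesion-area rule that real scoring software applies. The threshold is objective, but deciding which calcium lies in a coronary artery is a human judgment.

```python
from typing import List, NamedTuple

HU_CUTOFF = 130  # conventional Agatston attenuation threshold (Hounsfield units)


class Lesion(NamedTuple):
    area_mm2: float    # lesion area on one slice
    peak_hu: float     # highest attenuation inside the lesion
    in_coronary: bool  # reader's judgment: is this calcium in a coronary artery?


def density_weight(peak_hu: float) -> int:
    """Agatston density factor based on peak attenuation."""
    if peak_hu < HU_CUTOFF:
        return 0
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(lesions: List[Lesion]) -> float:
    """Sum area x density weight over lesions assigned to the coronaries."""
    return sum(
        l.area_mm2 * density_weight(l.peak_hu) for l in lesions if l.in_coronary
    )


# Hypothetical lesions: a small LAD plaque and a patch of mitral annular calcium.
lad_plaque = Lesion(area_mm2=4.0, peak_hu=250, in_coronary=True)
mac = Lesion(area_mm2=50.0, peak_hu=600, in_coronary=False)

correct = agatston_score([lad_plaque, mac])
mislabeled = agatston_score([lad_plaque, mac._replace(in_coronary=True)])
print(correct, mislabeled)
```

With these made-up numbers, a single anatomical mislabel moves the score from 8 to 208, which is why the reader's review matters even though the pixel threshold itself is objective.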
A recent case report, however, has demonstrated that the above assumptions are not always true.
Franz Messerli, a pre-eminent researcher in hypertension and a cardiologist, describes in fascinating detail a false-positive CAC scan he underwent in 2013. He was told he had a score of 804, putting him in a high-risk category consistent with extensive plaque formation.
After consulting with cardiologist friends and colleagues he decided to put himself on a statin and aspirin despite having an excellent lipid profile.
Messerli assumed that the CAC score was not a false positive (although later in his article he indicates he had questioned the reading) writing:
“although one can always quibble with ST segments or wall motion abnormalities, on the CAC the evidence is rock-hard, you actually with your own eyes can see the white calcium specks! ‘Individuals with very high Agatston scores (over 1000) have a 20% chance of suffering a myocardial infarction or cardiac death within a year’—although I did not quite classify, this patient information coming from esteemed Harvard cardiology colleagues3 was hardly reassuring.
A more recent study found patients with extensive CAC (CAC≥1000) represent a unique, very high-risk phenotype with CVD mortality outcomes (0.80%/yr) commensurate with high-risk secondary prevention patients (0.77%/yr) from the FOURIER trial.
Six years after the diagnosis Messerli was at a Picasso exhibition, “leisurely ambling between his Blue and Pink Period,” when he developed chest pain.
To further evaluate the chest pain he underwent a coronary CT angiogram and this demonstrated pristine and normal coronary arteries, totally devoid of calcium.
He did have a lot of mitral annular calcification (MAC). The CCTA images below show how close the MAC is to the left circumflex coronary artery (LCX).
The slice above shows how the MAC would appear on the CT scan designed to assess coronary calcium. Its position is very close to that of the circumflex, but an experienced reader/tech should have known this was not coronary calcification.
MAC is a very common finding on echocardiograms, especially in the elderly and it is likely that this error is not an isolated one.
Dr. Messerli writes:
After relating these findings to the cardiologist who did the initial CAC, he indicated that most likely someone mistook mitral annular calcification as left circumflex calcium. This was hardly reassuring, since I specifically had asked that obvious question after receiving the initial CAC.
Around the time I read Messerli’s case report I encountered a similar, albeit less drastic, case. A CAC scan showed a significant area of calcification near the left circumflex coronary artery which was scored as circumflex coronary calcification.
The pattern of this calcification was not consistent with the known path of the circumflex coronary artery in this case. When it was eliminated from the scoring, the patient had a zero score. The difference between a nonzero score and a zero score is hugely significant, but for patients with scores >100 such errors are less critical.
I have also encountered cases where extracardiac calcium mimics right coronary calcification.
There are some important take-home points from my and Dr. Messerli’s experience.
1. False positive CAC scans do occur. We don’t know the frequency. If the scans are not overread by a competent cardiologist or radiologist with extensive experience in cardiac CT, these mistakes will be more common.
When I asked Dr. Messerli about this problem he responded:
I am afraid you are correct in that CAC scores are generated by techs and radiologists and cardiologist simply sign the report without verifying the data. Little doubt that MAC is most often missed.
2. Like other cardiac imaging tests (such as echocardiography) having an expert/experienced/meticulous tech and reader matters.
3. Dr. Messerli and I agree that a research project should be done to ascertain how often this happens and to evaluate the process of reading and reporting CAC.
4. Patients should look at the breakdown of the calcium in the CAC by coronary artery. Whereas it is not uncommon to see most of the calcium in the LAD, it is rare to see a huge discrepancy in which the circumflex coronary artery score is very high and the LAD score is zero. Such a finding should warrant a review of the scan to see if MAC was included in error.
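The per-artery check in the last point can be sketched in a few lines of Python. The artery labels, the example scores, and the cutoff of 100 for a "very high" circumflex score are all hypothetical choices for illustration, not validated thresholds.

```python
from typing import Dict


def lcx_dominant(scores: Dict[str, float], high_cutoff: float = 100.0) -> bool:
    """Flag a breakdown where the circumflex score is high but the LAD score is zero.

    high_cutoff is an assumed, illustrative threshold.
    """
    return scores.get("LAD", 0.0) == 0 and scores.get("LCX", 0.0) >= high_cutoff


# Hypothetical per-artery breakdowns.
typical = {"LM": 0, "LAD": 180, "LCX": 20, "RCA": 45}
suspicious = {"LM": 0, "LAD": 0, "LCX": 640, "RCA": 0}

print(lcx_dominant(typical))
print(lcx_dominant(suspicious))
```

A flag from a check like this proves nothing on its own; it is only a prompt to ask for the scan to be re-read with MAC in mind.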
N.B. Dr. Messerli’s report can be read for free and makes for entertaining reading.
I was very intrigued by two comments he made at the end:
1. “Had my CHD been diagnosed a decade earlier, guidelines might well have condemned me to taking beta-blockers for the remainder of my days.6 This, as Philip Roth taught us in ‘The Counterlife’, might have had rather unpleasant repercussions.7”
Until recently I had never read anything by Philip Roth, but when he died last year I read his Pulitzer Prize-winning 1997 novel American Pastoral and liked it. Given this Roth reference involving beta-blockers I felt compelled to get my hands on “The Counterlife.” The book is a good read (much better IMHO than American Pastoral) and one of the main plot points relates to the side effects (see my post on feeling logy) a character suffers from a beta-blocker. Stimulated by a desire to be able to perform sexually if taken off the medication, the character undergoes coronary bypass surgery and dies.
2. “As stated by Mandrola and true in the present case, ‘given the (lucrative) downstream testing that often occurs when coronary calcium is found in asymptomatic people, the biggest winners from CAC screening may be the testers rather than the tested’.”
I feel that the CAC, in the right hands, should not lead to (lucrative or inappropriate) downstream testing in the asymptomatic (see my discussion on this topic here).
Apple claims that its Apple Watch can detect atrial fibrillation (AF) and appropriately notify the wearer when it suspects AF.
This claim comes with many caveats on their website:
Apparently it needs to record 5 instances of irregular heart beat characteristic of atrial fibrillation over at least 65 minutes before making the notification.
This feature utilizes the watch’s optical heart sensors, is available in Apple Watch Series 1 or later and has nothing to do with the Apple Watch 4 ECG recording capability which I described in detail in my prior post.
Failure To Detect AF
A patient of mine with known persistent AF informed me yesterday that she had gone into AF and remained in it for nearly 3 hours with heart rates over 100 beats per minute and had received no notification. She confirmed the atrial fibrillation with both AW4 recordings and AliveCor Kardia recordings while she was in it.
The watch faithfully recorded sustained heart rates up to 140 BPM but never alerted her to this, even though the rate was consistently over her high heart rate trigger of 100 BPM.
The patient had set up the watch appropriately to receive notifications of an irregular rhythm.
Her tracings from both the AW4 and the Kardia were easily diagnosed as AF with a rapid ventricular response.
What does Apple tell us about the accuracy of the Apple Watch AF notification algorithm? All we know is the unpublished, non-peer-reviewed data they themselves collected and presented to the FDA.
In a study of 226 participants aged 22 years or older who had received an AFib notification while wearing Apple Watch and subsequently wore an electrocardiogram (ECG) patch for approximately 1 week, 41.6% (94/226) had AFib detected by ECG patch. During concurrent wear of Apple Watch and an ECG patch, 57/226 participants received an AFib notification. Of those, 78.9% (45/57) showed concordant AFib on the ECG patch and 98.2% (56/57) showed AFib and other clinically relevant arrhythmias. These results demonstrate that, while in the majority of cases the notification will accurately represent the presence of AFib, in some instances, a notification may indicate the presence of an arrhythmia other than AFib. No serious device adverse effects were observed.
This tells us that about 80% of notifications are likely to be AFib whereas 20% will not be AFib. It is unclear what the “other clinically relevant arrhythmias” might be. If I had to guess I would suspect PVCs or PACs, which are usually benign.
If 20% of the estimated 10 million Apple Watch wearers are getting false positive notifications of AFib, that means 2 million calls to doctors or visits to ERs that are not justified. This could be a huge waste of resources.
Thus the positive predictive value of the AF notification is about 80%. (Strictly speaking this is not the specificity, which would require knowing how many wearers without AF received no notification.) The other important parameter is the sensitivity. Of the cases of AF that last >65 minutes, how many are detected by the app?
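The percentages in Apple's summary can be reproduced directly from the quoted counts. The Python sketch below does only that; the variable names are mine. The 78.9% figure divides patch-confirmed AFib by the number of notifications, which is a positive predictive value, and nothing in these counts allows sensitivity to be calculated.

```python
# Counts quoted from Apple's summary of its own study.
participants = 226          # all had a prior AFib notification
afib_on_patch = 94          # AFib seen on the ECG patch at any time
notified_during_patch = 57  # notified while wearing watch and patch together
concordant_afib = 45        # of those, AFib confirmed on the patch
any_arrhythmia = 56         # of those, AFib or another relevant arrhythmia

afib_prevalence = afib_on_patch / participants
ppv_afib = concordant_afib / notified_during_patch
ppv_any_arrhythmia = any_arrhythmia / notified_during_patch

print(f"AFib on patch:               {afib_prevalence:.1%}")
print(f"notification was AFib (PPV): {ppv_afib:.1%}")
print(f"notification was arrhythmia: {ppv_any_arrhythmia:.1%}")
print(f"notification was not AFib:   {1 - ppv_afib:.1%}")
```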
Apple doesn’t seem to have any data on that but this obvious case of rapid AF lasting for 3 hours does not give me much confidence in their AF detection algorithms.
They do have a lot of CYA statements indicating you should not rely on this for detection of AF:
It is not intended to provide a notification on every episode of irregular rhythm suggestive of AFib and the absence of a notification is not intended to indicate no disease process is present; rather the feature is intended to opportunistically surface a notification of possible AFib when sufficient data are available for analysis. These data are only captured when the user is still. Along with the user’s risk factors, the feature can be used to supplement the decision for AFib screening. The feature is not intended to replace traditional methods of diagnosis or treatment.
My patient took her iPhone and Apple Watch into her local Apple store to find out why her AF was not detected. She was told by an Apple employee that the Watch does not detect AF but will only notify her if her heart rate is extremely low or high. I had asked her to record what they told her about the problem.
As I’ve written previously (see here) the Apple Watch comes with excessive hype and minimal proof of its accuracy. I’m sure we are going to hear lots of stories about AF being detected by the Watch but we need some published, peer-reviewed data and we need to be very circumspect before embracing it as a reliable AF monitor.
The skeptical cardiologist has been testing the comparative accuracy of two hand-held mobile ECG devices in his office over the last month. I’ve written extensively about my experience with the AliveCor/Kardia (ACK) device here and here. Most recently I described my experience with the Afib Alert (AFA) device here.
Over several days I had my office patients utilize both devices to record their cardiac rhythm and I compared the device diagnosis to the patient’s true cardiac rhythm.
In 14 patients both devices correctly identified normal sinus rhythm. AFA does this by displaying a green check mark, ACK by displaying the actual recording on a smartphone screen along with the word Normal.
The AFA ECG can subsequently be uploaded via USB connection to a PC and reviewed in PDF format. The ACK PDF can be viewed instantaneously and saved or emailed as a PDF.
Normal by AFA/Unreadable or Unclassified by AliveCor
In 5 patients in normal sinus rhythm (NSR), AFA correctly identified the rhythm but ACK was either unreadable (3) or unclassified (2). In the not infrequent case of a poor ACK tracing I will spend extra time adjusting the patient’s hand position on the electrodes or stabilizing the hands. With AFA this is rarely necessary.
In this 70 year old man the AFA device recording was very good and the device immediately identified the rhythm as normal.
The ACK recording was of good quality but its algorithm could not classify the rhythm.
A 68 year old man who had had bypass surgery and aortic valve replacement had a very good quality AFA recording with correct classification as NSR.
The AliveCor/Kardia recordings on the same patient, despite considerable and prolonged efforts to improve them, were poor and were classified as “unreadable.”
There were 3 cases where AFA diagnosed atrial fibrillation (AF) and the rhythm was not AF. These are considered false positives and can lead to unnecessary concern when the device is being used by patients at home. In 2 of these ACK was unreadable or unclassified and in one ACK also diagnosed AF.
A 90 year old woman with right bundle branch block (RBBB) in NSR was classified by AFA as being in AF.
The ACK algorithm is clearly more conservative than that of AFA. The ACK manual states:
If you have been diagnosed with a condition that affects the shape of your EKG (e.g., intraventricular conduction delay, left or right bundle branch block, Wolff-Parkinson-White Syndrome, etc.), experience a large number of premature ventricular or atrial contractions (PVC and PAC), are experiencing an arrhythmia, or took a poor quality recording it is unlikely that you will be notified that your EKG is normal.
One man’s rhythm confounded both AFA and ACK. This gentleman has had atrial flutter in the past and records his rhythm at home daily using his own AliveCor device, which he uses in conjunction with an iPad.
During our office visits we review the recordings he has made. He was quite bothered by the fact that he had several that were identified by AliveCor as AF but in fact were normal.
A recording he made on May 2nd at 8:45 pm was read as unclassified but with a heart rate of 149 BPM. The rhythm is actually atrial flutter with 2:1 block.
Sure enough, when I recorded his rhythm with ACK, although it was NSR (with APCs), it was read as unclassified.
AFA classified Lawrence’s rhythm as AF when it was in fact normal sinus with APCs.
In one patient, a 50 year old woman who has a chronic sinus tachycardia and typically has a heart rate in the 130s, both devices failed.
We could have anticipated that ACK would label her rhythm unclassified due to a HR over 100. Worse than unclassified, the tracing obtained on her by ACK (on the right) was terrible and unreadable until the last few seconds. On the other hand, the AFA tracing was rock solid throughout and clearly shows P waves and a regular tachycardia. For unclear reasons, however, the AFA device diagnosed this as AF.
Accuracy in Patients In Atrial Fibrillation
In 2/4 patients with AF, both devices correctly classified the rhythm.
In one patient AFA correctly diagnosed AF whereas ACK called it unclassified.
This patient was in AF with a HR over 100. AFA correctly identified it whereas ACK called it unclassified. The ACK tracing was noisy in the beginning but towards the end one can clearly diagnose AF.
In one 90 year old man AFA could not make the diagnosis (yellow).
ACK correctly identified the rhythm as AF.
One patient whom I had recently cardioverted from AF was the only false positive for ACK. The AliveCor tracing is of poor quality and was called AF, whereas AFA correctly identified NSR.
The sensitivity of both devices for detecting atrial fibrillation was 75%.
The specificity of AFA was 86% and that of ACK was 88%.
ACK was unreadable or unclassified 5/26 times or 19% of the time.
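Most of these summary numbers can be rebuilt from the counts scattered through the post, as in the Python sketch below. The total of 26 patients and the 4 patients in AF come from the fractions quoted above; the 22 patients not in AF is my inference from those two numbers. The post does not give enough counts to reproduce the 88% ACK specificity unambiguously, so it is left out.

```python
total_patients = 26  # from "5/26"
af_patients = 4      # from "2/4 patients with AF"
non_af_patients = total_patients - af_patients  # inferred: 22

afa_af_detected = 3      # 2 caught by both devices + 1 caught by AFA alone
ack_af_detected = 3      # 2 caught by both devices + 1 caught by ACK alone
afa_false_positives = 3  # non-AF rhythms that AFA called AF
ack_uninterpretable = 5  # ACK tracings unreadable or unclassified

afa_sensitivity = afa_af_detected / af_patients
ack_sensitivity = ack_af_detected / af_patients
afa_specificity = (non_af_patients - afa_false_positives) / non_af_patients
ack_uninterpretable_rate = ack_uninterpretable / total_patients

print(f"AFA sensitivity: {afa_sensitivity:.0%}")
print(f"ACK sensitivity: {ack_sensitivity:.0%}")
print(f"AFA specificity: {afa_specificity:.0%}")
print(f"ACK unreadable or unclassified: {ack_uninterpretable_rate:.0%}")
```

With only 4 patients in AF, a single additional hit or miss moves the sensitivity by 25 percentage points, so these figures are rough estimates at best.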
The sensitivity and specificity I’m reporting are lower than those reported in other studies, but I think they better represent real-world experience with these types of devices.
In a head-to-head comparison of the AFA and ACK mobile ECG devices I found:
-Recordings using AfibAlert are usually superior in quality to AliveCor tracings with a minimum of need for adjustment of hand position and instruction.
-This superiority in ease of use and quality means almost all AfibAlert tracings are interpreted whereas 19% of AliveCor tracings are either unclassified or unreadable.
-Sensitivity is similar. Both devices are highly likely to properly detect and identify atrial fibrillation when it occurs.
-AliveCor specificity is superior to AfibAlert. This means fewer cases that are not AF will be classified as AF by AliveCor compared to AfibAlert. This is due to a more conservative algorithm in AliveCor, which rejects wide QRS complexes and frequent extrasystoles.
Both companies are actively tweaking their algorithms and software to improve real world accuracy and improve user experience but what I report reflects what a patient at home or a physician in office can reasonably expect from these devices right now.