Study: COVID Symptoms, Symptom Clusters, and Predictors for Becoming a Long-Hauler: Looking for Clarity in the Haze of the Pandemic

By: Yong Huang, Melissa D. Pinto, Jessica L. Borelli, Milad Asgari Mehrabadi, Heather Abrihim, Nikil Dutt, Natalie Lambert, Erika L. Nurmi, Rana Chakraborty, Amir M. Rahmani, Charles A. Downs

March 05, 2021


Emerging data suggest that the effects of infection with SARS-CoV-2 are far-reaching, extending beyond those with severe acute disease. Specifically, persistent symptoms after apparent resolution of COVID-19 have frequently been reported throughout the pandemic by individuals labeled as “long-haulers”.

The purpose of this study was to assess symptoms at days 0-10 and 61+ among subjects with PCR-confirmed SARS-CoV-2 infection. The University of California COvid Research Data Set (UC CORDS) was used to identify 1407 records that met inclusion criteria. Symptoms attributable to COVID-19 were extracted from the electronic health record, and symptoms reported during the year prior to COVID-19 were excluded. Nonnegative matrix factorization (NMF) followed by graph lasso was used to assess relationships between symptoms. A model predictive of becoming a long-hauler was developed based on symptoms. Overall, 27% of subjects reported persistent symptoms after 60 days. Women were more likely to become long-haulers, and all age groups were represented, with those aged 50 ± 20 years comprising 72% of cases. Presenting symptoms included palpitations, chronic rhinitis, dysgeusia, chills, insomnia, hyperhidrosis, anxiety, sore throat, and headache, among others. We identified 5 symptom clusters at day 61+: chest pain-cough, dyspnea-cough, anxiety-tachycardia, abdominal pain-nausea, and low back pain-joint pain. Long-haulers represent a very significant public health concern, and there are no guidelines to address their diagnosis and management. Additional studies are urgently needed that focus on the physical, mental, and emotional impact on long-term COVID-19 survivors who become long-haulers.


In the United States, over 28 million people have been infected with SARS-CoV-2, the virus responsible for COVID-19, and the cumulative hospitalization rate has exceeded 1300 persons per 100,000 since early 2020. Hospitalized patients account for 1% of COVID-19 patients, yet most research to date has focused on in-patients with severe disease. Very little is known about the medium- and long-term consequences of COVID-19 among non-hospitalized individuals, although emerging data suggest that a significant proportion of these subjects experience persistent symptoms associated with antecedent SARS-CoV-2 infection. Those with persistent symptoms have been labeled “long-haulers” or persons with long COVID-19. Recent estimates suggest that ∼10% of hospitalized patients go on to become long-haulers. However, the body of evidence regarding long-haulers, particularly among the 99% of non-hospitalized cases, is nascent.

Late sequelae following an infectious disease are not uncommon. However, it is unclear whether clinical manifestations reflect primary organ involvement during an acute infection or whether long-term signs and symptoms are promoted by an aberrant inflammatory immune response. Understanding of the late sequelae of SARS-CoV-2 infection is limited by small sample sizes and a preponderance of studies that have focused on hospitalized survivors, with very limited data at the population level. Further studies examining long-term outcomes in subjects with “milder” infection are crucial to understanding both the pathophysiology and the public health impact of COVID-19. In addition, developing an understanding of host factors that predict long-hauler status, as well as their potential association with symptom clusters, will be pivotal to the development of evidence-based management guidelines.

In the current study, we utilized electronic health records (EHR) from community-dwelling individuals (N=1407) with confirmed SARS-CoV-2 infection (via PCR) to determine symptoms and symptom clusters. Specifically, we evaluated symptoms at presentation (days 0-10 following a COVID-19 diagnosis) and at days 61+. We defined long-haulers as persons with persistent symptoms at day 61+ (27%) and evaluated whether early symptoms or non-modifiable factors (age, ethnicity) could predict the likelihood of persistent symptoms at day 61+ (i.e., long-hauler status) and/or assignment within any given symptom cluster.
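The two observation windows and the long-hauler definition above can be illustrated with a minimal Python sketch. The record layout `(patient_id, days_since_diagnosis, symptom)`, the toy patients, and the helper names `window` and `symptoms_by_window` are assumptions made for illustration; they are not the UC CORDS schema or the study's code.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, days_since_diagnosis, symptom)
records = [
    ("p1", 2, "cough"),
    ("p1", 75, "dyspnea"),
    ("p2", 5, "fever"),
    ("p3", 90, "anxiety"),  # nothing recorded at days 0-10
]
all_patients = {"p1", "p2", "p3", "p4"}  # p4 has no recorded symptoms


def window(day):
    """Map days since diagnosis onto the two windows used in the study."""
    if 0 <= day <= 10:
        return "early"
    if day >= 61:
        return "late"
    return None  # days 11-60 are ignored in this sketch


def symptoms_by_window(records):
    """Collect each patient's symptoms per window."""
    out = defaultdict(lambda: {"early": set(), "late": set()})
    for pid, day, symptom in records:
        w = window(day)
        if w is not None:
            out[pid][w].add(symptom)
    return out


by_window = symptoms_by_window(records)

# Long-hauler: any symptom recorded at day 61 or later.
long_haulers = {pid for pid in all_patients if by_window[pid]["late"]}

# Long-haulers with no symptoms recorded at presentation.
initially_asymptomatic = {pid for pid in long_haulers if not by_window[pid]["early"]}

long_hauler_rate = len(long_haulers) / len(all_patients)
```

In this toy cohort, `long_hauler_rate` plays the role of the 27% figure reported for the study sample, and `initially_asymptomatic` corresponds to the subgroup of long-haulers who had no symptoms at the time of testing.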


Features and Symptoms Among Community Dwelling Individuals with COVID-19: Days 0-10 and 61+

Table 1 shows the sample distribution by age, ethnicity, and sex. Figure 1 shows the distribution of individuals reporting symptoms at days 0-10; approximately 68% of the total group exhibited symptoms, with 32% being asymptomatic. Prevalent symptoms during this time included (in descending order) dyspnea, cough, fever, chest pain, diarrhea, anxiety, and fatigue. Using NMF, five clusters of co-occurring symptoms were identified. Symptom network analysis was used to identify prominent symptoms (a larger node equates to greater prominence) and the strength of their association with other symptoms, wherein a darker line connecting two nodes indicates a stronger relationship.
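To make the NMF step concrete, the following is a minimal sketch that factorizes a toy binary patient-by-symptom matrix and reads one symptom cluster off each component. It uses the classic multiplicative-update rules for the squared-error objective; the source does not state which solver or settings were used, so the solver, the toy matrix, and the helper names (`nmf`, `squared_error`, `top_symptoms`) are all assumptions. The graph lasso step is not shown.

```python
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def squared_error(V, W, H):
    """Sum of squared differences between V and its reconstruction W @ H."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor V (n x m, nonnegative) into W (n x k) and H (k x m), both nonnegative."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H


def top_symptoms(row, names, n=2):
    """Names of the n symptoms with the largest loadings in one component."""
    ranked = sorted(zip(row, names), reverse=True)
    return sorted(name for _, name in ranked[:n])


# Toy data: rows are patients, columns are symptoms (1 = symptom recorded).
symptoms = ["chest pain", "cough", "anxiety", "tachycardia"]
V = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

W, H = nmf(V, k=2)
clusters = [top_symptoms(row, symptoms) for row in H]
```

Each row of `H` weights the symptoms within one component, so the highest-loading symptoms in a row can be read as a cluster of co-occurring symptoms, while each row of `W` indicates how strongly a patient expresses each component.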


The current study provides much-needed insight into early factors that predispose individuals to becoming long-haulers. These novel findings warrant additional investigation, discussion, and context within current knowledge about long-haulers. In reviewing our findings, we believe there are three key take-home points from our analyses.

The UC CORDS data set provides both patient-reported and clinician-documented symptoms from SARS-CoV-2 infected patients. These symptoms are reported and recorded in real time, which minimizes the retrospective recall relied upon in the limited studies to date. Another important strength of the UC CORDS data set is that we could exclude symptoms reported prior to SARS-CoV-2 infection, which increases confidence that the remaining symptoms are attributable to becoming a long-hauler. The data set also allowed for a broad swath of symptoms, rather than being limited to a narrowly focused checklist, which allows for a more sophisticated understanding of symptoms among long-haulers.
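The exclusion of pre-infection symptoms amounts to a per-patient set difference. The sketch below assumes two hypothetical dictionaries keyed by patient identifier; the function name `attributable_symptoms` and the data are illustrative only and do not reflect the study's actual extraction code.

```python
def attributable_symptoms(post_infection, prior_year):
    """Keep only symptoms that were not documented in the year before infection."""
    return {
        pid: sorted(set(symps) - set(prior_year.get(pid, ())))
        for pid, symps in post_infection.items()
    }


# Hypothetical inputs: symptoms recorded after infection and in the prior year.
post_infection = {
    "p1": ["cough", "insomnia", "headache"],
    "p2": ["anxiety"],
}
prior_year = {"p1": ["insomnia"]}

filtered = attributable_symptoms(post_infection, prior_year)
```

Here patient `p1` had insomnia documented before infection, so it is dropped and only cough and headache are retained as potentially attributable to COVID-19.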

First, our observations suggest a developing picture of long-haulers in which Caucasian race, female sex, and normal BMI are common features of a subset of long-haulers. Although similar descriptions have been provided in other investigations and the lay media, further corroboration is warranted. We observed a near-normal distribution of age among long-haulers, including those under the age of 18, among whom the mean age was 9.29 years. Although our study supported a potential association between female sex and a higher likelihood of becoming a long-hauler, race appeared to be less predictive for both Caucasian and Hispanic ethnicity.

There has been conflicting information regarding whether asymptomatic individuals go on to become long-haulers; roughly 32% of those reporting symptoms at day 61+ in our study were initially asymptomatic at the time of SARS-CoV-2 testing. The age distribution of all SARS-CoV-2 infected individuals at days 0-10 very closely mimicked that of the long-haulers, suggesting that the latter group is distributed across all age groups, with persons in the 50-59 age range (± 20 years) representing more than 72% of the long-hauler population.

Secondly, the symptom experience among those who become long-haulers changes over time. Data from multiple studies converge to illustrate that many hospitalized and non-hospitalized survivors of COVID-19 experience persistent symptoms (10-16). The reported incidence of persistent symptoms varies; in the current study, 27% of community dwellers reported symptoms after 60 days. Some of the variability in symptom reporting and in symptom association with long-haulers may be due to limitations inherent in rapid screening questionnaires, inasmuch as these questionnaires inquire about symptoms that predominantly affect those with severe disease. Questionnaires may also fail to inquire about emerging symptoms such as cognitive dysfunction (including “brain fog”), limiting the ability to accurately document such symptoms. Asymptomatic individuals may be monitored less intensively because of a presumed low risk of severe acute disease; this is problematic, as asymptomatic individuals account for 32% of the long-haulers observed in this study. The symptom clusters observed among long-haulers differ from those at initial presentation. The evolution of these clusters may provide insight into the etiology of long-hauler status: elucidating sites of evolving tissue damage and alterations in innate and adaptive immune inflammatory pathways might clarify the underlying pathophysiology.

In October 2020, the Tony Blair Institute for Global Change identified key characteristics among long-haulers, specifically that women and those of working age (mean age 45) appear to be at greater risk. Our data align with these observations. Thirdly, we observed that all ethnicities were affected, as were individuals who were initially asymptomatic. However, our use of ethnicity is limited to broad groups and lacks needed specificity, a limitation imposed by how data are recorded in the EHR. We also assessed the most recently recorded BMI among long-haulers and those who had recovered from COVID-19 (Supplemental Figure 1). Mean BMI among long-haulers (by age group) ranged from 26 to 33; this range may reflect limitations in our inclusion criteria (e.g. 5-year history with UC). Larger population-based studies will be needed to confirm and expand upon these observations. Undertaking detailed immune profiling through emerging technologies such as the -omics platforms may identify key host phenotypes associated with the symptom clusters that we have described. We hope this article will prompt the development and implementation of longitudinal prospective studies that garner patient-generated reports of symptoms rather than patient responses to questions generated by researchers, since the latter approach inherently constrained the answers we obtained. With such a new phenomenon, an ethnographic approach that focuses on understanding patients’ experiences would add an important lens to our analyses.


Data are emerging to suggest that infection with SARS-CoV-2 may lead to prolonged and persistent symptoms. The long-term consequences of becoming a long-hauler are unclear, and further research is urgently needed to corroborate our findings. These findings include the identification of a cohort of long-haulers with non-modifiable risk factors that may predict the likelihood of persistent symptoms and/or assignment within given symptom clusters. Further research is also needed to understand the underlying pathophysiology, including host phenotypes associated with aberrant innate and adaptive immune responses following SARS-CoV-2 infection.