Stat2012 Assignment Sheet

Evidence for contribution of common genetic variants within chromosome 8p21.2-8p21.1 to restricted and repetitive behaviors in autism spectrum disorders

Department of Cellular and Genetic Medicine, School of Basic Medical Sciences, Fudan University, 130Dong’an Road, Shanghai, 200032 China

JohnsHopkins University, Baltimore, MD USA

Unit on Statistical Genomics, Intramural Research Program, National Institute of Mental Health, National Institute of Health, Bethesda, MD USA

Yu Tao, Email: nc.ude.naduf@23000701121.

Contributor Information.

Corresponding author.

Author information ►Article notes ►Copyright and License information ►

Received 2015 Jan 8; Accepted 2016 Feb 15.

Copyright © Tao et al. 2016

Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

This article has been cited by other articles in PMC.



Restricted and Repetitive Behaviors (RRB), one of the core symptom categories for Autism Spectrum Disorders (ASD), comprises heterogeneous groups of behaviors. Previous research indicates that there are two or more factors (subcategories) within the RRB domain. In an effort to identify common variants associated with RRB, we have carried out a genome-wide association study (GWAS) using the Autism Genetic Resource Exchange (AGRE) dataset (n = 1,335, all ASD probands of European ancestry) for each identified RRB subcategory, while allowing for comparisons of associated single nucleotide polymorphisms (SNPs) with associated SNPs in the same set of probands analyzed using all the RRB subcategories as phenotypes in a multivariate linear mixed model. The top ranked SNPs were then explored in an independent dataset.


Using principal component analysis of item scores obtained from Autism Diagnostic Interview-Revised (ADI-R), two distinct subcategories within Restricted and Repetitive Behaviors were identified: Repetitive Sensory Motor (RSM) and Insistence on Sameness (IS). Quantitative RSM and IS scores were subsequently used as phenotypes in a GWAS using the AGRE ASD cohort. Although no associated SNPs with genome-wide significance (P < 5.0E-08) were detected when RSM or IS were analyzed independently, three SNPs approached genome-wide significance when RSM and IS were considered together using multivariate association analysis. These included the top IS-associated SNP, rs62503729 (P-value = 6.48E-08), which is located within chromosome 8p21.2-8p21.1, a locus previously linked to schizophrenia. Notably, all of the most significantly associated SNPs are located in close proximity to STMN4 and PTK2B, genes previously shown to function in neuron development. In addition, several of the top-ranked SNPs showed correlations with STMN4 mRNA expression in adult CEU (Caucasian and European descent) human prefrontal cortex. However, the association signals within chromosome 8p21.2-8p21.1 failed to replicate in an independent sample of 2,588 ASD probands; the insufficient sample size and between-study heterogeneity are possible explanations for the non-replication.


Our analysis indicates that RRB in ASD can be represented by two distinct subcategories: RSM and IS. Subsequent univariate and multivariate genome-wide association studies of these RRB subcategories enabled the detection of associated SNPs at 8p21.2-8p21.1. Although these results did not replicate in an independent ASD dataset, genomic features of this region and pathway analysis suggest that common variants in 8p21.2-8p21.1 may contribute to RRB, particularly IS. Together, these observations warrant future studies to elucidate the possible contributions of common variants in 8p21.2-8p21.1 to the etiology of RSM and IS in ASD.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2475-y) contains supplementary material, which is available to authorized users.

Keywords: Restricted and Repetitive Behaviors (RRB), Genome-Wide Association Study, Autism Spectrum Disorders (ASD), Autism Diagnostic Interview-Revised (ADI-R)


Autism Spectrum Disorder (ASD) is characterized by impaired reciprocal social interactions, delayed or aberrant communication, and the presence of restricted and repetitive behaviors, frequently with restricted interests [1]. These disabilities often confer significant lifelong burdens on individuals with ASD. This fact, together with the high ASD prevalence in the general population, makes ASD a major challenge for public health systems. [2–5] Based on heritability estimates as high as 70- 90 % in twin and family studies [6], great effort has been devoted to elucidating the genetic mechanism of ASD. However, it has been difficult to identify any individual genetic factors that confer even moderate risk [7, 8].

Genome-wide association studies (GWAS) have implicated the region on chromosome 5p14.1 between CDH9 and CDH10 as the first potential common genetic risk factor for ASD in Caucasian populations [9, 10]. Replication in independent GWAS, however, has frequently not been achieved for many candidate loci for ASD [11–14]. Phenotype and genetic heterogeneity between patients are conjectured to greatly reduce the power of overall genome-wide case-control studies in ASD, and is a likely explanation for the lack of replication and much of the ‘missing heritability’ in this complex disease [15]. Various attempts have been made to reduce heterogeneity in large-scale genetic studies of ASD. One proposed approach to increase statistical power to detect pathogenic loci is to design genetic association studies focusing on ASD sub-phenotypes [16–19].

Restricted and repetitive behaviors (RRB) are a core symptom of ASD [20]. Previous studies have shown that RRB have an underlying genetic component and may be influenced by genes independent of those associated with the social or communication deficits [21–23]. Moreover, Autism Diagnostic Interview-Revised (ADI-R), a gold-standard diagnostic tool for ASD [24], provides widely-accepted quantitative measures for RRB [25, 26], making it a promising sub-phenotype for association studies. Since RRB comprises heterogeneous groups of behaviors [27, 28], research during the last decade has used factor-analysis to examine the structure of RRB using different subsets of ADI-R items and subpopulations of ASD individuals that differ in symptom severity and/or ethnicity [27, 29–32]. Remarkably, in spite of their methodological differences, many of these analyses converge on a two-factor solution for RRB comprising ‘repetitive sensory-motor’ (RSM) and ‘insistence on sameness’ (IS). The RSM subcategory quantifies motor mannerisms, sensory seeking behaviors, and the repetitive use of objects, whereas the IS subcategory quantifies compulsions, rituals and difficulties with changes in routine [30].

IS and RSM were found to be differentially related to other ASD variables. Specifically, high correlations were found between RSM, but not IS, with IQ, additional adaptive behaviors, and age at first words and phrases [29, 30, 33], suggesting that, compared to IS, RSM may be more correlated with ASD severity [33]. Studies also indicated that the IS subcategory might be under stronger additive genetic control than the RSM subcategory. Whereas significant familial aggregation of the IS subcategory has been consistently reported [31, 32], no significant concordance for familial aggregation has been reported for RSM [25, 27, 31, 34].

Behavioral subcategories that differ in behavioral correlates and familiality are of particular interest to researchers investigating the genetic components that underlie ASD sub-phenotypes. In a recent genome-wide linkage analysis [25], RSM and IS subcategories were linked to various chromosomal regions that only partially overlapped regions previously identified using ASD diagnosis as the phenotype.

In the current work, we explored the underlying structure of RRB using two independent, publicly available ASD datasets. In an effort to identify SNP markers, candidate genes and biological pathways associated with RRB, the empirically derived RRB subcategories were then used as quantitative traits for GWAS. The observation that both univariate and multivariate linear mixed models identified associated SNPs within 8p21.2-21.1 in the discovery dataset, provides the first evidence that genetic variation in this region influences RRB phenotypes in ASD. Further studies are needed to confirm the association signals within chromosome 8p21.2-8p21.1, since replication was not obtained in an independent sample of 2,588 ASD probands, possibly due to insufficient sample size and between-study heterogeneity.


Ethics statement

This study was approved by the ethics committee of the School of Basic Medical, Fudan University, China (IRB#2010CB529601). All the genetic data and phenotype data used is previously published and publicly available. Written informed consent was previously obtained from all individuals and procedures had approval from institutional review boards from all the institutions involved in recruitment and research, following national and international ethical and legal regulations and the principles of the Declaration of Helsinki.

Dataset demographics

The discovery dataset comprised individuals in the Autism Genetic Resource Exchange family-based dataset (AGRE: [35]. AGRE has obtained informed consent from all individuals listed in their pedigree catalogue. Individuals with ASD in the AGRE cohort were diagnosed using both the Autism Diagnostic Interview-Revised (ADI-R) [24] and Autism Diagnostic Observation Schedule (ADOS) [36], widely considered to be the gold-standard diagnostic instruments for ASD. Individuals with possible non-idiopathic ASD (e.g., patients with significant chromosomal abnormalities, premature birth, or comorbid disorders) were excluded. All subjects were genotyped using the Illumina HumanHap550 BeadChip. Genotyping details and other important information have been previously described [9]. A “cleaned” version of the raw AGRE genetic data, designated CHOP.clean100121, was downloaded from the AGRE website (4,327 subjects). Following the method described in the supplement section of the study by Wang et al. [9], population structure was examined based on the first two principal components obtained by multidimensional scaling (MDS) of a matrix of pairwise IBS (Identical By State) values between these individuals. Individuals of European ancestry were selected based on principal component 1 scores less than 0.0, and principal component 2 scores between -0.02 and 0.02. A total of 806 ASD families (3,455 individuals) were inferred to have European ancestry using the above procedure (Additional file 1). Because ADI-R record were available only for ASD probands rather than for all pedigree members, the final discovery dataset comprised 1,335 probands, ranging in age from 1.8 to 44 years (mean = 8.00 years, SD = 4.87), and were predominately male (78.7 %). The sample size of each computer-scored diagnostic group defined by AGRE was: Autism 1,152 (86.3 %), Not Quite Autism (NQA) 57 (4.27 %), and Broad Spectrum (BS) 126 (9.44 %). The specific criteria for these classifications are given on the AGRE website. Detailed information for all individuals can be found in Additional file 2.

The dataset used for replication comprised individuals in Simons Simplex Collection (SSC, version 15), a genetic study limited to families with one child with ASD (the proband). Previous reports have described the SSC data collection process, as well as the extensive phenotypic data available [37]. Informed consent was obtained at each data collection site included in the SSC. Our group obtained phenotype and genotype data for 2,591 ASD families, from which 2,588 probands with ADI-R records were selected for analysis, These probands ranged in age from 1 to 108 years (mean = 21.46 years, SD = 13.96), and were predominately male (86.67 %). The sample size of each diagnosed status defined by ADI-R was Autism: 2,346 (90.65 %), and Autism Spectrum Disorder (ASD): 242 (9.35 %). Detailed information for all the individuals can be found in Additional file 3.

Diagnostic instruments

The Autism Diagnostic Interview-Revised (ADI-R) instrument is a standardized parent interview designed to assess the presence and severity of symptoms based on the DSM-IV criteria for ASD [24]. Items designed for interviews fall within three diagnostic categories: i) social, ii) communication and iii) restricted and repetitive behaviors. Two scores are given for most items: a ‘current’ score, which assesses behavior during the past 3 months, and an ‘ever’ score, which assesses behavior in early childhood or at its greatest severity. We used the ‘current’ score in each item to avoid potential retrospective bias that could result from using the ‘ever’ score. The full range of each item scores (0–3) was used to provide maximal information of severity. Scores of 7 (“definite abnormality in the general area of the coding, but not of the type specified”), 8 (“not applicable”), and 9 (“not known or not asked”) given under certain circumstances were all converted to 0, according to the algorithm listed in ADI-R [38].

Genotype imputation

Imputation of genotypes for autosomal SNPs was performed using IMPUTE version 2.2.2 [39, 40]. The reference panel used was the 1000 Genomes Phase I integrated haplotypes, which were produced using SHAPEIT2 [41] and released in June 2014 ( Imputed SNPs with low imputation quality (R2 or info scores < 0.3) or minor allele frequencies (MAF) < 5 % were excluded.

Statistical analysis

PCA analysis of RRB items

Factor analysis was carried out using Principal Components Analysis (PCA) with varimax rotation on 11 RRB items from the ADI-R using R [42, 43]. These items were previously included in the RRB subdomains, RSM and IS by S. L. Bishop and colleagues using exploratory factor analysis [27]. Similar to previous factor analyses of the ADI-R [30, 34], we employed a cutoff of 0.30 for the inclusion of an item in a factor. Correlation analyses were conducted to examine relationships between RRB subcategories.

Assessment of familiality of RRB subcategories

To investigate potential familial relationships in the empirically derived subcategories, intraclass correlations (ICCs) between sibling pairs with any ASD diagnosis (i.e., Autism, NQA, or BS) from the multiplex families were calculated (monozygotic twins were excluded) [30]. Affected sib pairs of each multiplex family were included in ICCs calculation (N = 200 from 100 families). This analysis was only done using the AGRE dataset, since SSC was limited to families with only one ASD child.

Genome-wide association analyses

The association of SNPs with the RRB subcategories was measured using a novel Genome-wide Efficient Mixed-Model Association (GEMMA) approach developed by Zhou and Stephens [44]. Briefly, GEMMA fits univariate linear mixed models for associations with single phenotypes or multivariate linear mixed models for simultaneous associations with multiple phenotypes, while controlling for sample relatedness and potential population stratification. (For addition details see [44–46]). GEMMA was downloaded from Raw item scores from ADI-R score sheets of ASD probands were summed for each subcategory of RRB identified by PCA analysis. Sex and age at ADI-R standardized residuals of the summed scores of each subcategory were calculated using multivariate regression. These residuals were normalized following Tukey’s formula using SPSS and then used as phenotypes in genome-wide association analysis. Autosomal chromosome association results were retained [47]. First, associations between SNPs and each subcategory were tested using the GEMMA program based on a univariate linear mixed model, while applying a correction for sample structure (population stratification and hidden relatedness) through a pairwise relatedness matrix derived from SNP genotypes. Second, the GEMMA program was used to investigate associations between SNPs and RRB sub-categories in a multivariate analyses model to estimate the robustness of the associations. The use of multivariate methods has been recommended, because multivariate analyses may increase power not only to detect genetic variants that affect only one of the multiple correlated phenotypes, but also pleiotropic genetic variants [44–46].

Following association analysis, statistical evidence for association was evaluated by carrying out genome-wide association analysis for 1000 permutations of phenotypes. To ease the computational burden, these analyses were performed for genotyped SNPs only. To estimate genomic inflation factors for all the distributions of P-values, lambda genomic control (lambdaGC) values were calculated as the ratio of the median of the empirically observed distributions of the test statistic to the expected median. The empirical p-value for the original association was calculated as the proportion of permutation replicates with lambdaGC values greater than the lambdaGC value of the original distribution.

Cis-eQTL analysis for the candidate loci

Expression quantitative trait loci (eQTL) analysis was conducted using gene expression data obtained using adult human prefrontal cortex obtained from the “BrainCloud” study ( [48]. Only individuals of European ancestry (n = 112) were included in the analysis. Individual-level mRNA expression data were downloaded from the Gene Expression Omnibus (accession number GSE30272) and genotype data were obtained from dbGaP (accession number phs000417.v1.p10.) Linear regression models were used to identify SNPs located within 500 kb of the transcript being tested (cis-eQTL) with statistically significant correlations between genotype and mRNA expression levels, with RNA integrity numbers (RIN) and age included as covariates. Genotype imputation for chromosome regions of interest was performed as described elsewhere [49].

Functional enrichment analysis

To identify plausible pathways associated with RRB, we expanded our focus beyond single variants by performing functional enrichment analysis using the web-accessible bioinformatics tool, Database for Annotation, Visualization and Integrated Discovery (DAVID) [50, 51]. Since DAVID can only handle gene lists, SNPs with an association P-value smaller than 0.01 were used to compile a list of genes for further analysis (i.e., a list of all genes that contain associated SNPs or are the nearest genes to associated intergenic SNPs). Analysis was performed using the software’s Functional Annotation Clustering option. The“Functional Annotation Clustering” tool identifies gene annotation terms that are enriched in the input gene list compared to the gene list from entire human genome and ranks the terms according to their enrichment P-values calculated using Fisher’s exact test. Subsequently, “clusters”of related gene annotation terms that are enriched in the input gene list are assigned an “enrichment score” (ES), defined as the geometric mean of the log10-transformed P-values for all gene annotation terms in the cluster. Enrichment scores > 1.3 are considered to be nominally significant.


Discovery (AGRE) dataset

Factor substructure of RRB

Descriptive statistics of 11 ADI-R items used in these analyses are listed in Table 1. Detailed information regarding each item is given in Additional file 4. The two-factor solution provided a satisfactory fit to 11 ADI-R items in the PCA analysis (Chi Square [df = 34] = 128.61, P-value = 6.38E-13). Using a cutoff of ≥ 0.3 for the inclusion of items in a respective factor, 9 out of 11 items loaded on the two factors (Table 2). Four items loaded on Factor 1 and five items loaded on Factor 2. Loadings on Factor 1 (Repetitive Sensory Motor: RSM subcategory) ranged from 0.46 to 0.75. Loadings on Factor 2 (Insistence on Sameness: IS subcategory) ranged from 0.30 to 0.68.Sum scores for RSM and IS were calculated by summing the scores of items included in each factor. RSM and IS gave similar score distributions that spanned the full range of possible scores (i.e., 0–11 for RSM, 0–15 for IS). For the entire sample, the mean RSM score was 4.096 (SD = 2.82), and the mean IS score was 3.73 (SD = 2.92) (Additional file 5). Together, the two subcategories accounted for 42 % of the variance of RRB. RSM and IS subcategory scores were correlated at r = 0.204 (P-value < 0.001), indicating that they share 4 % of their variance (r2 = 0.04).

Table 1

Descriptive statistics of 11 “current score” items from ADI-R in the AGRE cohort

Table 2

Two-Factor solution for the Restricted and Repetitive Behaviors (RRB) in the AGRE cohort using PCA with varimax rotation

ANOVA was conducted to compare mean scores for each derived subcategory with respect to status category defined by AGRE: Autism, NQA (Not Quite Autism), and Broad Spectrum [52]. Both for RSM and IS, significant differences were detected among the three status categories (F = 101.8, P-value < 0.0001; F = 28.4, P-value < 0.0001). The Tukey HSD post-hoc test indicated that both RSM and IS scores for individuals in the Autism category were significantly higher than for individuals in the NQA or BS categories, who did not differ in their scores (Fig. 1).

Fig. 1

Box plots of RSM and IS scores for ASD probands in the AGRE cohort. The y-axis indicates sum scores for RSM and IS, calculated by summing the scores of items included in each RRB subcategory. Scores for each RRB subcategory were calcluated separately...

Familiality of RRB

To test potential familial relationships between sibling pairs in our data, we calculated interclass correlations (ICCs) between sibling pairs for RSM and IS and the original RRB domain from ADI-R. A significant family genetic effect was shown for IS (ICC = 0.405, P-value = 0.005), while the ICC for RSM was 0.042 (P-value =0.416) and the ICC for RRB was 0.359 (P-value = 0.015).

Genome-wide association analysis

After filtering for minor allele frequency (MAF) < 5 % and info score r2 < 0.3, 6,066,362 genotyped or imputed SNPs were tested for association with standardized and normalized RSM (univariate linear mixed model), IS (univariate linear mixed model), and RSM/IS (multivariate linear mixed model) scores using GEMMA mixed model association analyses. Quantile-Quantile (QQ) plots of P-values for association with IS or RSM in the AGRE cohort GWAS are shown in Fig. 2. QQ plots of P-values for association with RSM/IS and permuted RSM/IS phenotypes are also shown, with the true distribution compared to the permuted distributions (Fig. 2c). The true lambdaGC value was 1.0355, while the permutation P-value for RSM/IS association result was 0.078 based on the distribution of lamdaGC values obtain from all permutations analyzed (Fig. 2d). These results suggest little evidence for inflation of P-values due to stratification or other confounding biases, but provide compelling evidence for RRB association. The Manhattan plots for associations of SNPs with RSM、IS or RSM/IS quantified using GEMMA are shown in Fig. 3.

Fig. 2

Quantile-Quantile plots of P-values for association with IS, RSM or RSM/IS in the AGRE GWAS cohort. Q-Q plots compare log10-transformed observed P-values [−log10(Observed P)] to log10-transformed P-values expected under the null hypothesis of...

Fig. 3

Manhattan plots for RSM (a), IS (b) and RSM/IS (c) for the AGRE cohort. The x-axis indicates the position of each SNP on individual autosomal chromosomes and the y-axis indicates –log10 transformed GEMMA P-values. Red line: genome-wide significance...

We failed to detect genome-wide significant associations (P-value < 5.0E-8) with RSM or IS in the AGRE cohort. However, rs11512467 at 11q23 showed an association signal close to genome-wide significance for RSM (5.67E-08) and ten SNPs at this locus had P-values below 4.27E-07, all located within 20 kb upstream from IL10RA (Fig. 3a), Table 3). The strongest univariate association signals for IS were detected at 8p21.2-8p21.1, including the top 4 significant SNPs/rs62503729 (P-value = 1.39E-07), rs13278976 (P-value = 1.64E-07), rs13270725 (P-value = 1.64E-07) and rs35240189 (P-value = 1.65E-07), located within 10 kb downstream of the STMN4 gene (Fig. 3b), Table 4). Several other regions showing suggestive association signals (P-value < E-06) for RSM or IS were also detected, such as 1q21-1q21.2 (within the 3’-UTR of MCL1 (Myeloid cell leukemia 1) and an intron on ENSA (endosulfine alpha)), 6p24.3 (in an intron of TXNDC5 (thioredoxin domain containing 5)) (Table 3, Table 4). These regions cannot be easily implicated in RRB, but may still provide new insights, if confirmed in follow-up studies with larger samples or sequencing validations.

Table 3

List of the most significant associations with RSM (P-value < E-06) in the AGRE cohort

Table 4

List of the most significant associations with IS (P-value < E-06) in the AGRE cohort

For RSM/IS, the top-ranked SNPs were identified within 8p21.2-8p21.1, completely overlapped with the most highly associated region for IS (Fig. 3c), Table 5). As shown in regional plots of association signals (Fig. 4a b c), association with 8p21.2-8p21.1 region SNPs is greater for RSM/IS than for IS or RSM, suggesting that the multivariate model including RSM and IS might provide greater power for detecting associations. Since no significant family genetic effect was detected for RSM in our analysis, association in the multivariate association analysis was likely driven by the IS category. It should also be noted that this region did not show any significant association with ASD diagnosis after we re-analyzed the association for this region in AGRE cohort following the earlier analysis [9] (Fig. 4d). Based on these observations, we focused subsequence analyses on the 8p21.2-8p21.1 region and hypothesized that common variants in this region may be novel candidate loci for RRB, especially IS.

Table 5

List of the most significant associations with RSM/IS (P-value < 5.0E-07) in the AGRE cohort

Fig. 4

Regional plots showing association mapping results for SNPs located on chromosome 8p21.1-8p21.2 region for RSM/IS (a), IS (b), RSM (c) and ASD (d). *Each filled circle represents the P-value for one SNP, with the top SNP, represented by a purple diamond...

Replication (SSC) dataset

Based on PCA of 11 ADI-R scores in the SSC dataset, we identified the same two RRB subcategories, RSM and IS, observed in the AGRE dataset (Additional file 6). However, no statistically significant signals were detected for association with RSM、IS or RSM/IS in GEMMA univariate or multivariate analysis of 8p21.2-8p21.1 region SNPs. The top-ranked SNP rs17057065 for RSM/IS with P-value 0.041 is located in an intron of PTK2B (Regional plot showing association P-values for 8p21.2-8p21.1 region SNPs is shown in Additional file 7). The lack of association in the SSC dataset may be caused by between-study heterogeneity. SSC probands included in present analysis are from simplex families, in contrast to 90 % of the probands in the AGRE cohort who are from multiplex families. Thus, the relatedness might increase the power for association analysis in AGRE sample. An additional source of heterogeneity comes from the age distributions of the probands in these two dataset, for which the sample means are significantly different (t = 34.3, P-value < 0.001).

Although the associations we reported for 8p21.2-8p21.1 were not observed in SSC dataset, previous observations provide biological plausibility for the contribution of chromosome 8p and 8p21.2-p21.1 to ASD. Chromosome 8p is known to harbor numerous genes implicated in developmental neuropsychiatric disorders, including schizophrenia and ASD [53]. In the largest schizophrenia linkage analysis to date, the 8p21.2-p21.1 region was found to be associated with schizophrenia (P-value = 4.00E-04) [54]. In a subsequent publication, the same group reported significant associations for SNPs in and around DPYSL2 and ADRA1A, 8p21.2-p21.1 region genes previously associated with schizophrenia in family-based and case-control association studies [55, 56]; the strongest associated SNP (rs7817434; P-value = 3.01E-04) is located 377 kb from the top signal in our analysis (rs2322600) [57]. Recent meta-analyses of CNV and GWA studies results suggest that there are both clinical and biological links between autism and schizophrenia [58, 59], so it is highly plausible that common variants in this region contribute to both ASD and schizophrenia. Significantly, a subset of schizophrenia patients display multiple repetitive behaviors [60].

Genomic features of the 8p21.2-8p21.1 region

The SNP showing the most significant association for RSM/IS, rs2322600, has a P-value just shy of genome-wide significance (P-value = 6.48E-08). In addition, twenty-four SNPs in the same region have P-values for association smaller than 5E − 07, including the top associated SNP for IS (rs62503729) and three genotyped SNPs (rs1562331, rs2322606 and rs10097861) that are in linkage disequilibrium with the top three SNPs (Additional file 8). Together, these 25 SNPs span a 117 kb region that contains three genes: STMN4, TRIM35 and PTK2B. STMN4 (stathmin-like 4) interacts directly with microtubules, causing a switch from a straight to a curved conformation that has been proposed to promote rapid microtubule depolymerization. According to the BrainCloud study database (, STMN4 mRNA is highly expressed in both fetal and adult dorsolateral prefrontal cortex (DLPFC) (Additional file 9) [48], consistent with a role in early neuron development [61, 62]. PTK2B (Protein Tyrosine Kinase 2 Beta) encodes a major focal adhesion kinase that plays a key role in neuritogenesis and neurite elongation [63]. Because all the top SNPs are located in non-coding gene regions, we hypothesized that these SNPs are linked to genetic variants that regulate the expression STMN4 or PTK2B.

Expression quantitative trait locus (eQTL) analyses using adult CEU prefrontal cortex

With the aim of exploring the molecular basis of the observed associations with RSM/IS, we investigated whether our top SNPs or their proxies (r2 > 0.7) associate with gene expression in the dorsolateral prefrontal cortex (Brodmann area 46) using expression and genotype data of 112 healthy adult CEU individuals. Top SNPs and multiple proxy SNPs in or near STMN4 showed nominally significant association between genotype and STMN4 mRNA expression (rs2322600, P-value = 0.023; rs35240189, P-value = 0.014; rs62503729, P-value = 0.024), with the strongest association represented by rs10097861 (a proxy of rs2322600, r2 = 0.76) at a P-value of 0.002. None of the SNPs were associated with the expression of PTK2B mRNA. Since expression data in BrainCloud dataset were from adult human brain, it is possible that the genetic variants regulate PTK2B mRNA expression only in the developing brain.

Bioinformatic evaluation

We queried the RegulomeDB database [64] to assess whether any of the 28 SNPs that associated with the RSM/IS sub-phenotypes (P-value < 5E-07, n = 28 SNPs) are located in known or predicted regulatory elements, including regions of DNase I hypersensitivity, binding sites for transcription factors and promoter regions that regulate transcription. Two SNPs, rs2322606 and rs10097861, received RegulomeDB likelihood scores of 1f (i.e., mapping to a predicted TF binding site and/or within a DNase I sensitivity peak and correlating with gene expression). Both of these variants, which are located within an intron of PTK2B, associated with expression of BNIP3L mRNA in a lymphoblastoid cell line and located within DNase I hypersensitivity peaks.

Functional enrichment pathway analysis

In total, 252 genes linked to SNPs with nominal associations (P < 0.01) with RSM/IS in the multivariate analysis of the AGRE cohort were selected for enrichment analysis using DAVID [see Additional file 10]. Using the DAVID Functional Annotation Clustering tool, we identified 7 annotation term clusters with enrichment scores > 1.3 (equivalent to a nominal P-value ≤ 0.05), including two clusters containing pathways crucial for brain development and function [see Additional file 11]. Cluster 1(enrichment score: 1.93) contained several pathways previously implicated in the pathogenesis of ASD, including neuron development (GO:0048666, P-value = 0.002797) [65], neuron projection development (GO:0031175, P-value = 0.003104) [66] and axon guidance (GO:0007411, P-value =0.016). Cluster 3 (enrichment score: 1.75) contained several cell-adhesion pathways, which have also been implicated in the pathogenesis of Autism Spectrum Disorders [9, 67].

As a negative control, we also carried out DAVID-based enrichment analyses using candidate gene lists derived from association analyses of 10 sets of permuted RSM/IS phenotypes (specifically, the phenotype-permuted replicates with the ten highest lambdaGC values). Among 53 annotation term clusters with enrichment scores greater than 1.3 that were obtained for the ten gene sets, only one was related to the brain/neuron development (Additional file 12). These results suggest that the gene list derived from the original, non-permutated association is enriched in brain and plausible ASD-related pathways. Because no enriched annotation term cluster identified in present study survived correction for multiple testing, however, the possibility of false-positive enrichments cannot be excluded.


In this work, analysis of ADR-R RRB item scores in the AGRE and SSC datasets confirmed the existence of two previously identified subcategories in the RRB domain, RSM and IS. GWAS of RSM and IS subcategory scores based on univariate and multivariate mixed models identified common variants within 8p21.2-8p21 as possible susceptibility locus for RRB in the AGRE dataset, but not in the SSC dataset.

Univariate association analysis identified different association patterns for IS and RSM, including signals at 1q21-1q21.2、6p24.3、11q23 and 8p21.2-8p21.1 that have not previously been reported for association with RRB. Several regions in 15q have previously been linked to RSM or IS in linkage analysis, including 15q21.2-q22.2, 15q13.1-q14 [25] and 15q11-q13 [31]. Our analysis failed to detect suggestive associations for RSM or IS at these loci, although a suggestive association with IS was detected for the 15q24.1 region SNP rs138618349 (P = 1.99E-06). Lack of replication may reflect difference in populations and/or methodology, i.e., linkage vs. association analysis.

Common variants within chromosome 8p21.2-8p21.1, a locus previously linked to schizophrenia, approached genome-wide significance for RSM/IS and were also the top signals for IS in univariate association analysis. Since many genetic variants linked to ASD have a high degree of pleiotropy (i.e., where one gene affects more than one phenotype), it is reasonable that some genetic variants contribute to both RSM and IS and were detected with higher association magnitude using multivariate association model [68]. Although the associations we reported for 8p21.2-8p21.1 were not observed in the SSC dataset, we should mention that a partial trisomy of 8p(21-23) has been identified in a 6-year-old female with autism [69]. This region is also included in a large (6.14 Mbp) chromosome duplication identified in a patient with autism and self-mutilation [70]. This patient presented with abnormal behaviors, including ritualistic behaviors, self-injury and temper tantrums, consistent with the hypothesis that this chromosomal region contains a dosage-sensitive gene that contributes to RRB phenotypes.

Our top SNPs and multiple proxy SNPs in or near STMN4 were identified as eQTLs for STMN4 in human prefrontal cortex, and it is plausible that STMN4 influences RRB domain phenotypes by modulating neuron development and dendritic microtubule dynamics [71].

Furthermore, based on mRNA expression data from two public databases [64, 72], our top SNP, rs2322600, and several proxy SNPs correlate with expression of BNIPL3, a gene located almost 890 kilo base pairs upstream (rs2322600: P-value = 9.17E-06). Since long-range regulation of mRNA expression by genetic elements located as far away as 1 Mbp, has been previously described [73, 74], in principle, this gene may also be considered a candidate for RRB. Although BNIPL3 has not previously been reported to be associated with ASD, it encodes a mitochondrial outer membrane protein that is required for mitochondrial clearance and has been proposed to play a role in hypoxia-induced autophagy [75]. Recent research [67] has shown that children and adolescents with autism have high dendritic spine density in the brain and this excess is due to a defect in dendritic spine “pruning,” a process essential for normal brain development [76]. The same study also showed that the abnormal spine pruning is caused by a defect in autophagy in neurons [67]. Mitochondria localize in both pre- and postsynaptic department (axon terminals and dendritic spines), and mitophagy is crucial for brain development and dendritic spine pruning [77].

PTK2B has been widely studied since it was identified as a novel Alzheimer’s disease (AD) candidate gene in a large meta-analysis of AD GWAS [78]. PTK2B kinase, a major focal adhesion kinase, regulates the integrity of focal adhesions, which are major compartments for integrating signals for cell growth, apoptosis, and neuron migration, cellular functions essential for normal brain development [79]. Since several neuronal cell-adhesion genes have been identified in rare ASD cases [80, 81] and a GWA study has shown that neuronal cell-adhesion molecules may be collectively associated with ASDs [9], it is plausible that PTK2B contributes to ASD through its roles in regulating integrity of focal adhesions.

Because the terms in each enriched cluster identified by the pathway analysis did not survive multiple testing correction (P-value > 0.05 after controlling for the false discovery rate (FDR) using the Benjamini-Hochberg method), we could not identify specific biological pathways that contribute to the development of RRB in ASD. However, DAVID analysis included PTK2B among the top three enriched terms in Cluster1(Additional file 11), based on its function in focal adhesion formation and regulation of adherens junction dynamics by phosphorylation switches [82], providing evidence that PTK2B is a plausible candidate gene for RRB.


In this study, univariate and multivariate genome-wide association studies of RRB subcategories using data from an AGRE ASD cohort enabled the detection of associated SNPs at 8p21.2-8p21.1. This region contained 25 genotyped or imputed SNPs with P-values for association with RSM/IS < 5E-07, with the top SNP (P-value = 6.48E-08) just missing genome-wide significance. Notably, 8p21.2-8p21 has previously been linked to schizophrenia and our top SNPs are located in close proximity and/or correlate with the expression of several genes with plausible connections to ASD and RRB. Because association signals in this chromosome region were not detected in the SSC ASD dataset, however, more work will be required to validate the possible contributions of common variants in 8p21.2-8p21.1 to RRB or ASD.


We are grateful to the participating families at the Autism Genetic Resource Exchange database for their help and willingness to participate in the database. We appreciate our access to the phenotypic and genotype data on AGRE database. Approved researchers can obtain the AGRE population dataset by applying at

We are grateful to the families at the participating SFARI Simplex Collection (SSC) sites and to the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, E. Wijsman). We appreciate obtaining access to phenotypic data on SFARI Base. Approved researchers can obtain the SSC population dataset described in this study ([sfaricollectionv91]/ui:view()) by applying at

The authors gratefully acknowledge the support of the Intramural Research Program, National Institute of Mental Health, National Institutes of Health (IRP-NIMH-NIH, grant number MH002930-05) and the National Basic Research 973 Program of China (2010CB529601).


ADI-RAutism Diagnostic Interview-Revised
ADOSAutism Diagnostic Observation Schedule
AGREAutism Genetic Resource Exchange
ASDAutism Spectrum Disorders
DAVIDDatabase for Annotation, Visualization and Integrated Discovery
ESenrichment score
eQTLexpression quantitative trait loci
GEMMAGenome-wide Efficient Mixed-Model Association
ISinsistence on sameness
ICCsintraclass correlations
lambdaGCLambda genomic conttol
MDSmultidimensional scaling
PCAPrincipal Components Analysis
RSMRepetitive Sensory Motor
RRBrestricted and repetitive behaviors
SSCSimons Simplex Collection


Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Yu Tao downloaded and prepared the AGRE and SSC data, and conducted statistical analyses, and was the primary author of the manuscript; Benjamin Ackerman and Wei Guo contributed to the statistical analysis; Hui Gao carried out the functional enrichment analysis; David Saffen provided statistical support and advice concerning the interpretation of the findings, and edited the final version of the manuscript; Yin Yao Shugart contributed to study design, coordinated statistical analyses and data interpretation, and was the primary investigator of the study. All authors read and approved the final manuscript.

Contributor Information

Yu Tao, Email: nc.ude.naduf@23000701121.

Hui Gao, Email: nc.ude.naduf@32000701221.

Benjamin Ackerman, Email: ude.uhj@01rekcab.

Wei Guo, Email: vog.hin@3oug.iew.

David Saffen, Email: nc.ude.naduf@neffas.

Yin Yao Shugart, Email: vog.hin.liam@oay1yak.


1. Abrahams BS, Geschwind DH. Advances in autism genetics: on the threshold of a new neurobiology. Nat Rev Genet. 2008;9(5):341–355. doi: 10.1038/nrg2346.[PMC free article][PubMed][Cross Ref]

2. Fombonne E. Epidemiology of pervasive developmental disorders. Pediatr Res. 2009;65(6):591–598. doi: 10.1203/PDR.0b013e31819e7203.[PubMed][Cross Ref]

3. Gillberg C. Infantile autism and other childhood psychoses in a Swedish urban region. Epidemiological aspects. J Child Psychol Psychiatry. 1984;25(1):35–43. doi: 10.1111/j.1469-7610.1984.tb01717.x.[PubMed][Cross Ref]

4. Kim YS, Leventhal BL, Koh YJ, Fombonne E, Laska E, Lim EC, et al. Prevalence of autism spectrum disorders in a total population sample. Am J Psychiatry. 2011;168(9):904–912. doi: 10.1176/appi.ajp.2011.10101532.[PubMed][Cross Ref]

5. Elsabbagh M, Divan G, Koh YJ, Kim YS, Kauchali S, Marcin C, et al. Global prevalence of autism and other pervasive developmental disorders. Autism research : official journal of the International Society for Autism Research. 2012;5(3):160–179. doi: 10.1002/aur.239.[PMC free article][PubMed][Cross Ref]

6. Geschwind DH. Advances in Autism. Annu Rev Med. 2009;60:367–380. doi: 10.1146/[PMC free article][PubMed][Cross Ref]

7. Berg JM, Geschwind DH. Autism genetics: searching for specificity and convergence. Genome Biol. 2012;13(7):247. doi: 10.1186/gb-2012-13-7-247.[PMC free article][PubMed][Cross Ref]

8. Geschwind DH, State MW. Gene hunting in autism spectrum disorder: on the path to precision medicine. The Lancet Neurology 2015;14(11):1109-20. [PMC free article][PubMed]

9. Wang K, Zhang H, Ma D, Bucan M, Glessner JT, Abrahams BS, et al. Common genetic variants on 5p14.1 associate with autism spectrum disorders. Nature. 2009;459(7246):528–533. doi: 10.1038/nature07999.[PMC free article][PubMed][Cross Ref]

10. Ma D, Salyakina D, Jaworski JM, Konidari I, Whitehead PL, Andersen AN, et al. A genome-wide association study of autism reveals a common novel risk locus at 5p14.1. Ann Hum Genet. 2009;73(Pt 3):263–273. doi: 10.1111/j.1469-1809.2009.00523.x.[PMC free article][PubMed][Cross Ref]

11. Weiss LA, Arking DE, Daly MJ, Chakravarti A. A genome-wide linkage and association scan reveals novel loci for autism. Nature. 2009;461(7265):802–808. doi: 10.1038/nature08490.[PMC free article][PubMed][Cross Ref]

12. Anney R, Klei L, Pinto D, Regan R, Conroy J, Magalhaes TR, et al. A genome-wide scan for common alleles affecting risk for autism. Hum Mol Genet. 2010;19(20):4072–4082. doi: 10.1093/hmg/ddq307.[PMC free article][PubMed][Cross Ref]

13. Anney R, Klei L, Pinto D, Almeida J, Bacchelli E, Baird G, et al. Individual common variants exert weak effects on the risk for autism spectrum disorderspi. Hum Mol Genet. 2012;21(21):4781–4792. doi: 10.1093/hmg/dds301.[PMC free article][PubMed]

Additional files

Additional file 1:(88K, docx)

Two-dimensional Multidimensional Scaling (MDS) plot of the AGRE population. (DOCX 88 kb)

Additional file 2:(43K, xlsx)

Sample information for 1,335 individuals in the AGRE cohort. (XLSX 42 kb)

Additional file 3:(42K, xlsx)

Sample information for 2,588 individuals in SSC. (XLSX 41 kb)

Additional file 4:(68K, xlsx)

Current scores for eleven ADI-R items in AGRE that assess restricted repetitive behaviors. (XLSX 68 kb)

Additional file 5:(60K, docx)

Score distributions of the RSM and IS subcategories in the AGRE cohort. (DOCX 60 kb)

Additional file 6:(15K, docx)

Two-Factor solution for the Restricted and Repetitive behaviors in SSC using PCA with varimax rotation. (DOCX 15 kb)

Additional file 7:(68K, docx)

Regional plot showing association mapping results for association with RSM/IS for SNPs located within chromosome 8p21.1-8p21.2 in the SSC dataset. (DOCX 67 kb)

Additional file 8:(41K, docx)

Haploview linkage disequilibrium (LD) plots for genotyped SNPs and the top three associated SNPs inTable 5of the manuscript. (DOCX 41 kb)

Additional file 9:(27K, docx)

Developmental time course ofSTMN4mRNA expression in adult CEU (Caucasian and European descent) human prefrontal cortex. (DOCX 26 kb)

Additional file 10:(22K, xlsx)

Candidate gene list for Functional Enrichment Analysis. (XLSX 21 kb)

Additional file 11:(27K, docx)

Enrichment of functional annotation terms identified using DAVID software for genes linked to SNPs that nominally associate with RSM/IS. (DOCX 27 kb)

Additional file 12:(54K, xlsx)

Enrichment of functional annotation terms identified using DAVID software for genes linked to SNPs from ten results of phenotype-permuted datasets. (XLSX 53 kb)

MATH2065 - Introduction to Partial Differential Equations

Ease - 8/10
Not particularly difficult by any means, but the assessments, particularly the exam, did throw quite a few curve balls. That ended up affecting my mark a little bit. If you keep on top of the work then you should be fine; the tutorial questions cover a wide range of styles so you should get some good practice.
Lecturer - 7/10 (Peter Kim)
Peter Kim was a decent lecturer; he had very nice written lecture notes and was able to explain concepts quite well. Unfortunately he would have to have been a very exceptional lecturer to keep me motivated to come to 8am lectures everyday, which he wasn't. But he kept his online resources well organised and up to date so I can't complain too much.
Interest - 8/10
Differential equations play a very important part in the study of applied mathematics, and there are many different applications in a wide range of areas from science, engineering and finance. That was enough to keep me interested in the course. You'll review ODE's from first year, and be introduced to transforms, convolution, PDE problems and Fourier series, the latter I particularly liked. After this I'm definitely looking forward to MATH3063 and MATH3078.
Overall - 8/10
Would definitely recommend. The course would benefit from a better timetable and slightly better lecturer, but it's much too crucial a subject in the study of mathematics to ignore.

MATH2070 - Optimisation and Financial Mathematics

Ease - 9/10
This subject was a lot easier than it could have been. Quite a few topics, particularly the Capital Asset Pricing Model, had been left out of the exam. Which isn't too tedious to learn, it's just all of my lectures clashed for this subject so I was slightly behind in teaching a lot of the concepts to myself. In any case, the optimisation sections are just about following certain steps, and the financial maths questions could be in a style that you perhaps may not be used to, but it's not extremely difficult at all. Computer labs weren't difficult, but were a little tedious. You're taught to use MATLAB, but in the end it's easy enough to Google all the relevant commands you need when you get to the final project.
Lecturer - 9/10 (Georg Gottwald)
Unfortunately I didn't get to see much of Georg due to clashes and work commitments. But when I was there, he was extremely helpful in explaining new concepts, and I couldn't really ask for anything more. I agree with Phyre that his lecture notes are a little hard to read, but really, they're more or less copies of what is in the lecture notes, so you can get by with those just fine.
Interest - 8/10
It's always good to learn maths that is directly useful to things. A lot of the optimisation strategies you learn can actually be put to good use in day to day activities if you feel confident enough to formulate your own problems. And the financial mathematics section is quite interesting too, and would be very useful if your career prospects are that way inclined.
Overall - 9/10
Definitely would do again. Probably the easiest 2nd year maths unit you can do, so you could definitely use it for a WAM booster if you are good at maths.

MATH3061 - Geometry and Topology

Ease - 7/10
Being a pure mathematics course, the lectures are quite focused on theory. Even though this is a normal stream course, you do go through quite a lot of proofs in lectures which don't really show up in the assessments too much at all. I would definitely focus on the tutorial questions, and old resources if you have them, in order to practice your understanding of the theory you go through. If you can understand the content then the assessments are really not that bad, topology particularly so.
Lecturers - 5.5/10 (Laurentiu Paunescu 3/10, Andrew Mathas 8/10)
Paunescu took the geometry section, and I didn't like his lectures at all. He didn't really bother to implement any sort of structure within his lectures; he just rambled on from one concept to the next, and his explanations were quite hard to follow. Plus he didn't upload any lecture notes, so you were pretty much forced to go to his lectures. Mathas was a lot better; he took the topology section which was a bit more theoretical, but he gave everything context which made his lectures far more engaging than Paunescu's.
Interest - 7/10
It's interesting thinking about space and objects from mathematical standpoints you haven't really considered before. Geometry, particularly transformations and isometries, and topology introduce ways of seeing such things as movements, change in state or formations from a mathematical perspective, which I enjoyed. The depth you cover the material is a little more shallow than what you would expect in the advanced stream, and it does get a little abstract at times, but it's certainly interesting to learn.
Overall - 7/10
Good course. If Paunescu's not lecturing then definitely consider it. If he is, be wary that you may have to put in a little more effort to understand things.

0 thoughts on “Stat2012 Assignment Sheet”


Leave a Comment

Your email address will not be published. Required fields are marked *