<?xml version="1.0" encoding="ISO-8859-1"?>
<GmsArticle>
  <MetaData>
    <Identifier>mibe000021</Identifier>
    <ArticleType>Original Article</ArticleType>
    <TitleGroup>
      <Title language="en">SIBSIM - quantitative phenotype simulation in extended pedigrees</Title>
      <TitleTranslated language="de">SIBSIM - Simulation quantitativer Phänotypen in erweiterten Stammbäumen</TitleTranslated>
    </TitleGroup>
    <CreatorList>
      <Creator>
        <PersonNames>
          <Lastname>Franke</Lastname>
          <LastnameHeading>Franke</LastnameHeading>
          <Firstname>Daniel</Firstname>
          <Initials>D</Initials>
        </PersonNames>
        <Address>Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Institut für Medizinische Biometrie und Statistik, Ratzeburger Allee 160, Haus 4, 23538 Lübeck<Affiliation>University Hospital Schleswig-Holstein, Campus Lübeck, Institute of Medical Biometry and Statistics, Lübeck, Germany</Affiliation>
</Address>
        <Email>daniel.franke@imbs.uni-luebeck.de</Email>
        <Creatorrole corresponding="yes" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Kleensang</Lastname>
          <LastnameHeading>Kleensang</LastnameHeading>
          <Firstname>André</Firstname>
          <Initials>A</Initials>
        </PersonNames>
        <Address>
          <Affiliation>University Hospital Schleswig-Holstein, Campus Lübeck, Institute of Medical Biometry and Statistics, Lübeck, Germany</Affiliation>
        </Address>
        <Email>kleensang@imbs.uni-luebeck.de</Email>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Ziegler</Lastname>
          <LastnameHeading>Ziegler</LastnameHeading>
          <Firstname>Andreas</Firstname>
          <Initials>A</Initials>
        </PersonNames>
        <Address>
          <Affiliation>University Hospital Schleswig-Holstein, Campus Lübeck, Institute of Medical Biometry and Statistics, Lübeck, Germany</Affiliation>
        </Address>
        <Email>ziegler@imbs.uni-luebeck.de</Email>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
    </CreatorList>
    <PublisherList>
      <Publisher>
        <Corporation>
          <Corporatename>German Medical Science</Corporatename>
        </Corporation>
        <Address>Düsseldorf, Köln</Address>
      </Publisher>
    </PublisherList>
    <SubjectGroup>
      <SubjectheadingDDB>610</SubjectheadingDDB>
      <Keyword language="en">computer simulation</Keyword>
      <Keyword language="en">QTL</Keyword>
      <Keyword language="en">phenotype</Keyword>
    </SubjectGroup>
    <DatePublishedList>
<DatePublished>20060221</DatePublished>
</DatePublishedList>
    <Language>engl</Language>
    <SourceGroup>
      <Journal>
        <ISSN>1860-9171</ISSN>
        <Volume>2</Volume>
        <Issue>1</Issue>
        <JournalTitle>GMS Medizinische Informatik, Biometrie und Epidemiologie</JournalTitle>
        <JournalTitleAbbr>GMS Med Inform Biom Epidemiol</JournalTitleAbbr>
      </Journal>
    </SourceGroup>
    <ArticleNo>02</ArticleNo>
  </MetaData>
  <OrigData>
    <Abstract language="de" linked="yes">
<Pgraph>Ein Programm (SIBSIM) zur Simulation quantitativer Phänotypen in erweiterten Familien wird vorgestellt. Es werden sowohl Informationen zum Download als auch zur Installation gegeben, Vorteile und Limitierungen der Implementierung werden beschrieben. Das Eingabeformat ist XML-basiert; die einzelnen Abschnitte werden im Text erklärt. Der Simulationsalgorithmus selbst wird skizziert. Referenzen auf das Benutzerhandbuch und weiterführende Literatur sowie ein detailliertes Beispiel werden angegeben.</Pgraph>
<Pgraph>Verfügbarkeit: Die Software ist erhältlich unter: <Hyperlink href="http://www.imbs.uni-luebeck.de/pub/sibsim">http://www.imbs.uni-luebeck.de/pub/sibsim</Hyperlink>.</Pgraph>
</Abstract>
    <Abstract language="en" linked="yes">
<Pgraph>A tool (SIBSIM) is described for quantitative phenotype simulation in extended pedigrees. Download and installation information are given and the advantages and limitations of the tool are described. The input format is based on XML and the different sections of an input file are explained. A short explanation of the algorithm is given. Links to the download site, the user manual, and related literature as well as a detailed example are included.</Pgraph>
<Pgraph>Availability: The software is available at: <Hyperlink href="http://www.imbs.uni-luebeck.de/pub/sibsim">http://www.imbs.uni-luebeck.de/pub/sibsim</Hyperlink>.</Pgraph>
</Abstract>
    <TextBlock name="Aim" linked="yes">
      <MainHeadline>Aim</MainHeadline>
<Pgraph>This work introduces SIBSIM, a computer program for simulating genotype and quantitative trait data in extended pedigrees. The current release (2.1.2) emphasizes the simulation of a quantitative trait in pedigrees of arbitrary size without monozygotic twins. Well-known software such as the SIMULATE package <TextLink reference="1"/> is not as scalable as SIBSIM. In contrast to both G.A.S.P. <TextLink reference="2"/> and SIMLA <TextLink reference="3"/>, SIBSIM imposes no predefined limits on either genome size or family size.</Pgraph>
<Pgraph>Instead, SIBSIM is designed to be as scalable as possible. It may be used not only in simulation studies but also for validating, verifying and testing other applications that implement statistical analyses of genomic data. We have used SIBSIM successfully for the latter purpose and thereby detected a bug in a widely used genetic epidemiological software package.</Pgraph>
<Pgraph>The following sections describe the compile-time and runtime requirements and recommendations for SIBSIM, the XML configuration file format, and the phenotype simulation model.</Pgraph>
</TextBlock>
    <TextBlock name="Implementation" linked="yes">
      <MainHeadline>Implementation</MainHeadline>
<Pgraph>SIBSIM is written entirely in C++ and is available as a source code distribution under the GNU General Public License (GPL). It is set up as a GNU autoconf/automake project and should therefore be portable to any Linux or Unix platform. SIBSIM requires the XML parsing library libxml2, which is freely available at <Hyperlink href="http://www.xmlsoft.org/">http://www.xmlsoft.org/</Hyperlink>. The SIBSIM package may be compiled with any C++ compiler, but GNU gcc version 3.2 or later is recommended. Please refer to the online manual for further information on requirements and on building and installing SIBSIM.</Pgraph>
</TextBlock>
    <TextBlock name="User interface" linked="yes">
      <MainHeadline>User interface</MainHeadline>
<Pgraph>For flexibility in further development, we decided to use an XML input file format for SIBSIM. A set of tags was defined to specify the various facets of a simulation. The document is divided into multiple sections: one for genotype description, an optional one for the trait and one or more for family structures.</Pgraph>
<Pgraph>The genotype section describes marker and quantitative trait loci by name, position in centiMorgan, alleles and their corresponding frequencies. The user may specify the name of the map function that converts map distance into the recombination fraction <Mark2>&#952;</Mark2>; currently <Mark2>haldane</Mark2> and <Mark2>kosambi</Mark2> are implemented. Optionally, the code for missing genotypes as well as the fraction of genotypes to be set missing completely at random may be given. In the current release, only a single diallelic quantitative trait locus can be simulated.</Pgraph>
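<Pgraph>As an illustration of the two map functions named above, the following minimal Python sketch converts a map distance in centiMorgan into a recombination fraction using the standard Haldane and Kosambi formulas. The function names and the choice of Python are assumptions made for this illustration; the sketch is not taken from the SIBSIM source code.</Pgraph>
<Pgraph><![CDATA[
```python
import math


def haldane(distance_cm):
    """Haldane map function: theta = (1 - exp(-2d)) / 2, with d in Morgan."""
    d = distance_cm / 100.0  # centiMorgan -> Morgan
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(distance_cm):
    """Kosambi map function: theta = tanh(2d) / 2, with d in Morgan."""
    d = distance_cm / 100.0  # centiMorgan -> Morgan
    return 0.5 * math.tanh(2.0 * d)


if __name__ == "__main__":
    # Recombination fractions for a few illustrative distances (in cM).
    for cm in (1.0, 10.0, 50.0):
        print(cm, haldane(cm), kosambi(cm))
```
]]></Pgraph>
<Pgraph>Both functions return 0 for a distance of zero and approach 0.5 for large distances; for the same distance, the Kosambi function yields the larger recombination fraction because it models crossover interference.</Pgraph>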
<Pgraph>The phenotype section is optional. If it is omitted, only genotypes are simulated. Phenotype simulation is based on the general variance analytic model (see, e.g., Falconer and Mackay <TextLink reference="4"/> or Ziegler and König <TextLink reference="5"/>). The phenotypic value <Mark2>x</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2> of an individual <Mark2>k</Mark2> within family <Mark2>i</Mark2> is additively decomposed into an overall mean <Mark2>µ</Mark2>, a major gene effect <Mark2>g</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2>, which is determined by the genotype at the quantitative trait locus together with its specified inheritance model, a polygenic effect <Mark2>G</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2>, which summarizes the effect of multiple genes on the phenotype in question, an environmental effect <Mark2>E</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2>, and, finally, an error term <Mark2>&#949;</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2>:</Pgraph>
<Pgraph>
<Mark2>x</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2> = <Mark2>µ</Mark2> + <Mark2>g</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2> + <Mark2>G</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2> + <Mark2>E</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2>  + <Mark2>&#949;</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2>
</Pgraph>
<Pgraph>The environmental effect is simulated either as a family effect <Mark2>E</Mark2>
<Mark2>
<Subscript>i</Subscript>
</Mark2>, which assigns the same random value to each member of the pedigree, or as a true environmental effect <Mark2>E</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2>, which assigns the same random value to all siblings of a sibship but different values to distinct sibships. The polygenic component is determined individually using average breeding values, analogously to G.A.S.P. <TextLink reference="2"/>: polygenic effects of founders are drawn from a given distribution with given mean and variance, and effects of non-founders are drawn from the same distribution, but with the mean set to the average of the polygenic effects of the respective parents.</Pgraph>
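<Pgraph>The additive decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SIBSIM's implementation: the function name, the pedigree encoding as (id, father, mother) tuples, the normal distributions, the default variances and the coding of the three genotype values are all hypothetical choices, and only the family effect variant of the environmental component is shown.</Pgraph>
<Pgraph><![CDATA[
```python
import random


def simulate_family(pedigree, rng, mu=0.0, qtl_freq=0.5,
                    genotype_values=(-1.0, 0.0, 1.0),
                    var_polygenic=0.2, var_family=0.3, var_error=0.5):
    """Return {id: phenotype} for one family.

    pedigree: list of (id, father_id, mother_id); founders have None as
    parents, and parents must be listed before their children.
    """
    family_effect = rng.gauss(0.0, var_family ** 0.5)  # shared by the whole family
    alleles, polygenic, phenotype = {}, {}, {}
    for ind, father, mother in pedigree:
        if father is None:
            # Founder: draw both QTL alleles from the population frequency.
            alleles[ind] = tuple(int(rng.random() < qtl_freq) for _ in range(2))
            mean = 0.0
        else:
            # Non-founder: one randomly chosen allele from each parent, and a
            # polygenic mean equal to the average of the parental effects.
            alleles[ind] = (rng.choice(alleles[father]), rng.choice(alleles[mother]))
            mean = 0.5 * (polygenic[father] + polygenic[mother])
        polygenic[ind] = rng.gauss(mean, var_polygenic ** 0.5)
        major_gene = genotype_values[sum(alleles[ind])]  # 0, 1 or 2 copies
        error = rng.gauss(0.0, var_error ** 0.5)
        phenotype[ind] = mu + major_gene + polygenic[ind] + family_effect + error
    return phenotype


if __name__ == "__main__":
    nuclear_family = [(1, None, None), (2, None, None), (3, 1, 2), (4, 1, 2)]
    print(simulate_family(nuclear_family, random.Random(2006)))
```
]]></Pgraph>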
<Pgraph>The pedigree section defines the relationships between family members, who are either founders or non-founders. The number of individuals per family is not limited, but there is one restriction on family structure: families may not contain monozygotic twins. Families with consanguinity and/or marriage loops are supported. The pedigree section is the only one that may occur multiple times. Each specified pedigree appears in the output file(s) as often as specified by its <Mark2>replicates</Mark2> attribute. For example, let <Mark2>ped</Mark2>1 be a nuclear family of two offspring and their parents, and let <Mark2>ped</Mark2>2 be an extended family of six individuals. If the <Mark2>replicates</Mark2> attributes are set to 200 and 100, respectively, then 200 × 4 + 100 × 6 = 1400 individuals in 300 families are simulated.</Pgraph>
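<Pgraph>The arithmetic of the replicate example can be checked with a short Python sketch. The dictionary layout and the function name are hypothetical and serve only this illustration; they do not mirror SIBSIM's input format.</Pgraph>
<Pgraph><![CDATA[
```python
def simulation_totals(pedigrees):
    """pedigrees maps a pedigree name to (family size, number of replicates)."""
    families = sum(reps for _, reps in pedigrees.values())
    individuals = sum(size * reps for size, reps in pedigrees.values())
    return families, individuals


# ped1: nuclear family of four, 200 replicates; ped2: six individuals, 100 replicates.
example = {"ped1": (4, 200), "ped2": (6, 100)}
print(simulation_totals(example))  # (300, 1400)
```
]]></Pgraph>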
<Pgraph>Finally, global attributes in the XML document define the number of simulated files, the location where they are stored and the output format. Currently, only the linkage format as described in <TextLink reference="6"/> is available. Data description files in linkage DAT format, e.g., for use with mapping software, may be extracted from the XML document. An interface to S.A.G.E. <TextLink reference="7"/> has also been implemented. Please see the section SIBSIM <Mark2>Usage</Mark2> of the online manual for further options.</Pgraph>
<Pgraph>SIBSIM supports internal as well as external general entities. Please refer to the online documentation for further information about entities and examples of their usage; a more complete introduction to entities is given, e.g., in <TextLink reference="8"/>.</Pgraph>
<Pgraph>Monte Carlo simulations rely heavily on pseudo-random number generation. We therefore follow the recommendations of <TextLink reference="9"/> and employ the "long period (&gt;2×10<Superscript>18</Superscript>) random number generator of L'Ecuyer with Bays-Durham shuffle".</Pgraph>
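<Pgraph>To illustrate the idea behind this kind of generator, the following Python sketch combines two multiplicative congruential generators, using the moduli and multipliers commonly attributed to L'Ecuyer, and decorrelates the output with a Bays-Durham shuffle table. It is a simplified sketch for exposition only: the class name is hypothetical, the seeding differs from the published routine, the output is not bit-for-bit identical to it, and the code is not taken from SIBSIM.</Pgraph>
<Pgraph><![CDATA[
```python
class ShuffledCombinedLCG:
    """Two combined congruential generators with a Bays-Durham shuffle."""

    M1, A1 = 2147483563, 40014  # modulus and multiplier of the first generator
    M2, A2 = 2147483399, 40692  # modulus and multiplier of the second generator
    NTAB = 32                   # size of the shuffle table

    def __init__(self, seed=1):
        # Both states must stay in [1, modulus - 1]; zero would be absorbing.
        self.s1 = (abs(int(seed)) % self.M1) or 1
        self.s2 = (self.s1 % self.M2) or 1
        for _ in range(8):  # short warm-up of the first generator
            self.s1 = (self.A1 * self.s1) % self.M1
        self.table = []
        for _ in range(self.NTAB):  # fill the shuffle table
            self.s1 = (self.A1 * self.s1) % self.M1
            self.table.append(self.s1)
        self.y = self.table[0]

    def random(self):
        """Return a pseudo-random float strictly between 0 and 1."""
        self.s1 = (self.A1 * self.s1) % self.M1
        self.s2 = (self.A2 * self.s2) % self.M2
        # The previous output selects which table entry is used next.
        j = self.y // (1 + (self.M1 - 1) // self.NTAB)
        self.y = self.table[j] - self.s2  # combine both generators
        self.table[j] = self.s1           # refill the slot just used
        if self.y < 1:
            self.y += self.M1 - 1
        return self.y / self.M1


if __name__ == "__main__":
    gen = ShuffledCombinedLCG(seed=42)
    print([gen.random() for _ in range(3)])
```
]]></Pgraph>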
</TextBlock>
    <TextBlock name="Acknowledgements" linked="yes">
      <MainHeadline>Acknowledgements</MainHeadline>
<Pgraph>This work was supported by the Deutsche Forschungsgemeinschaft (ZI 591/12-1).</Pgraph>
</TextBlock>
    <TextBlock name="Appendix - example of simulation setup" linked="yes">
      <MainHeadline>Appendix - example of simulation setup</MainHeadline>
<Pgraph>We simulate a quantitative trait whose phenotype has three components. A genetic effect <Mark2>g</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2> accounts for 20% of the phenotypic variance. An overall shared environmental effect (<Mark2>E</Mark2>
<Mark2>
<Subscript>i</Subscript>
</Mark2>) contributes another 30% of the phenotypic variance. The remaining variance is modelled as white noise, i.e., a normally distributed error term with mean zero and variance 0.5 (<Mark2>&#949;</Mark2>
<Mark2>
<Subscript>ik</Subscript>
</Mark2>). The trait is simulated in three-generation families (Figure 1 <ImgLink imgNo="1" imgType="figure"/>) of 15 individuals each. The relationships between these individuals were chosen arbitrarily. We selected nine markers from chromosome 22 (Figure 2 <ImgLink imgNo="2" imgType="figure"/>) whose descriptions, alleles and corresponding allele frequencies were freely available from online databases <TextLink reference="10"/>, <TextLink reference="11"/>. Given this setup, we prepared an input file for SIBSIM. Some portions of this file are displayed in Figure 3 <ImgLink imgNo="3" imgType="figure"/>.</Pgraph>
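<Pgraph>The variance shares of this setup translate into absolute variances as sketched below in Python: an error variance of 0.5 that makes up the remaining 50% implies a total phenotypic variance of 1.0, and hence variances of 0.2 and 0.3 for the genetic and the shared environmental effect. The second function additionally assumes a purely additive diallelic locus with genotype values -a, 0 and +a and an allele frequency of 0.5; neither assumption is stated in the text, and both function names are hypothetical.</Pgraph>
<Pgraph><![CDATA[
```python
import math


def variance_components(var_error, share_genetic, share_env):
    """Derive absolute variances from the error variance and the two shares."""
    share_error = 1.0 - share_genetic - share_env
    total = var_error / share_error
    return {
        "total": total,
        "genetic": share_genetic * total,
        "environment": share_env * total,
        "error": var_error,
    }


def additive_effect(var_genetic, allele_freq):
    """Effect size a for genotype values -a, 0, +a (assumed additive model).

    For such a locus the genetic variance is 2 * p * (1 - p) * a**2.
    """
    return math.sqrt(var_genetic / (2.0 * allele_freq * (1.0 - allele_freq)))


if __name__ == "__main__":
    comps = variance_components(var_error=0.5, share_genetic=0.2, share_env=0.3)
    print(comps)
    print(additive_effect(comps["genetic"], allele_freq=0.5))  # assumed frequency
```
]]></Pgraph>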
<Pgraph>After processing and validating the input file, SIBSIM either reports an error message or silently simulates genotype and phenotype data. Finally, the data are written in linkage format to sequentially numbered files (sample output is shown in Figures 4 <ImgLink imgNo="4" imgType="figure"/> and 5 <ImgLink imgNo="5" imgType="figure"/>).</Pgraph>
</TextBlock>
    <References linked="yes">
      <Reference refNo="1">
        <RefAuthor>Terwilliger JD</RefAuthor>
        <RefAuthor>Speer M</RefAuthor>
        <RefAuthor>Ott J</RefAuthor>
        <RefTitle>Chromosome-based method for rapid computer simulation in human genetic linkage analysis</RefTitle>
        <RefYear>1993</RefYear>
        <RefJournal>Genet Epidemiol</RefJournal>
        <RefPage>217-24</RefPage>
        <RefTotal>Terwilliger JD, Speer M, Ott J. Chromosome-based method for rapid computer simulation in human genetic linkage analysis. Genet Epidemiol. 1993;10(4):217-24.</RefTotal>
      </Reference>
      <Reference refNo="2">
        <RefAuthor>Wilson AF</RefAuthor>
        <RefAuthor>Bailey-Wilson JE</RefAuthor>
        <RefAuthor>Pugh EW</RefAuthor>
        <RefAuthor>Sorant AJM</RefAuthor>
        <RefTitle>The Genometric Analysis Simulation Program (G.A.S.P.): a software tool for testing and investigating methods in statistical genetics</RefTitle>
        <RefYear>1996</RefYear>
        <RefJournal>Am J Hum Genet</RefJournal>
        <RefPage>A193</RefPage>
        <RefTotal>Wilson AF, Bailey-Wilson JE, Pugh EW, Sorant AJM. The Genometric Analysis Simulation Program (G.A.S.P.): a software tool for testing and investigating methods in statistical genetics. Am J Hum Genet. 1996;59:A193.</RefTotal>
      </Reference>
      <Reference refNo="3">
        <RefAuthor>Bass MP</RefAuthor>
        <RefAuthor>Martin ER</RefAuthor>
        <RefAuthor>Hauser ER</RefAuthor>
        <RefTitle>Pedigree Generation for Analysis of Genetic Linkage and Association</RefTitle>
        <RefYear>2004</RefYear>
        <RefJournal>Pac Symp Biocomput</RefJournal>
        <RefPage>93-103</RefPage>
        <RefTotal>Bass MP, Martin ER, Hauser ER. Pedigree Generation for Analysis of Genetic Linkage and Association. Pac Symp Biocomput. 2004:93-103. Available from: http://helix-web.stanford.edu/psb04/bass.pdf.</RefTotal>
      </Reference>
      <Reference refNo="4">
        <RefAuthor>Falconer DS</RefAuthor>
        <RefAuthor>Mackay TFC</RefAuthor>
        <RefTitle/>
        <RefYear>1996</RefYear>
        <RefBookTitle>Introduction to Quantitative Genetics</RefBookTitle>
        <RefPage/>
        <RefTotal>Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. 4th ed. Prentice Hall; 1996.</RefTotal>
      </Reference>
      <Reference refNo="5">
        <RefAuthor>Ziegler A</RefAuthor>
        <RefAuthor>König IR</RefAuthor>
        <RefTitle/>
        <RefYear>2006</RefYear>
        <RefBookTitle>A Statistical Approach to Genetic Epidemiology: Concepts and Applications</RefBookTitle>
        <RefPage/>
        <RefTotal>Ziegler A, König IR. A Statistical Approach to Genetic Epidemiology: Concepts and Applications. Weinheim: Wiley-VCH; 2006.</RefTotal>
      </Reference>
      <Reference refNo="6">
        <RefAuthor>Terwilliger JD</RefAuthor>
        <RefAuthor>Ott J</RefAuthor>
        <RefTitle/>
        <RefYear>1994</RefYear>
        <RefBookTitle>Handbook of Human Genetic Linkage</RefBookTitle>
        <RefPage/>
        <RefTotal>Terwilliger JD, Ott J. Handbook of Human Genetic Linkage. Baltimore: The Johns Hopkins University Press; 1994.</RefTotal>
      </Reference>
      <Reference refNo="7">
        <RefAuthor>S.A.G.E.</RefAuthor>
        <RefTitle/>
        <RefYear>2004</RefYear>
        <RefBookTitle>Statistical Analysis for Genetic Epidemiology v4.6</RefBookTitle>
        <RefPage/>
        <RefTotal>S.A.G.E. Statistical Analysis for Genetic Epidemiology v4.6. 2004. Available from: http://darwin.cwru.edu/sage/.</RefTotal>
      </Reference>
      <Reference refNo="8">
        <RefAuthor>Harold ER</RefAuthor>
        <RefAuthor>Means WS</RefAuthor>
        <RefTitle/>
        <RefYear>2002</RefYear>
        <RefBookTitle>XML in a Nutshell</RefBookTitle>
        <RefPage/>
        <RefTotal>Harold ER, Means WS. XML in a Nutshell. 2nd ed. O'Reilly; 2002.</RefTotal>
      </Reference>
      <Reference refNo="9">
        <RefAuthor>Press WH</RefAuthor>
        <RefAuthor>Teukolsky SA</RefAuthor>
        <RefAuthor>Vetterling WT</RefAuthor>
        <RefAuthor>Flannery BP</RefAuthor>
        <RefTitle/>
        <RefYear>1999</RefYear>
        <RefBookTitle>Numerical Recipes In C</RefBookTitle>
        <RefPage>282</RefPage>
        <RefTotal>Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes In C. 2nd ed. Cambridge University Press; 1999. p. 282.</RefTotal>
      </Reference>
      <Reference refNo="10">
        <RefAuthor>National Cancer Institute</RefAuthor>
        <RefTitle/>
        <RefYear/>
        <RefBookTitle>MarkerSearch</RefBookTitle>
        <RefPage/>
        <RefTotal>MarkerSearch [database on the internet]. Bethesda (MD): National Cancer Institute (US). Available from: http://lpgws.nci.nih.gov/cgi-bin/MarkerSearch.</RefTotal>
      </Reference>
      <Reference refNo="11">
        <RefAuthor>Marshfield Clinic</RefAuthor>
        <RefTitle/>
        <RefYear/>
        <RefBookTitle>Search for Markers</RefBookTitle>
        <RefPage/>
        <RefTotal>Search for Markers [database on the internet]. Marshfield (WI): Marshfield Clinic (US). Available from: http://www2.marshfieldclinic.org/RESEARCH/GENETICS/Map_Markers/mapmaker/SearchFormFrames.html.</RefTotal>
      </Reference>
      <Reference refNo="12">
        <RefAuthor>RTI International</RefAuthor>
        <RefTitle/>
        <RefYear/>
        <RefBookTitle>Search for Maps</RefBookTitle>
        <RefPage/>
        <RefTotal>Search for Maps [database on the internet]. Research Triangle Park (NC): RTI International (US). Available from: www.gdb.org/jmqp/queryByPos.html.</RefTotal>
      </Reference>
    </References>
    <Media>
      <Tables>
        <NoOfTables>0</NoOfTables>
      </Tables>
      <Figures>
        <Figure width="491" height="346" format="png">
          <MediaNo>1</MediaNo>
          <MediaID>1</MediaID>
          <Caption>
<Pgraph>
<Mark1>Figure 1: Example pedigree: the layout of the pedigree was arbitrarily chosen.</Mark1>
</Pgraph>
</Caption>
        </Figure>
        <Figure width="187" height="597" format="png">
          <MediaNo>2</MediaNo>
          <MediaID>2</MediaID>
          <Caption>
<Pgraph>
<Mark1>Figure 2: Example chromosome: simulation of some markers (boxed) on human chromosome 22. Marker names and their corresponding allele frequencies were taken from online databases [10, 11]; the visualization was produced with [12].</Mark1>
</Pgraph>
</Caption>
        </Figure>
        <Figure width="441" height="575" format="png">
          <MediaNo>3</MediaNo>
          <MediaID>3</MediaID>
          <Caption>
<Pgraph>
<Mark1>Figure 3: The simulation described by Figures 1 and 2 in XML syntax, ready for use with SIBSIM</Mark1>
</Pgraph>
</Caption>
        </Figure>
        <Figure width="462" height="376" format="png">
          <MediaNo>4</MediaNo>
          <MediaID>4</MediaID>
          <Caption>
<Pgraph>
<Mark1>Figure 4: Excerpt from a simulated file in linkage PRE format [6]</Mark1>
</Pgraph>
</Caption>
        </Figure>
        <Figure width="609" height="330" format="png">
          <MediaNo>5</MediaNo>
          <MediaID>5</MediaID>
          <Caption>
<Pgraph>
<Mark1>Figure 5: SIBSIM also creates auxiliary files, e.g. linkage files in DAT format [6].</Mark1>
</Pgraph>
</Caption>
        </Figure>
        <NoOfPictures>5</NoOfPictures>
      </Figures>
      <InlineFigures>
        <NoOfPictures>0</NoOfPictures>
      </InlineFigures>
      <Attachments>
        <NoOfAttachments>0</NoOfAttachments>
      </Attachments>
    </Media>
  </OrigData>
</GmsArticle>
