by Frank Wellmer
Smurfit Institute of Genetics, Trinity College Dublin, Ireland
Three decades of research on the genetic and molecular control of flowering have produced a staggering amount of data from a multitude of different experimental approaches (summarized for example in Fornara et al., 2010; Prunet and Jack, 2014). This work has been described in several thousand publications, and even noted experts in this area will hardly be in full command of the wealth of available information. What's more, young researchers entering the field have an almost unscalable mountain of literature to climb when they want to familiarize themselves with what is already known and what still remains to be investigated. A key question is thus how this large amount of information can be unified, simplified, displayed and brought into a format that readily allows it to be used for approaches such as data mining, meta-analysis or even mathematical modeling.
Arguably one of the biggest issues here is that the available data are often of markedly different quality: while some observations are supported by evidence from multiple independent experimental approaches generated by different research groups, others are based on the results of a single study (which, in some cases, might have been poorly designed). And there are examples where the available data are, at least in part, contradictory. Also, the advent of genomics approaches, while highly influential in plant biology, has led to the generation of datasets with inherent errors, and because error rates are difficult to estimate, they are unknown in most cases. The results from these approaches in particular depend on the experimental set-up used. For example, has the function of a given floral regulator been studied through transcriptomics after its constitutive and ectopic over-expression, or have more sophisticated transgenic lines been established for its analysis, resulting in conditions that are closer to the situation found in the wild type?
The points outlined above (and probably many others as well) make the indexing, the processing and the representation of the available data a truly daunting challenge. Recently, Bouché and colleagues have taken on this Herculean task and set up a free Flowering-Interactive Database, FLOR-ID (Bouché et al., 2016). While primarily containing information on gene networks involved in the control of flowering time, FLOR-ID also represents knowledge on early events during flower development (Fig. 1).
In total, data from almost 1,600 publications have been used to assemble the database, which covers around 300 regulatory genes. Importantly, FLOR-ID is hand-curated, using a simple but effective search strategy (UniProt and PubMed database searches combined with information taken from reviews and primary research papers). The information in FLOR-ID can be accessed either by downloading tabulated data or through a user interface, which provides beautifully designed and interactive schemes for the different processes known to be involved in flowering-time control and early flower development (Fig. 1). Notably, the results from papers using -omics approaches have been largely omitted, especially if they are only predictive and not backed by results from independent experiments. Some may find this approach too exclusive, but the authors had the good sense to design FLOR-ID so that users can provide input and suggest modifications, which, importantly, will be curated and only then incorporated into the active database. Thus, FLOR-ID could become a real community effort and will certainly be a major boon in the field of flowering for years to come.
Fornara F, de Montaigu A, Coupland G. 2010. SnapShot: Control of flowering in Arabidopsis. Cell 141, 550, 550.e1-2.
Prunet N, Jack TP. 2014. Flower development in Arabidopsis: there is more to it than learning your ABCs. Methods in Molecular Biology 1110, 3-33.
Bouché F, Lobet G, Tocquin P, Périlleux C. 2016. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana. Nucleic Acids Research 44, D1167-1171.