Longit: A Platform For Online Statistical Data Analysis

Longit is a platform for online statistical data analysis. Currently, Longit contains statistical methods for analyzing independent data and correlated data. It has web and desktop versions for exploring data, running analyses and producing graphics. With an Internet connection, users can upload their own datasets from a local machine to the server and then run data analyses. Longit offers many tools for exploring data; in particular, its dynamic graphical methods are useful for visualizing high-dimensional or complicated multilevel correlated data, including data with non-ignorable missing values. When a user leaves Longit, the data, analyses and outputs are automatically saved on the server, so the user can continue the previous analysis at the next login.

**Packages:**

In data analysis, the relationship between a continuous response and a set of predictors is typically assumed to follow a linear model (lm). When the response is count or binary data, the generalized linear model (glm) extends the linear case. Residual diagnostics and sensitivity analysis are useful for checking the model fit. The Reg software provides functions for simulating, exploring and visualizing data, and the lm and glm modules can be used for fitting models.
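As a rough illustration of the linear-model case, the sketch below fits a simple linear regression by ordinary least squares and returns the residuals, which are the starting point for residual diagnostics. It is not the Reg, lm or glm code from Longit; the function name `fit_simple_lm` and the toy data are assumptions made for this example.

```python
def fit_simple_lm(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed response minus fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


# Toy data (hypothetical, not from Longit).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope, residuals = fit_simple_lm(x, y)
print(intercept, slope, residuals)
```

With an intercept in the model, the least-squares residuals sum to zero up to rounding error, so a plot of residuals against fitted values or predictors is the more informative check.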

Correlated data include data from repeated measures, longitudinal studies, panel surveys and similar designs. The response may be continuous, binary, count or of another type. Two typical approaches to analyzing such data are mixed models and estimating equations methods. The Multicorr software provides advanced methods for high-dimensional hierarchical data. Its modules include linear mixed (LME) models, generalized linear mixed (GLME) models, generalized estimating equations (GEE) models, and modeling methods for multivariate multinomial data (MVM).

In longitudinal studies, subjects may leave the study before it ends, which produces dropout (missing) data. For a continuous outcome, the software module do.lme extends the LME models in Multicorr to the case where the missingness depends on the unobserved outcome. The software module sa.do.lme contains sensitivity and graphical methods for exploring which subjects are influential under the non-ignorable missing data assumption.
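To make the idea of non-ignorable dropout concrete, the following sketch simulates outcomes whose chance of being missing depends on the outcome value itself, then compares the mean of all outcomes with the mean of the observed ones. It is only an illustration under assumed settings (a standard normal outcome, a logistic dropout model with a hypothetical coefficient `beta`); it does not reproduce the do.lme or sa.do.lme modules.

```python
import math
import random


def simulate_dropout(n, beta, seed=0):
    """Simulate outcomes with outcome-dependent (non-ignorable) dropout.

    Each outcome y is drawn from N(0, 1). A subject drops out with
    probability 1 / (1 + exp(-beta * y)), so the chance of being missing
    depends on the very value that then goes unobserved.
    """
    rng = random.Random(seed)
    all_y, observed_y = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        p_drop = 1.0 / (1.0 + math.exp(-beta * y))
        all_y.append(y)
        if rng.random() >= p_drop:  # subject stays in the study
            observed_y.append(y)
    return all_y, observed_y


# Assumed settings: beta = 2.0 makes high outcomes more likely to drop out.
all_y, observed_y = simulate_dropout(n=20000, beta=2.0, seed=1)
full_mean = sum(all_y) / len(all_y)
observed_mean = sum(observed_y) / len(observed_y)
print(full_mean, observed_mean)
```

Because subjects with high outcomes drop out more often, the mean of the observed outcomes falls below the mean of all outcomes. Setting `beta=0.0` makes dropout independent of the outcome, and the two means then agree up to sampling error.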

It is very common in data analysis for the relationship between the response and a predictor to be non-linear. Fitting such data requires a smooth curve. The approach is a generalized additive model (GAM) using spline smoothing. For correlated data, the approaches are generalized additive mixed models (GAMM) and generalized estimating equations methods with kernel smoothing (GEEK).
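The idea of kernel smoothing can be sketched with a Nadaraya-Watson estimator for independent data: the fitted value at a point is a weighted average of the responses, with weights that decay with distance from that point. This is a deliberately simple stand-in, not the GAM, GAMM or GEEK methods themselves; the function name `kernel_smooth`, the Gaussian kernel, the bandwidth and the toy data are all assumptions for this example.

```python
import math


def kernel_smooth(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each grid point (Gaussian kernel)."""
    fitted = []
    for g in grid:
        # Observations close to g receive weights near 1, distant ones near 0.
        weights = [math.exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        total = sum(weights)
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return fitted


# Hypothetical non-linear data: y = sin(x) on an even grid, without noise.
x = [i * 0.1 for i in range(64)]
y = [math.sin(xi) for xi in x]
grid = [1.5, 3.0, 4.5]
fitted = kernel_smooth(x, y, grid=grid, bandwidth=0.2)
print(fitted)
```

The bandwidth controls the trade-off: a small value follows the data closely, while a large value gives a flatter curve that can smooth away real structure.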

StatGL: Statistical Graphical Library

StatGL consists of fundamental statistical graphical methods and interactive graphical tools. Graphical methods such as the histogram, density plot and xy-plot are useful in exploratory data analysis. Some of these graphics can be dynamically linked together, and their graphical parameters can be set in a set of control panels. Using animation parameters, dynamic graphics and animations can be constructed.

**Usages:**

ROC: ROC Curves (roc.mvm)

Receiver operating characteristic (ROC) curves are graphical methods for evaluating and comparing the validity of medical diagnostic tests or prediction methods. ROC methods assess the accuracy of a test against a gold standard test and can evaluate differences in accuracy between tests. When the outcomes are ordinal data, the module roc.mvm extends the MVM modeling in Multicorr to ROC methods.
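For readers unfamiliar with ROC curves, the sketch below computes an empirical ROC curve and its area under the curve (AUC) from ordinal test ratings and a gold-standard label. It is a plain empirical calculation, not the model-based roc.mvm approach; the function name `roc_curve` and the ratings are hypothetical.

```python
def roc_curve(scores, labels):
    """Return empirical ROC points (FPR, TPR) and the trapezoidal AUC.

    labels: 1 = condition present according to the gold standard, 0 = absent.
    A case is called positive when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step through every distinct score.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    # Trapezoidal rule over consecutive ROC points.
    auc = sum((x1 - x0) * (y0 + y1) / 2.0
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


# Hypothetical ordinal ratings (1 to 5) and gold-standard labels.
scores = [5, 4, 4, 3, 3, 2, 2, 1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
points, auc = roc_curve(scores, labels)
print(points)
print(auc)  # 0.71875
```

An AUC of 0.5 corresponds to a test no better than chance, and 1.0 to perfect agreement with the gold standard. Comparing two tests amounts to comparing their curves or AUCs on the same subjects.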

GeoStat: Spatial Data and Maps (variogram)

Spatial data are a special case of correlated data containing geographical information. Spatial data that also carry time-dependent information are called spatial-temporal data. An example of spatial-temporal data is disease mapping, where disease incidence or mortality may show both a time trend and regional differences.
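Since the module name refers to the variogram, a minimal sketch of the classical (Matheron) empirical semivariogram may help: for each lag class, it averages half the squared difference between values at pairs of locations separated by roughly that distance. This is a generic textbook estimator for purely spatial data, not the GeoStat implementation; the function name `empirical_variogram`, the lag classes and the toy coordinates are assumptions.

```python
import math


def empirical_variogram(coords, values, lag, n_lags):
    """Classical semivariogram estimate for lag classes 1..n_lags.

    A pair of points at distance d is assigned to the nearest multiple of
    `lag`. Returns a list of (lag_distance, gamma, n_pairs); gamma is None
    when a lag class contains no pairs.
    """
    sums = [0.0] * (n_lags + 1)
    counts = [0] * (n_lags + 1)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            k = int(d / lag + 0.5)  # index of the nearest lag class
            if 1 <= k <= n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [
        (k * lag, sums[k] / (2 * counts[k]) if counts[k] else None, counts[k])
        for k in range(1, n_lags + 1)
    ]


# Hypothetical measurements at four locations along a line.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
variogram = empirical_variogram(coords, values, lag=1.0, n_lags=3)
print(variogram)
```

A semivariogram that increases with distance, as in this toy example, indicates that nearby locations are more alike than distant ones, which is the spatial correlation the text describes.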