Stata 9 user manual




















This section describes how to indicate missing data in your raw data files, as well as how missing data are handled in Stata logical commands and assignment statements. We will illustrate some of the missing data properties in Stata using data from a reaction time study with eight subjects indicated by the variable id; the subjects' reaction times were measured at three time points: trial1, trial2 and trial3.

In the input data file, some of the reaction times are coded using a single period (.), which is how Stata represents a missing numeric value. The person measuring time for that trial did not measure the response time properly; therefore, the data point for the second trial is missing. As a general rule, Stata commands that perform computations of any type handle missing data by omitting the row with the missing values. In the output for these data, summarize computed means using 4 observations for trial1 and trial2 and 6 observations for trial3.
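The reaction time data are not reproduced here, so the following is only a minimal sketch that uses the auto dataset shipped with Stata as a stand-in (an assumption, not the data discussed above). Its variable rep78 contains a few missing values, which makes the behavior of summarize easy to see.

```stata
* Stand-in data: the auto dataset that ships with Stata (not the reaction time study)
sysuse auto, clear

* price has no missing values, rep78 has some;
* the Obs column reports how many observations were actually used
summarize price rep78

* Count the observations in which rep78 is missing
count if missing(rep78)
```

The Obs column for rep78 is smaller than the one for price, because summarize drops the missing values variable by variable.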

In short, the summarize command performed the computations on all the available data. A second example shows how the tabulation or tab1 command handles missing data. Like summarize, tab1 uses just available data. Note that the percentages are computed based on the total number of non-missing cases. It is possible that you might want the percentages to be computed out of the total number of observations, and the percentage missing for each variable shown in the table.

This can be achieved by including the missing option (which can be shortened to m) after the tabulation command. Now consider how the correlate command handles missing data. We would expect it to perform the computations on the available data and omit the missing values, and this is what happens: the missing values are excluded. Stata performs listwise deletion and only displays correlations for observations that have non-missing values on all variables listed.
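As a rough illustration of the missing option and of listwise deletion, the sketch below again uses the auto dataset shipped with Stata in place of the reaction time data (an assumption made so that the commands can be run as they stand).

```stata
* Stand-in data: the auto dataset that ships with Stata
sysuse auto, clear

* Percentages are based on the non-missing cases only
tab1 rep78

* With the missing option, missing values get their own row
* and percentages are based on all observations
tab1 rep78, missing

* Listwise deletion: only observations with non-missing values
* on all three variables are used
correlate price mpg rep78
```

The header of the correlate output reports the number of observations used, which is lower than the total number of observations in the dataset.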

Stata also allows for pairwise deletion. Correlations are displayed for the observations that have non-missing values for each pair of variables. This can be done using the pwcorr command. We use the obs option to display the number of observations used for each pair. As you can see, they differ depending on the amount of missing data.
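A small sketch of pairwise deletion, again assuming the auto dataset shipped with Stata rather than the reaction time data:

```stata
* Stand-in data: the auto dataset that ships with Stata
sysuse auto, clear

* Pairwise deletion: each correlation uses all observations that are
* non-missing for that particular pair; obs prints the count per pair
pwcorr price mpg rep78, obs
```

Pairs that involve rep78 are based on fewer observations than the pair price and mpg, because only rep78 has missing values in this dataset.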

Otherwise, the memory can be cleared using clear, which also works as an option of use (use filename, clear).

Data management

5. General command syntax

Most of the Stata commands can be abbreviated. For example, instead of typing generate, Stata will also accept gen. The help screen demonstrates for each command how it can be abbreviated, by showing underlined letters in the syntax section of the help.
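To make the abbreviation rule concrete, here is a short sketch. The dataset (auto, shipped with Stata) and the new variable names are chosen only for illustration.

```stata
* clear as an option replaces whatever data are currently in memory
sysuse auto, clear

* Full command name and its abbreviation do the same thing
generate price_k = price / 1000
gen weight_k = weight / 1000

* summarize can likewise be shortened
summarize price_k weight_k
sum price_k weight_k
```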

There are a number of shorthand conventions to reduce the amount of typing. With the * wildcard, all variables matching the pattern are returned. The ~ character works like *, except that if more than one variable matches, an error message is returned. The - character indicates that all variables in the dataset, starting with the variable to the left of the - and ending with the variable to the right of the -, are to be returned. Some commands use all variables by default if none are specified.
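The shorthand conventions for variable lists can be tried out as follows. This is a sketch that assumes the auto dataset shipped with Stata, in which price, mpg and rep78 are stored next to each other.

```stata
sysuse auto, clear

* Wildcard: every variable whose name starts with t (here trunk and turn)
describe t*

* Range: all variables stored from price through rep78
summarize price-rep78

* No variable list: summarize falls back to all variables
summarize
```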

Using bysort instead of by makes previous sorting redundant. An example would be to summarize happiness scores by gender. Note also that Stata marks a missing value for numerical variables as a period (.). Anything between the comment delimiters /* and */ is ignored. For an overview of functions that can be used in expressions, type help functions. It works similarly to the generate command, expecting expressions and allowing for in and if qualifiers. The generate option of recode allows the recoded variable to be saved as a new variable.
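A brief sketch of bysort, comments, and recode with the generate option. The happiness data are not available here, so the auto dataset shipped with Stata serves as a stand-in, and the recoding rule and the name rep3 are made up for illustration.

```stata
sysuse auto, clear

/* Anything between these delimiters is ignored by Stata */

* bysort sorts by foreign first, then runs summarize within each group
bysort foreign: summarize mpg

* Collapse the five repair categories into three and keep the original;
* missing values of rep78 stay missing in rep3
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)
tabulate rep78 rep3, missing
```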

Thus, no arithmetic operations can be performed with such a variable. It might be necessary to remove or replace non-numeric characters prior to converting the string variable into numerical format.
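One way to do such a conversion is the destring command. The sketch below uses a small made-up dataset; the variable names and values are hypothetical.

```stata
* Hypothetical data: numbers stored as strings with thousands separators
clear
input str8 income_str
"1,200"
"950"
"3,400"
end

* Without further options destring refuses to convert, because the commas
* are non-numeric; it prints a note and creates no new variable
destring income_str, generate(income_try)

* ignore() strips the listed characters before converting
destring income_str, generate(income) ignore(",")
summarize income
```

After the second call, income is numeric and can be used in arithmetic, while income_str is left as it was.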

Also, if any non-numerical character is found, this variable will not be changed. New variables can be generated with generate newvar = exp. The data type influences the amount of memory that is needed. For datasets with a huge number of observations, the data or storage type can have a significant influence on the performance of Stata.

Usually, you would want to use the data type consuming the least amount of memory, while saving all the information contained in the variable. Strings are stored as str followed by a number; the number after str indicates the maximum length of the string. Numerical variables are stored as byte, int, long, float, or double, with the default being float.

If you need to store precise results where interpretations are sensitive to a high decimal precision of the number, then double would be the most appropriate data type. Data types of the existing variables can be seen using the describe command. The internal precision of the variables is unaffected. This is done in two steps.

Data description

describe

General information about the dataset can be retrieved with describe.

The command displays the number of observations, number of variables, the size of the dataset, and lists all variables together with basic information such as storage type, etc.
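Storage types and describe can be explored together. The following sketch assumes the auto dataset shipped with Stata; the variable price_per_lb is made up, and the use of format and compress is an illustration rather than something prescribed above.

```stata
sysuse auto, clear

* Request double precision explicitly; the default would be float
generate double price_per_lb = price / weight

* The storage type column shows int for price and double for the new variable
describe price price_per_lb

* A display format changes only how values are shown, not how they are stored
format price_per_lb %9.2f
list price price_per_lb in 1/3

* compress switches variables to the smallest storage type that loses no information
compress
```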

The command offers further interesting features which can be seen with help codebook. The command order, alphabetic puts all variables in alphabetical order. A single variable can also be moved to a specified position. Sometimes it is useful not to display value labels in the data browser.

This can be done using the nolabel option. Here, the command assert is often useful. It verifies whether a statement is true or false. If the statement is true, assert does not give any output in the results window. On the other hand, if it is false, assert displays an error message together with the number of contradictions. Additionally, summarize varlist, detail shows certain percentiles (including the median), skewness, and kurtosis.
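The commands from this part can be sketched as follows, assuming the auto dataset shipped with Stata and a recent Stata version (11 or later), in which order expects a variable list such as _all.

```stata
sysuse auto, clear

* Alphabetical order for all variables, then move make to the front
order _all, alphabetic
order make, first

* With and without value labels
list make foreign in 1/3
list make foreign in 1/3, nolabel

* A true statement: assert stays silent
assert price > 0

* A false statement: missing values count as larger than any number,
* so this fails; capture noisily shows the message without stopping
capture noisily assert rep78 <= 5
display "return code: " _rc

* Percentiles, skewness and kurtosis
summarize price, detail
```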

User-specified percentiles can be shown with centile. Tables of summary statistics can be drawn with table. Its purpose is not analytical, but it allows you to quickly gain familiarity with unknown data.

Data merging

append

A second dataset can be appended to the end of the one currently used by using the append command. The match variable(s) is (are) defined in varlist. If keepusing is not specified, all variables are kept.
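A minimal sketch of appending and merging, using small made-up datasets held in temporary files. All names and values are hypothetical, and the merge 1:1 syntax assumes Stata 11 or later; older releases use a different merge syntax.

```stata
* Hypothetical first batch of observations
clear
input id score
1 10
2 20
end
tempfile part1
save `part1'

* Hypothetical second batch; append stacks the saved batch underneath
clear
input id score
3 30
4 40
end
append using `part1'
sort id
tempfile scores
save `scores'

* Hypothetical dataset with extra variables for the same ids
clear
input id age height
1 31 170
2 45 182
3 27 165
4 52 176
end
tempfile extra
save `extra'

* id is the match variable; keepusing() limits what is taken from the using data
use `scores', clear
merge 1:1 id using `extra', keepusing(age)
list
```

Because keepusing(age) is specified, height is not brought in; without it, both age and height would be added.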

Further issues

6. Log files

Everything that runs through the results window can be recorded with so-called log files.

These log files can then be printed or saved in other file formats so that the analysis can be retraced independently of Stata. The replace option replaces an existing log file. Finally, the log file is closed using log close.

Graphs

One of the advantages of Stata is its vast graphics capabilities. On the other hand, commands for comprehensive graphs can get quite long, and it takes some time to get used to the code structure.
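A short sketch of a logged session that also draws one simple graph. The file name session_log.smcl, the dataset (auto, shipped with Stata) and the graph title are all chosen only for illustration.

```stata
* Close any log that might still be open, ignoring the error if none is
capture log close

* Start recording; replace overwrites an existing file of the same name
log using "session_log.smcl", replace

sysuse auto, clear
summarize price mpg

* A basic scatter plot; options such as title() are where graph commands grow long
twoway scatter mpg weight, title("Mileage versus weight")

* Stop recording
log close
```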

Advanced programming

Stata also includes an advanced programming language, Mata. Stata also has PyStata, which provides comprehensive Python integration, allowing you to harness all the power of Python directly from your Stata code and to harness all the power of Stata from your Python code.

Community-contributed features

Stata is so programmable that developers and users add new features every day to respond to the growing demands of today's researchers.

World-class technical support

All registered users of the current release of Stata (Stata 17) are eligible for free technical support.

Widely used

Used by researchers for more than 30 years, Stata provides everything you need for data science: data manipulation, visualization, statistics, and automated reporting. Disciplines listed include behavioral sciences, institutional research, public health, public policy, data science, finance, business and marketing, and political science. Stata is distributed in more than countries.

Comprehensive resources

Video tutorials

Stata's YouTube channel is the perfect resource for new users to Stata, users wanting to learn a new feature in Stata, and professors looking for aids in teaching with Stata.

Free Stata webinars

Stata webinars offer something for everyone.

Training

A multitude of training options are available to become proficient at Stata quickly.

Stata News

The Stata News is a free publication with columns such as the popular In the Spotlight, where Stata developers give insight into specific Stata features, and the User's corner, where we share unique, helpful, and fun contributions from the user community.

Stata Journal

The Stata Journal is a quarterly publication containing articles about statistics, data analysis, teaching methods, and effective use of Stata's language.

Vibrant community

Stata Conferences

Whether you are a beginner or an expert, you will find something just for you at Stata conferences, which are held each year in various locations around the world.

Statalist

A great resource for users is Statalist, a forum where more than 40, Stata users exchange roughly 4, postings and responses each month.

Affordable

Stata is not sold in modules, which means you get everything in one package!



