Class Recap 11/11/18
This week we finished up regression and then dove head first into survival analysis. Both fall under dependence techniques in statistical analysis. We have been looking at OLS and logistic regression, how each of them can affect marketing decisions, and what information each can provide. OLS, or ordinary least squares regression, is meant to explain relationships between variables and to predict what might happen in the future. Logistic regression does the same, but its dependent variable is binary, which means it can only take one of two values. Logistic regression output includes several summary statistics that show whether the model fits the data well enough to act on. First, look at the variables in the equation, as well as the variables not in the equation, and how they affect the model as a whole. This table gives the odds ratio for each variable. The odds of an event are the probability of the event divided by one minus that probability, so odds of 1 are the 50/50 mark. The odds ratio shows how the odds change when a variable increases by one unit: a value less than 1 means the event becomes less likely, and a value greater than 1 means it becomes more likely. In the model summary there are two types of R square, the Cox & Snell and the Nagelkerke. Both are read much like a regular R square in the sense that the closer they are to one, the better the model fits the data, although they are approximations rather than an exact share of variance explained. If the percentages in the classification table are high, the model classifies cases well and is good to use. In the descriptive statistics section, each value of a binary variable must be either 1 or 0, so its mean falls between 0 and 1 and equals the share of cases coded 1. The correlations section shows how two specific variables move in relation to one another.
If the correlation is above 0.5 it is leaning toward a strong positive correlation, and if it is between 0 and 0.5 it is a weak positive. If it is between 0 and -0.5 it is a weak negative, and if it is below -0.5 it is a strong negative correlation. A negative correlation means the two variables move in opposite directions, while a positive correlation means they move in the same direction. Together, these readings help show whether the model is consistent and reliable enough to use for the research problem. In marketing, logistic regression would be used when you want to see whether a customer will purchase or not purchase. This is an example of a binary variable because the answer can only be one or the other. The analysis could further look at which variables affect the purchase decision, such as marketing strategies and region.
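To make the odds and odds ratio idea from the logistic regression recap concrete, here is a minimal Python sketch. The purchase probabilities and the "promotion" scenario are made-up numbers for illustration only, not values from class, and the function names are hypothetical helpers rather than anything from a statistics package.

```python
def odds(p):
    """Odds of an event with probability p: p divided by (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be at least 0 and less than 1")
    return p / (1 - p)


def odds_ratio(p_group_a, p_group_b):
    """Ratio of the odds in group A to the odds in group B."""
    return odds(p_group_a) / odds(p_group_b)


# Hypothetical purchase probabilities (assumed for illustration).
p_with_promotion = 0.60
p_without_promotion = 0.40

even_odds = odds(0.50)  # a 50/50 event has odds of exactly 1
ratio = odds_ratio(p_with_promotion, p_without_promotion)

print(f"Odds at 50/50: {even_odds}")
print(f"Odds ratio, promotion vs. no promotion: {ratio:.2f}")
```

An odds ratio above 1 here would mean customers who saw the promotion have higher odds of purchasing, and a value below 1 would mean lower odds.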
The second aspect we focused on this week was survival analysis. This analysis looks at when an event is most likely to occur and is meant to study time-until-event problems. It may seem similar to logistic regression, but it is actually much more complex. It was originally used in drug experiments that studied the death of subjects, which is ironic given the name. The dependent variable in the model is the time until an event, which in marketing is usually the purchase. The difference between the two is that logistic regression is used when the data is periodic, meaning the event can only occur at regular and specific intervals, while survival analysis follows the time until the event itself. Within survival analysis there are censored observations, which are ones where we don't know the final status, either because the event has not occurred yet or because the subject was lost in some way. These observations could be deleted, but they still contain useful information, and deleting them could distort the experiment's results. In class we looked at a couple of cases to help us understand survival analysis. The first was an actual drug experiment where we looked at days until death occurred, and the second was based on days until purchase of a product. We were able to see the measures of central tendency for the variables, which let us interpret the days until death or purchase and how they change over time. The most helpful part was looking at the graphs of the analysis. Here we used the Log Rank, Breslow, and Tarone-Ware comparisons. These all test whether the survival curves of different groups differ, but each emphasizes a different part of the experiment. The Log Rank is most sensitive to differences toward the end of the lines, Breslow toward the beginning, and Tarone-Ware in the middle. If the significance values are below 0.05, then there is a statistically significant difference between the lines, meaning the grouping variable matters.
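The point about censored observations is easier to see with numbers. The sketch below assumes the survival curves we graphed were Kaplan-Meier estimates (the recap does not name the estimator), and the days-until-purchase data is invented for illustration. It computes the curve once with the censored customers kept and once with them deleted.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: time until the event or until the subject was censored.
    observed:  True if the event happened, False if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group everyone who shares the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without counting as events.
        at_risk -= leaving
    return curve


# Hypothetical days until purchase; False means the customer was censored.
days = [5, 8, 8, 12, 15, 20]
purchased = [True, True, False, True, False, True]

with_censored = kaplan_meier(days, purchased)

# Same data with the censored customers deleted outright.
kept = [d for d, p in zip(days, purchased) if p]
without_censored = kaplan_meier(kept, [True] * len(kept))

for (t, s_all), (_, s_drop) in zip(with_censored, without_censored):
    print(f"day {t}: keep censored = {s_all:.3f}, delete censored = {s_drop:.3f}")
```

In this made-up data, deleting the censored customers makes the share still "surviving" (not yet purchased) look lower at every event time before the last, which is the kind of distortion described above.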
This analysis is a great way to tell when an event will occur, and it could even be used hand in hand with the other regressions depending on how in-depth you want the result to be.