
Anyone good at statistics? - Processes for Strange Data


Hello there, thank you so much for clicking this!

In my investigation I'm comparing the ratings of movies on IMDb with their budgets. I found the 50 most popular movies on IMDb and the 50 lowest-rated ones, so I have 50 data points with ratings such as 9.3, 9.2, etc. and 50 data points with ratings like 1.2, 1.3, etc.

I have performed my chi-squared test and found that rating and budget are dependent.
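For readers following along, here is a minimal sketch of how a chi-squared statistic for independence can be computed by hand. The 2x2 table of counts is made up for illustration (rows are top-rated and bottom-rated films, columns are above-median and below-median budget); it is not the data from this investigation, and the function name `chi_squared_statistic` is just a hypothetical helper.

```python
def chi_squared_statistic(observed):
    """Chi-squared statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical counts, NOT real IMDb data:
# rows = [top-rated, bottom-rated], columns = [high budget, low budget]
table = [[35, 15],
         [15, 35]]

stat = chi_squared_statistic(table)
# Degrees of freedom = (rows - 1) * (columns - 1)
dof = (len(table) - 1) * (len(table[0]) - 1)
print(stat, dof)
```

The statistic is then compared with the critical value for the chosen significance level and degrees of freedom (3.841 for 1 degree of freedom at the 5% level); a larger statistic suggests the two variables are not independent.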

However, I can't create a sensible scatter plot for them, as I only have really high ratings and really low ratings (and I can't get any middle ones!). It looks too odd, with all the data grouped on opposite sides of the graph. As a result, I can't calculate a correlation coefficient, Pearson's r, a regression line, or anything like that! I can't get help from my teachers as we're on holiday and it's due after that.

Can anyone think of a way of helping me? Is there a different kind of graph I can use or something? I've attached the mess of a scatter plot I have now in case that helps.

[Attached image: tumblr_mvfq2fxgwh1qd3tz7o1_500.png]

Note: I don't want to break any rules here. I'm not looking for answers or solutions or anything of the sort; I just need some pointers as to what I should attempt myself. Thanks a lot!

Edited by Alpaca


Well, your question is a little hard to understand. I don't really get what you are trying to demonstrate or find out through this investigation... Is it to see whether there is a correlation between the budget of a movie and its rating?

I think you are saying that your problem is that you cannot find any movies that are rated in the middle? Well, it's true that most movies are either really bad or really good, but if you don't get middle values it is going to be very difficult for you to show trend lines or do the regression line... Maybe you could try breaking the line in the middle, although I don't know if it would work.

So basically you skip the middle values (you could argue that they are irrelevant to your investigation as there is insufficient data)... but I would ask my teacher about that.
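One way to read "breaking the line in the middle" is to fit a separate least-squares line to each cluster instead of one line through the gap. A rough sketch follows, assuming rating is on the x-axis and budget on the y-axis; the numbers are invented for illustration and `fit_line` is a hypothetical helper, not anything from the original investigation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (rating, budget in $ millions) pairs, NOT real IMDb data
low_ratings, low_budgets = [1.0, 1.5, 2.0], [3.0, 4.0, 5.0]
high_ratings, high_budgets = [8.5, 9.0, 9.5], [60.0, 80.0, 100.0]

# One line per cluster, so the empty middle of the graph is never fitted
low_slope, low_intercept = fit_line(low_ratings, low_budgets)
high_slope, high_intercept = fit_line(high_ratings, high_budgets)
print("bottom group:", low_slope, low_intercept)
print("top group:", high_slope, high_intercept)
```

Each line only describes the trend within its own group, so neither should be extended across the gap where there is no data.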

Good luck! :)


Why don't you make two separate graphs, one for the bottom 50 and one for the top 50? Then you can give your x-axis smaller intervals, like 1.0, 1.2, 1.4, etc., and you will get a spread of data.
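The same split works for the numbers as well as the graphs: a correlation coefficient can be calculated within each group separately. Below is a small sketch with invented values (not real IMDb data); `pearson_r` is a hypothetical helper written out by hand so nothing beyond the standard library is needed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up (rating, budget in $ millions) values, NOT real IMDb data
bottom_ratings, bottom_budgets = [1.2, 1.4, 1.6], [1.0, 3.0, 2.0]
top_ratings, top_budgets = [8.8, 9.0, 9.2], [50.0, 30.0, 40.0]

r_bottom = pearson_r(bottom_ratings, bottom_budgets)
r_top = pearson_r(top_ratings, top_budgets)
# Pooled value for comparison: both groups treated as one data set
r_pooled = pearson_r(bottom_ratings + top_ratings, bottom_budgets + top_budgets)
print(r_bottom, r_top, r_pooled)
```

With two widely separated clusters, the pooled coefficient mostly reflects the gap between the groups rather than any trend inside them, which is one reason to report the per-group values alongside it.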

All the best :)

Edit: My 200th Post!

Edited by The Rainbow Connection

