Hello there, thank you so much for clicking this!

In my investigation I'm comparing the ratings of movies on IMDb with their budgets. I found the top 50 most popular movies on IMDb and the 50 lowest-rated ones, so I have 50 data points with ratings such as 9.3, 9.2, etc. and 50 data points with ratings like 1.2, 1.3, etc.

I have performed my chi-squared test and found that rating and budget are dependent.

However, I can't create a sensible scatter plot for them, as I only have really high ratings and really low ratings (and I can't get any middle ones!). It looks too odd, with all the data grouped on opposite sides of the graph. As a result, I can't calculate Pearson's correlation coefficient, or a regression line, or anything like that! I can't get help from my teachers as we're on holiday and it's due after that.

Can anyone think of a way of helping me? Is there a different kind of graph I can use or something? I've attached the mess of a scatter plot I have now in case that helps.

Note: don't want to be breaking any rules here - not looking for any answers or solutions or anything of the sort - I am just in need of some pointers as to what I should attempt myself. Thanks a lot!

Well, your question is a little bit hard to understand. I don't really get what you are trying to demonstrate or find out through this investigation... Is it to see if there is a correlation between the budget of a movie and its rating?

I think you are saying that your problem is that you cannot find any movies that are rated in the middle? Well, it's true that most movies are either really bad or really good, but if you don't get middle values it is going to be very difficult for you to show trend lines or do the regression line... Maybe you could try breaking the line in the middle, although I don't know if it would work.

So basically you skip the middle values (you could argue that they are irrelevant to your investigation as there is not enough data)... but I would ask my teacher about that.

Good luck!

Why don't you make two separate graphs, one for the bottom 50 and one for the top 50? Then you can give your x-axis smaller intervals, like 1.0, 1.2, 1.4, etc., and you will get a spread of data.
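
If it helps to see the two-group idea written out, here is a rough sketch in Python. It only illustrates working out Pearson's correlation coefficient separately for each group instead of for all 100 points at once. The numbers are made up for illustration and are not real IMDb data, and the names `pearson_r`, `group_correlation`, `top_group` and `bottom_group` are just ones I picked.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def group_correlation(points):
    """Split (rating, budget) pairs into two lists and correlate them."""
    ratings = [rating for rating, _ in points]
    budgets = [budget for _, budget in points]
    return pearson_r(ratings, budgets)

# Made-up (rating, budget in $ millions) pairs, NOT real IMDb data.
top_group = [(9.3, 25), (9.2, 6), (9.0, 185), (8.9, 63), (8.8, 160)]
bottom_group = [(1.2, 10), (1.3, 20), (1.5, 65), (1.9, 9), (2.1, 30)]

# One coefficient per group, matching the "two separate graphs" idea.
r_top = group_correlation(top_group)
r_bottom = group_correlation(bottom_group)
print("top group r:", round(r_top, 3))
print("bottom group r:", round(r_bottom, 3))
```

You would swap in your own 50 + 50 data points. Each group then gets its own scatter plot and its own coefficient, so the big gap between the two clusters no longer dominates the result.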

All the best

