What sort of ROC curves are you trying to generate, and at what stage are you having issues?
Your standard receiver operating characteristic curve has the false positive rate on the x-axis and the true positive rate on the y-axis. So to plot it, you need to decide 1) what are the individuals on which you're doing the classification, 2) how do you determine what's a predicted positive, and 3) what's your gold standard for actual positives. How you answer those questions determines how you're going to construct your ROC curve.
One possibility is that the individuals you're classifying are multiple different drugs. That is, you have a number of different compounds and you want to classify each of them as binder/not a binder. To do this, you'll need experimental results for each drug to serve as your gold standard for real positives. Where you set the cutoff for what counts as a "real positive" based on the experimental results is up to you.
Then once you have the results of docking runs for each of the drugs, you need to figure out how to convert the results into predicted positives and negatives. One of the most straightforward ways of doing that is just to look at the lowest interface energy (e.g. interface_delta_X) for each ligand, and take this to be your value. (I normally would pull this data out of the scorefiles with an ad hoc combination of sort and awk commands, but you could do the sorting and combination with Python or R scripts as well - there really isn't a pre-made script for it, though.) Depending on what you want, you can get more sophisticated: for example, by throwing out outliers in total_score or packstat, etc., by using a reweighted combination of interface energy and surface area burial, or by doing some sort of ensemble average of multiple low energy structures. The Davis et al. approach is to take the top fraction (~10%) by total_score, and then use the interface energy of the structure with the lowest interface energy within that fraction as a proxy for the binding energy. That's a good starting point, if you don't have other ideas.
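As a rough illustration of that top-fraction filter, here is a minimal Python sketch. It assumes you have already parsed each ligand's scorefile into a list of dicts with total_score and interface_delta_X entries; the parsing step is not shown, and the function names (top_fraction_interface_score, score_all_ligands) are made up for this example rather than part of any Rosetta tool. It also assumes that lower (more negative) scores are better.

```python
def top_fraction_interface_score(models, top_fraction=0.10,
                                 total_key="total_score",
                                 interface_key="interface_delta_X"):
    """Return one score for a ligand from its docking models.

    models: list of dicts, one per output structure (hypothetical layout).
    Keeps the best `top_fraction` of models by total score (lowest first),
    then returns the lowest interface energy among those.
    """
    if not models:
        raise ValueError("no models given for this ligand")
    # Rank by total score, lowest (best) first.
    ranked = sorted(models, key=lambda m: m[total_key])
    # Always keep at least one model, even for small runs.
    n_keep = max(1, int(top_fraction * len(ranked)))
    return min(m[interface_key] for m in ranked[:n_keep])


def score_all_ligands(models_by_ligand, top_fraction=0.10):
    """Apply the filter to every ligand; returns {ligand_name: score}."""
    return {
        name: top_fraction_interface_score(models, top_fraction)
        for name, models in models_by_ligand.items()
    }
```

The resulting per-ligand values are the continuous predictions you would then pair with your experimental binder/non-binder labels.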
Standard ROC plotting techniques (like ROCR) normally take their input as continuously valued predictions, each matched pairwise with a true/false categorization. The ROC curve will be drawn by adjusting a threshold across the range of values, and using it as the separator between predicted positive and predicted negative. The different values of the threshold give you the different points on the ROC curve. (More advanced approaches are also possible, but that's the basic character.)
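If it helps to see the threshold sweep spelled out, here is a small self-contained Python sketch. It is only an illustration of the basic idea, not how any particular plotting package implements it, and the function names (roc_points, auc) are made up for the example. It assumes that a lower score means "more likely a binder", as with interface energies, and that each label is True for an experimentally confirmed binder.

```python
def roc_points(scores, labels):
    """Return a list of (false positive rate, true positive rate) points.

    scores: one continuous prediction per ligand (lower = predicted binder).
    labels: matching True/False experimental classifications.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    # Sort so the strongest predicted binders come first.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    points = [(0.0, 0.0)]  # threshold below every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Everything at or below the threshold is a predicted positive;
        # tied scores are admitted together so they yield a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Each point corresponds to one threshold value; plotting the x values against the y values with any plotting tool gives the curve.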