I have 50 experimentally tested ligands, and I know which ones are good inhibitors of GSK-3 and which are poor ones. I docked each ligand separately using the protein-ligand docking protocol from the 2020 virtual Rosetta workshop, with nstruct set to 10000. For each ligand, I sorted the output models by interface_delta and kept the best (lowest) value, then ranked the 50 ligands by that best interface_delta score. Should I expect a correlation between the experimental results and the Rosetta docking results? In other words, should good inhibitors have lower interface_delta scores? More generally, is it valid to compare different ligands using Rosetta ligand docking? And should I be looking at interface_delta, or at something else such as the total score?
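To make the comparison concrete, here is a minimal sketch of the ranking step and a rank-correlation check. It assumes the interface_delta values have already been pulled out of the score files into one list per ligand. The ligand names, the score values, and the 0/1 activity labels are made-up placeholders for illustration, not real results, and Spearman correlation is just one reasonable choice for comparing a score ranking against a good/bad label.

```python
from statistics import mean

def best_score_per_ligand(scores_by_ligand):
    """Return the lowest (most favorable) score found for each ligand."""
    return {name: min(scores) for name, scores in scores_by_ligand.items()}

def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical placeholder data: interface_delta values per ligand
# (one value per docked model) and an experimental label (1 = good inhibitor).
scores_by_ligand = {
    "lig_A": [-12.1, -15.3, -9.8],
    "lig_B": [-8.0, -10.2],
    "lig_C": [-6.5, -7.1, -5.0],
}
is_good_inhibitor = {"lig_A": 1, "lig_B": 1, "lig_C": 0}

best = best_score_per_ligand(scores_by_ligand)
names = sorted(best)
rho = spearman([best[n] for n in names], [is_good_inhibitor[n] for n in names])

# Ligands ranked from best (lowest) to worst interface_delta.
for name in sorted(best, key=best.get):
    print(name, best[name], is_good_inhibitor[name])
print("Spearman rho:", round(rho, 3))
```

With this sign convention, a negative rho means that lower interface_delta goes together with the "good inhibitor" label, which is the trend being asked about. The same functions can be rerun on the total score to see which of the two tracks the experimental data more closely.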