
12-09-10 - Rank Lookup Error

Trying some other methods of testing to make sure the function fit isn't screwing me up too much.

Spearman rank correlations (that's just the Pearson correlation computed on sort ranks; a small code sketch follows the table) :


mse.txt                 0.66782
square_ms_ssim_y.txt    0.747733
scielabL1.txt           0.855355
scielabL2.txt           0.868207
vif.txt                 0.87631
ms_ssim_bad_down.txt    0.880403
ms_ssim.txt             0.89391
iw_ms_ssim.txt          0.901085
wsnr.txt                0.90905
mydctdelta.txt          0.932998
my_psnr_hvs_m.txt       0.94082
mydctdeltanew.txt       0.944086
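
To be concrete, here's a minimal sketch of that computation (C++, my own naming, not the actual test code); it assumes no exactly tied values, since a full Spearman would average the ranks of ties :


#include <algorithm>
#include <cmath>
#include <vector>

// sort ranks : rank[i] = position of element i in ascending sorted order
// (no tie handling; a full Spearman averages the ranks of tied values)
static std::vector<double> sort_ranks(const std::vector<double> & v)
{
    size_t n = v.size();
    std::vector<size_t> order(n);
    for (size_t i = 0; i < n; ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
        [&](size_t a, size_t b) { return v[a] < v[b]; });
    std::vector<double> rank(n);
    for (size_t r = 0; r < n; ++r) rank[order[r]] = (double) r;
    return rank;
}

// plain Pearson correlation of two equal-length vectors
static double pearson(const std::vector<double> & x, const std::vector<double> & y)
{
    size_t n = x.size();
    double mx = 0, my = 0;
    for (size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double sxy = 0, sxx = 0, syy = 0;
    for (size_t i = 0; i < n; ++i)
    {
        double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx*dy; sxx += dx*dx; syy += dy*dy;
    }
    return sxy / std::sqrt(sxx * syy);
}

// Spearman = Pearson on the sort ranks
double spearman(const std::vector<double> & x, const std::vector<double> & y)
{
    return pearson(sort_ranks(x), sort_ranks(y));
}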

I just thought of a new way to check the fit for this kind of scenario which I think is pretty cool.

You have two vectors of values. One vector is the original which you wish to match; the other is output by your program trying to match the original vector. The problem is that your program outputs on some other scale, in different units that may involve some unknown warping of the curve. You wish to find how close your program is to the target without worrying about matching that curve.

Well, one way is "rank lookup error" :


given vectors "orig" and "test"

find "test_rank" such that
  r = test_rank[i] means item i is the r'th in sorted order in the vector test

find "sorted_orig" = orig, sorted

Sum{i} :
  err += square( orig[i] - sorted_orig[ test_rank[ i ] ] )

that is, to map a value from test's scale onto orig's scale, you find its sort rank within test and look up the value at that rank in the sorted list of originals.
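
Here's a minimal sketch of that in code (C++, my own naming, not the actual test harness). One assumption to note: it takes the test metric to increase with the original scores, so a metric where lower is better would need to be negated (or have its ranks taken in descending order) first :


#include <algorithm>
#include <cmath>
#include <vector>

// rank lookup rmse : replace each test value with the original value at the
// same sort rank, then take the rmse of that against the actual originals
double rank_lookup_rmse(const std::vector<double> & orig, const std::vector<double> & test)
{
    size_t n = orig.size();

    // test_rank[i] = r means item i is the r'th in sorted order in "test"
    std::vector<size_t> order(n);
    for (size_t i = 0; i < n; ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
        [&](size_t a, size_t b) { return test[a] < test[b]; });
    std::vector<size_t> test_rank(n);
    for (size_t r = 0; r < n; ++r) test_rank[order[r]] = r;

    // sorted_orig = orig, sorted
    std::vector<double> sorted_orig = orig;
    std::sort(sorted_orig.begin(), sorted_orig.end());

    double err = 0;
    for (size_t i = 0; i < n; ++i)
    {
        double d = orig[i] - sorted_orig[ test_rank[i] ];
        err += d * d;
    }
    return std::sqrt(err / n);
}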

Obviously this isn't quite ideal; it does handle ties and near-ties pretty well though (better than Spearman/Kendall, because you get less error contribution when you get the rank wrong of two items with very similar value). Most importantly, it avoids all the fidgety function fitting stuff.

Here are the scores with "rank lookup rmse" :


mse.txt                 1.310499
square_ms_ssim_y.txt    1.090392
scielabL1.txt           0.853738
scielabL2.txt           0.835508
ms_ssim_bad_down.txt    0.716507
ms_ssim.txt             0.667152
wsnr.txt                0.643821
vif.txt                 0.591508
iw_ms_ssim.txt          0.575474
mydctdelta.txt          0.548946
my_psnr_hvs_m.txt       0.543057
mydctdeltanew.txt       0.494918

If nothing else, it's a good sanity check for the fitting stuff.

Also with rank-lookup-error you don't have to worry about transforms like acos for ssim or undo-psnr or anything else that's monotonic.
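
A tiny usage example of that invariance, with made-up numbers and using the rank_lookup_rmse sketch from above : warping the test scores through any strictly increasing function gives exactly the same result, because the test vector only enters through its sort ranks.


#include <cmath>
#include <cstdio>
#include <vector>

// from the sketch above
double rank_lookup_rmse(const std::vector<double> & orig, const std::vector<double> & test);

int main()
{
    std::vector<double> orig = { 1.0, 3.0, 2.0, 5.0, 4.0 };
    std::vector<double> test = { 10.0, 25.0, 24.0, 90.0, 70.0 };

    // warp test through a strictly increasing function
    std::vector<double> warped(test.size());
    for (size_t i = 0; i < test.size(); ++i)
        warped[i] = std::log(test[i]);

    // both print the same value
    printf("%f\n", rank_lookup_rmse(orig, test));
    printf("%f\n", rank_lookup_rmse(orig, warped));
    return 0;
}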

For comparison, these were the fit scores :


mse.txt                 1.169794
square_ms_ssim_y.txt    1.005849
scielabL1.txt           0.819501
scielabL2.txt           0.808595
ms_ssim_bad_down.txt    0.690689
ms_ssim.txt             0.639193
wsnr.txt                0.632670
vif.txt                 0.576114
iw_ms_ssim.txt          0.563879
my_psnr_hvs_m.txt       0.552151
mydctdelta.txt          0.548938
mydctdeltanew.txt       0.489756

mydctdeltanew is a new one; it uses mydctdelta for only the AC part of the DCT (excluding the DC, which carries the gross luma differences), and then adds the gross luma difference back on using a form similar to IW-MS-SSIM (that is, using large filters and then using both variance masking and saliency boosting).

2 comments:

Per Vognsen said...

"Obviously this isn't quite ideal; it does handle ties and near-ties pretty well though (better than Spearman/Kendall, because you get less error contribution when you get the rank wrong of two items with very similar value)."

Squaring the error seems combinatorially perverse.

In general, wouldn't you ideally want to use a measure of the sequence's relative sortedness? Classical choices include the inversion number, the crossing number, the number and length of runs, the minimum number of adjacent transpositions to sort the sequence, etc.

These all reflect different aspects of relative sortedness you might want to capture. In your application, where it's just a basic sanity check, it probably doesn't matter.

cbloom said...

"In general, wouldn't you ideally want to use a measure of the sequence's relative sortedness?"

The point I've been trying to make is that that is not what I want to measure. Kendall and Spearman *are* measures of relative sortedness, and I'm arguing that since sorting is not what you care about, they can give the wrong picture of which metric is best.

It appears that "rank lookup mse" does in fact rate the metrics in an order much closer to the rating I actually care about (which is the ability to fit the scores).
