
Pattern recognition

Ed said:
If I have bitmaps that contain the same number of rows and columns, how close would I get to a match if I did a series of correlations and simply ranked the r^2's?
It'd be worth a look as it's so simple to implement.

Bear in mind that any slight rotation or lateral displacement of the marks could introduce large errors if you do a point-by-point comparison. For example, at either side of the edge of the mark, you might end up comparing dark pixels in one mark (inside the mark) with light pixels on another (just outside the mark).

Also, the more ornate the mark, the more edge (surface) it is likely to have, and the more this sort of edge effect will show up. For some marks, that would mean the r^2 is always very low, and you would never be able to make a successful prediction for them.
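
To make that concrete, here is a quick numpy sketch (the 64x64 bitmaps and the one-pixel shift are made up purely for illustration): a filled square keeps most of its r^2 after a one-pixel displacement, while a 1-pixel-wide outline of the same square, which is nearly all edge, loses most of it.

import numpy as np

def r_squared(a, b):
    # Pearson r^2 between two flattened bitmaps of equal size
    return np.corrcoef(a.ravel(), b.ravel())[0, 1] ** 2

filled = np.zeros((64, 64))
filled[20:44, 20:44] = 1.0           # a chunky, filled mark

outline = np.zeros((64, 64))
outline[20:44, 20:44] = 1.0
outline[21:43, 21:43] = 0.0          # the same mark, but only its 1-pixel edge

for mark in (filled, outline):
    shifted = np.roll(mark, 1, axis=1)   # identical mark, displaced by one pixel
    print(r_squared(mark, shifted))
# The filled mark keeps most of its r^2; the edge-only ("ornate") mark loses most of it.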

You'll have to think about how you make a decision based on the rankings, too. If your top 5 correlations predict as follows:

1. Mark A.
2. Mark B.
3. Mark B.
4. Mark B.
5. Mark A.

how would you decide which of A or B it is? By the top ranked result only? By some consensus based on the top 3? The top 5? Just something to think about.
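
For what it's worth, here is one possible decision rule, sketched in Python (the rule and the function name are my own suggestion, not anything established): take a majority vote over the top k ranked matches and fall back to the best-ranked mark on a tie. The list of labels is just your example above.

from collections import Counter

def decide(ranked_marks, k=5):
    """ranked_marks: list of predicted mark labels, best correlation first."""
    top = ranked_marks[:k]
    counts = Counter(top)
    best_count = max(counts.values())
    tied = [m for m, c in counts.items() if c == best_count]
    if len(tied) == 1:
        return tied[0]
    # tie-break: whichever tied mark appears earliest, i.e. has the best rank
    return min(tied, key=top.index)

print(decide(["A", "B", "B", "B", "A"]))  # -> "B" by a 3-to-2 vote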
 
Did an experiment...

I constructed 7 vectors of 5000 rows and populated them with random numbers using Excel.

I then constructed a correlation matrix and ranked the 21 pairwise correlations. Next, I replaced the first n cells of vector "g" with the corresponding cells of vector "a". At the point where 3% of the cells had been replaced, the a-g pair reached rank 1.

Forced common cells   % of vector   r^2     Rank (out of 21)
0                     0.000%        0.001   10
10                    0.200%        0.002   10
25                    0.500%        0.006    8
50                    1.000%        0.011    6
75                    1.500%        0.017    3
100                   2.000%        0.021    3
150                   3.000%        0.028    1
200                   4.000%        0.039    1
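
For anyone who wants to poke at it, here is a rough numpy re-creation of that Excel experiment (sizes taken from the post; the exact r^2 values and ranks will differ from run to run because the vectors are random, and the function name is just mine):

import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
vectors = {name: rng.random(5000) for name in "abcdefg"}   # 7 vectors of 5000 random values

def rank_of_ag(n_forced):
    v = {k: x.copy() for k, x in vectors.items()}
    v["g"][:n_forced] = v["a"][:n_forced]            # force n cells of g to match a
    pairs = list(combinations("abcdefg", 2))         # the 21 pairwise comparisons
    r2 = {p: np.corrcoef(v[p[0]], v[p[1]])[0, 1] ** 2 for p in pairs}
    ordering = sorted(r2, key=r2.get, reverse=True)
    return ordering.index(("a", "g")) + 1            # rank of the a-g pair, 1 = best

for n in (0, 10, 25, 50, 75, 100, 150, 200):
    print(n, rank_of_ag(n))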

I can see a number of problems with this Dewey-the-dunce analysis, the foremost being that it assumes the putative bitmaps line up perfectly. Then again, with modern computing engines, one could shift things. Suppose you preprogrammed 10,000 shifts, up, down and diagonally, and did a correlation at each pass. The data would tell you which shift gives the best match, and that shift could be tested for reasonableness to avoid pesky boundary conditions.
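
Something like this brute-force shift search, sketched in numpy (the +/-10 pixel range is arbitrary, and np.roll wraps pixels around the edges, so the winning offset should be sanity-checked against those boundary conditions):

import numpy as np

def best_shift_r2(a, b, max_shift=10):
    # slide b over a range of x/y offsets and keep the offset with the best r^2
    best = (-1.0, (0, 0))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
            r2 = np.corrcoef(a.ravel(), shifted.ravel())[0, 1] ** 2
            if r2 > best[0]:
                best = (r2, (dy, dx))
    return best   # (best r^2, (dy, dx))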

Alternatively, one could rip off the idea used for commercial identification in various reporting services: examine n (where n is arbitrarily large) "footprints" of some arbitrary size, where the "footprint" measure is simply an average of the 0-1 values. They get something like 99% accuracy with that over gazillions of commercials. They too compare against a database, and if there is no match (opportunistically defined), they have a bunch of people view the clip and ID it that way.
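
One way to read that footprint idea in code, sketched with numpy (the 8-pixel block size and the mean-absolute-difference comparison are my own choices for illustration, not anything the reporting services publish):

import numpy as np

def footprint(bitmap, block=8):
    # average the 0-1 pixel values over a coarse grid of block x block tiles
    h, w = bitmap.shape
    trimmed = bitmap[:h - h % block, :w - w % block]
    return trimmed.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def footprint_distance(a, b, block=8):
    # smaller is better; coarse averaging tolerates small misalignments
    return np.abs(footprint(a, block) - footprint(b, block)).mean()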

Again, perfection is unnecessary; good enough is.

Thoughts?
 
