Author Topic: Merging into an existing network reduces data  (Read 6083 times)

lsf

  • Jr. Member
  • **
  • Posts: 7
    • Bay Area Mountain Biking Chronicles
Merging into an existing network reduces data
« on: September 18, 2008, 12:26:19 AM »
I am a happy user of TopoFusion and use it several times a week to track mountain bike rides.  I have created a pretty good set of trail maps based on merging individual rides at each location using the Make Network capability.

I noticed something that seems odd.  I went on one of the rides I do pretty frequently and then did a network analysis using the new track (New) and the original merged data (Original) that was created using v3.38.  I had a short bit of new trail in the new track, but the vast majority of New is well represented in Original.  In this case, Original is a superset of New except for the short new track.  This is confirmed when the New merged data is compared with Original.  However, I noticed that the New merged file is significantly smaller.  I checked a few things in the GPX files and found:

                            Original    New 3.38    New 3.41
File size                    3677 KB     1385 KB      931 KB
Waypoints ("<wpt")                75          92          89
Tracks ("<trk>")                 103         115         109
Track Points ("<trkpt")        43152       16148       10582
TopoFusion #Points             43227       16240       10671
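
Counts like the ones in this table can be reproduced with a plain substring count over the GPX text. The sketch below is only an illustration of that approach: the function name `count_gpx_markers` and the sample string are made up for the example, and it does no real XML parsing.

```python
def count_gpx_markers(gpx_text):
    """Count the same literal markers used in the table above."""
    return {
        "waypoints": gpx_text.count("<wpt"),       # waypoint elements
        "tracks": gpx_text.count("<trk>"),         # exact "<trk>", so "<trkpt"/"<trkseg>" are not counted
        "track_points": gpx_text.count("<trkpt"),  # track point elements
    }


# Tiny hand-written sample, not real ride data.
sample = (
    '<gpx><wpt lat="1" lon="2"/>'
    '<trk><trkseg>'
    '<trkpt lat="1" lon="2"/><trkpt lat="1" lon="2"/>'
    '</trkseg></trk></gpx>'
)
counts = count_gpx_markers(sample)
```

For a file on disk, the same function could be applied to the result of reading the whole file as text.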

I read the June 2004 ACM paper and followed your explanation of parallel and face reductions.  As I understand the algorithm, a parallel or face reduction never reduces the number of points below that of the input polyline with the most points.  Since serial reductions are always performed, the Original file should not have any points (vertices) of degree two.  Based on this, I would have expected New to be slightly larger (due to the points in the short new track) than Original, but by only a very small amount.  As the numbers indicate above, New was slightly more than 1/3 the size of Original using v3.38.  I installed 3.41 and the problem is even worse (or better, depending on your point of view) but it seemed to run much faster.

The procedure I used was to select the new track and Original and then run Network Analysis.  I did not do an explicit data reduction, spline or any other steps.  I am using the same reduction and contraction setup parameters in all cases.

Looking closely at the tracks suggests that re-using a previously generated network does some sort of a spline, because the New tracks are much smoother when I zoom in a lot.  This smoothing effect only occurs along the path of my most recent ride (the new track is smoother than the Original track).  If there is some sort of reasonable data reduction being performed, why wasn't this also applied when Original was generated?

Am I really losing resolution by using a previously generated network as one of the inputs?  Is it safe to merge data using previously generated networks?

Alan

  • Global Moderator
  • Sr. Member
  • *****
  • Posts: 165
« Reply #1 on: September 18, 2008, 07:35:01 AM »
Thanks for the detailed post.  I'll see if I can figure out what's going on.

First thing I notice is that you said you were selecting the files you wanted and doing a network analysis.  That's not the way the feature works -- it combines all files that are currently loaded and enabled.

That might be what you meant by selected.  Just wanted to check.

There should be nothing wrong with re-using an existing network as input into a new network procedure.  It's designed to work that way.

Otherwise, I think your understanding of things is correct.  The number of points should only increase when you add new data.  The only exception is with "contractions" when the algorithm deletes short, insignificant spurs.  But unless you had a lot of that (very doubtful) that's not it.

I just experimented making a network that came out with ~10,000 points.  I then added a 7000 point overlapping track to it and the new network came out at 16,300 points.  That's about what is expected.

I can always take a look at your files if we can't figure it out just talking it out.

I don't think anything changed in the network code from v3.38 to v3.41.  A lot changed around v3.35 and such.  One thing that did change in 3.41 is the GPX output routine, but I'm pretty sure it only increased the file size (more fields).

lsf

  • Jr. Member
  • **
  • Posts: 7
    • Bay Area Mountain Biking Chronicles
« Reply #2 on: September 18, 2008, 08:45:44 AM »
Alan,

Thanks for the fast response.

You're right: selected = loaded and enabled.  The data files are properly loaded and processed and the resulting tracks show the old and new data as expected.

Good point about the spurs, but I don't think the original data set had a problem there.  Also, since it was created by network analysis, those spurs would have been eliminated when it was created.

Thanks for the offer of additional help.  I'll send you the data files separately.

ScottMorris

  • Administrator
  • Sr. Member
  • *****
  • Posts: 2756
  • TopoFusion Author
    • http://www.topofusion.com/diary
« Reply #3 on: September 19, 2008, 09:38:46 AM »
Alright, I had a look at your files this morning and here is what's going on.

The original network, for some reason, has a lot of duplicate points.  As in, even in floating point, they are exactly the same.

The first step of the network algorithm, when combining files, is to eliminate any (successive) duplicate points.  It's an easy way to speed things up, and duplicates can mess up some geometric calculations too.
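
The duplicate-elimination step described here can be sketched as follows. This is a minimal illustration, not TopoFusion's actual code: the function name `drop_successive_duplicates` and the (lat, lon) tuple representation are assumptions made for the example.

```python
def drop_successive_duplicates(points):
    """Collapse runs of exactly equal consecutive points into one point.

    Only *successive* duplicates are removed; a point that reappears
    later in the track (for example on an out-and-back) is kept.
    """
    result = []
    for p in points:
        if not result or p != result[-1]:
            result.append(p)
    return result


# Hypothetical (lat, lon) pairs, not real ride data.
track = [(37.1, -122.0), (37.1, -122.0), (37.2, -122.1), (37.1, -122.0)]
cleaned = drop_successive_duplicates(track)
```

The comparison is exact equality, matching the "even in floating point, they are exactly the same" description, so points that differ in any digit are left alone.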

Note that the first line of the network output says there are only 14,656 points -- that's after eliminating duplicates.

So, in other words, there's nothing to worry about.

Now, the question is, how did the duplicate points get there in the first place?  Is this a network you have been adding to successively?  If so, how many times do you think you've added to it?
Scott Morris - founder and co-author of TopoFusion
email: smorris@topofusion.com

lsf

  • Jr. Member
  • **
  • Posts: 7
    • Bay Area Mountain Biking Chronicles
« Reply #4 on: September 19, 2008, 11:46:23 PM »
Thanks for looking into this.  I did not think about duplicate data.

The Original file was created very recently.  It is the result of merging about 36 raw GPX data files.  I don't think any trails had been added into this until I tried to add this new trail.  I re-ran the Network Analysis using v3.41 and had similar (but not identical) results to the previous merge.  With both v3.38 and v3.41, about 40% of the data points are duplicates in the network analysis result based on the full 6 digit precision values in the merged GPX file.

I noticed that the original input data files from the GPS have 13 digit precision for latitude and 12 digit precision for longitude.  If the internal coordinates maintain this precision, the points will be different while doing the network analysis.  They are written out with 6 digit precision, so the next pass using the previously merged data could see these points as identical due to the loss of precision.  I investigated this hypothesis and it seems to be true.  The files generated by the initial network analysis not only have lots of duplicates (about 2/3 of the points in this case), but they are almost all in sequences of consecutive identical points.  Reading the file in again for an incremental network analysis significantly reduces the duplications and sequencing.  This result seems to fit the hypothesis of different precisions of the incoming coordinates.
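
The precision hypothesis above can be illustrated with a small sketch. The coordinates are invented for the example (they are not taken from the actual files), and the helper names `round_point` and `count_successive_duplicates` are hypothetical; the only thing assumed about TopoFusion is the 6 digit output precision discussed in this thread.

```python
def round_point(point, digits=6):
    """Round a (lat, lon) pair the way a 6-decimal GPX writer would."""
    lat, lon = point
    return (round(lat, digits), round(lon, digits))


def count_successive_duplicates(points):
    """Count points that are exactly equal to the point just before them."""
    return sum(1 for a, b in zip(points, points[1:]) if a == b)


# Invented high-precision points: the first three differ only past the
# sixth decimal place, the fourth is clearly a different location.
full_precision = [
    (37.3000001234567, -122.100000123456),
    (37.3000002345678, -122.100000234567),
    (37.3000003456789, -122.100000345678),
    (37.3000500000000, -122.100050000000),
]

# What the points look like after being written with 6 decimals and read back.
written = [round_point(p) for p in full_precision]

before = count_successive_duplicates(full_precision)
after = count_successive_duplicates(written)
```

In this example the full-precision points are all distinct, while the rounded ones contain a run of identical consecutive points, which is the pattern described for the merged file.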

The reduction in the number of points does not seem to make a large difference in the appearance of the track in the display.  However, I assume this may have some impact on merging more precise data from future GPS tracks and the parallel/face reductions while the algorithm is running (since the number of data points has been reduced and the minimal geometric distances have changed slightly).

ScottMorris

  • Administrator
  • Sr. Member
  • *****
  • Posts: 2756
  • TopoFusion Author
    • http://www.topofusion.com/diary
« Reply #5 on: September 23, 2008, 03:01:09 PM »
That sounds correct.  What software did the original files come from?

TopoFusion does only write out six digits of precision (it reads in as many as are there and stores things as doubles).  I remember thinking about this issue a long time ago.  It may be worth revisiting.

It increases the file size of a GPX by a fair amount (going to 12), but disks are cheap these days, and bandwidth too.  I don't think the average user is going to notice any difference in the look or stats of their tracks, but for some of the more sophisticated features, like the network, it might make sense.

Any thoughts?

lsf

  • Jr. Member
  • **
  • Posts: 7
    • Bay Area Mountain Biking Chronicles
« Reply #6 on: September 23, 2008, 10:23:30 PM »
You caught me.  My usual procedure is to read the data from the GPS using SportTracks, export to a GPX file and then read that into TopoFusion.  I don't know what the precision is that comes out of the GPS (Garmin Edge 305), but I am guessing that is what is being passed through SportTracks.

Your response got me thinking.  Doing a bit of math reveals that a 0.000001 degree difference (6 digit precision) is not very much.  This translates into 0.1112 m (4.38 inches) for latitudes and 0.0885 m (3.48 inches) for longitudes for the trails I am riding.  12 digit precision suggests a resolution of about 0.1 microns...far finer than the accuracy of the GPS I am using and much more detail than I need!
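
The arithmetic above can be checked with a short sketch. It assumes a spherical Earth with a mean radius of 6,371 km and a latitude of about 37.3 degrees north; both values are assumptions chosen for the example (the latitude is simply one that is consistent with the longitude figure quoted above), not numbers given in the post.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # assumed mean Earth radius (spherical model)
LATITUDE_DEG = 37.3            # assumed riding latitude, roughly Bay Area

# Length of one degree of latitude on the sphere, in meters.
meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360.0

# Size of the last digit at 6 decimal places.
lat_step_m = 1e-6 * meters_per_degree
lon_step_m = lat_step_m * math.cos(math.radians(LATITUDE_DEG))

# Size of the last digit at 12 decimal places, in microns.
lat_step_12_microns = 1e-12 * meters_per_degree * 1e6

print(f"6 digits, latitude:   {lat_step_m:.4f} m ({lat_step_m / 0.0254:.2f} in)")
print(f"6 digits, longitude:  {lon_step_m:.4f} m ({lon_step_m / 0.0254:.2f} in)")
print(f"12 digits, latitude:  {lat_step_12_microns:.2f} microns")
```

A degree of longitude shrinks with the cosine of the latitude, which is why the longitude step is smaller than the latitude step away from the equator.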

Given the distances involved with 6 digit precision, I think it is good enough for my purposes.  Since I don't work at Microsoft, using less disk space is still a virtue.

I think the current solution is OK now that I know why the file sizes shrank so much and that I am not losing anything that seems important.  Thanks!

ScottMorris

  • Administrator
  • Sr. Member
  • *****
  • Posts: 2756
  • TopoFusion Author
    • http://www.topofusion.com/diary
« Reply #7 on: September 24, 2008, 09:50:02 AM »
Ha!  Using anything but TF is pure blasphemy!

Only joking.  All the various mapping solutions have their advantages and disadvantages, IMO.

Thanks for running the numbers.  We did a similar calculation some years ago and came to the same conclusion.  

For reference, the GPX files written out by Garmin themselves (on the SD card) also use only 6 digits of precision.

Now, when you start talking about averaging and splitting tracks there might still be some argument for adding a few more digits.  But I'm inclined to think it's overkill.  If points really are that close you may as well eliminate them as duplicates, both to save disk space and to save processing time (for a network calculation especially).

Thanks for the thoughtful discussion.