Behavior of virtualtex script for very large textures

Tips for creating and manipulating planet textures for Celestia.
Topic author
timcrews
Posts: 118
Joined: 27.11.2002
With us: 21 years 11 months
Location: Chandler, Arizona

Behavior of virtualtex script for very large textures

Post #1by timcrews » 05.01.2004, 16:34

Hello:

I am using virtualtex under Windows XP SP1, Cygwin, bash shell, 1.25GB of RAM. I have modified the virtualtex script to use bash instead of zsh, and to allow 800M of RAM usage instead of 80.
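
The changes themselves are small. In sketch form (hypothetical lines, not the actual script; convert's -limit option is the standard ImageMagick way to cap memory, but the mechanism in the real script may differ):

Code:

#!/usr/bin/bash    # was: #!/usr/bin/zsh

# Hypothetical stand-ins for the script's real variables.
TEXTURE=mars32k.ppm
TILESIZE=1024

# Cap ImageMagick's RAM usage at 800MB instead of 80MB.  A bare
# WxH -crop geometry makes convert cut the image into equal tiles,
# numbered via %d in the output filename.
convert -limit memory 800MB "$TEXTURE" -crop "${TILESIZE}x${TILESIZE}" tile_%d.ppm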

I am using it to chop up a 32K .ppm image (1.57GB in size), with a tile size of 1K.

The image being diced is on a hard drive with 50GB free.

My system drive (C:) has 4GB free. The swap file on that drive is configured to range from 2GB to 4GB. I also have additional swap file space configured on another huge drive with plenty of free space.

I am observing three different behaviors I would like to discuss:

###############

When I run virtualtex, although the Task Manager still shows plenty of RAM available (for example, 700MB), I can see that my C: drive is losing free space. Before virtualtex is finished, all 4GB of free space on the C: drive has been consumed. Can anyone explain that? It would seem that Windows is allocating virtual memory from the hard disk even though a huge amount of RAM is still available. Also, I can see no evidence that Windows is attempting to use the additional swap space on the other huge drive.

################

For the 32K texture, I have run virtualtex a total of four times. The second time, it produced all 512 diced images as expected. But the first, third, and fourth times, it seems to have given up before it was finished, usually with about 40 images missing from the output. For each missing tile, the virtualtex script emits a warning that it could not stat the corresponding .out file.

If I try to run it on a 64K texture (6.3GB in size), only one image tile comes out. All of the rest are missing.

################

In either case, when I initially start virtualtex, there is no disk activity, processor utilization is below 5%, and I can see that the identify and convert processes are not using any significant amount of memory (2 or 3 MB). However, the script just sits there for an hour. What is it doing? What is it waiting for?

##################

Is there a way I can use the netpbm tools instead of ImageMagick in the virtualtex script? What do the ImageMagick identify and convert programs do? Are there equivalent netpbm programs? The netpbm programs have no difficulty dealing with the huge files.

Thanks for any advice

Tim Crews

##################

Buzz
Posts: 264
Joined: 31.01.2002
With us: 22 years 9 months
Location: The Netherlands

Post #2by Buzz » 05.01.2004, 18:42

Hi Tim,

I have seen comparable behaviour under Linux. I cannot go beyond 16k textures. During execution, ImageMagick fills the /tmp directory with files until the partition is completely full. I too have swap space and RAM left; I guess I could fix it by enlarging my Linux partition. What I ended up doing is processing four sub-parts of a 32k texture and renaming the tiles it produces (shifting right, down, or right & down).

Topic author
timcrews
Posts: 118
Joined: 27.11.2002
With us: 21 years 11 months
Location: Chandler, Arizona

Post #3by timcrews » 05.01.2004, 20:18

I have modified virtualtex to accept two offsets (one for i and one for j) on the command line (in place of the current e/E/w/W single argument). This allows virtualtex to be used to dice up a texture that is already somewhat diced. For example, a 64K texture could be pre-diced into 4x2 16Kx16K textures, and then each of these pre-diced textures could be fed to the new version of virtualtex with the appropriate offset arguments on the command line, and all of the resulting tx_?_? filenames will come out right. So you will not have to manually rename the files that virtualtex produces.
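
In outline, the renaming arithmetic looks like this (a sketch of the idea only, not the actual patch; i_off and j_off stand for the new command-line offset arguments):

Code:

#!/usr/bin/bash

# Shift already-correct local tile names by a grid offset.  For
# example, the right half of a 64K texture, pre-diced at 32K with
# 1K tiles, would use i_off=32 j_off=0.  Assumes the shifted names
# do not collide with unshifted ones (or rename into a fresh
# directory).
i_off=$1
j_off=$2
for file in tx_*_*.ppm
do
   i=$(echo "$file" | sed -r 's/tx_([0-9]+)_([0-9]+)\.ppm/\1/')
   j=$(echo "$file" | sed -r 's/tx_([0-9]+)_([0-9]+)\.ppm/\2/')
   mv "$file" "tx_$((i + i_off))_$((j + j_off)).ppm"
done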

This "pre-dicing" is probably not such a burden for very large textures, since most of these large textures are already somewhat "pre-diced" in their raw form. Since they have to be combined anyway during the process of creating the virtual texture, we can simply combine them into sub-textures small enough for virtualtex to handle.

I would still like to get to the bottom of why virtualtex chokes on these large files, even though plenty of RAM and disk space and processor time are still available. I may experiment with using pamdice from the netpbm library instead of convert from ImageMagick.

Tim

Buzz
Posts: 264
Joined: 31.01.2002
With us: 22 years 9 months
Location: The Netherlands

Post #4by Buzz » 05.01.2004, 20:55

Nice! Can you make it available? I have only created very simple/non-flexible scripts...

Topic author
timcrews
Posts: 118
Joined: 27.11.2002
With us: 21 years 11 months
Location: Chandler, Arizona

Post #5by timcrews » 05.01.2004, 21:01

netpbm library: pamdice
ImageMagick library: convert

Fridger's virtualtex script uses ImageMagick's convert program to cut the supplied main texture into lots of little tiles. It exhibits the problems and limitations that I alluded to in the first post in this thread. These limitations are not due to Fridger's script, of course, but to the convert program.

I experimented with using pamdice instead of convert. pamdice was able to dice up a 32K by 16K texture without difficulty; the whole process completed in 5 minutes. I never ran out of disk space, and I never had to wait an hour while the program did (seemingly) nothing. No intermediate files are created in the system temp directory: you can see the diced-up tiles being built one row at a time directly in the specified output directory.
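
The whole dicing step is a single invocation of this form (filenames hypothetical; note that pamdice does not create the output directory for you):

Code:

# Dice a 32K x 16K PPM into 1K x 1K tiles.  The tiles appear one
# row at a time directly under tiles/, with no temp files.
mkdir -p tiles
pamdice mars32k.ppm -outstem=tiles/yx -width 1024 -height 1024 -verbose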

The 64K-resolution texture (nearly 7GB in size) was diced in 24 minutes without complaints or groaning.

So, I will definitely be sticking with pamdice myself. There is still the problem that pamdice outputs filenames that use different conventions from those expected by the Virtual Texture code in Celestia, so a script will still have to be written that massages the output filenames. I think this will be my approach, rather than starting with virtualtex.
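
For reference, Celestia expects the renamed tiles (tx_<column>_<row>) in level0, level1, ... subdirectories, described by a small .ctx file. A minimal example (directory and tile type are placeholders; in practice the .ppm tiles are usually converted to a compressed format such as DDS, PNG, or JPG first):

Code:

VirtualTexture
{
        ImageDirectory "textures/hires/mars"
        BaseSplit 0
        TileSize 1024
        TileType "png"
}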

The other drawback (for some) is that pamdice must take a netpbm-compatible file format as its input (i.e. not JPG or PNG, but PPM or PGM). This is not a big deal in my case, since my procedure for creating virtual textures involves converting to this format as one of the very first steps. The raw data for these huge textures usually starts out as a raw binary file anyway, so a format conversion is necessary at some point no matter what. The conversion from the raw format to a netpbm-compatible format is very fast.
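
That first conversion is a one-liner with netpbm's rawtoppm, for example (dimensions and filenames hypothetical; assumes headerless 8-bit interleaved RGB data):

Code:

# A raw dump carries no dimensions, so they must be given explicitly.
rawtoppm 32768 16384 mars_raw.img > 32k.ppm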

In the end, it appears that the only really time-consuming task in producing huge virtual textures (other than downloading the raw data to start with) is the resize step. All of the other steps, including dicing, are quite reasonable (5 minutes or so for very large textures, 30 minutes for gargantuan ones).
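
The resize step is also a one-liner per level in netpbm, for example (filenames hypothetical; pnmscale is the classic tool, pamscale its newer replacement):

Code:

# Halve the largest texture repeatedly to produce each lower VT level.
# This is the slow part: the whole multi-gigabyte image is resampled.
pnmscale 0.5 64k.ppm > 32k.ppm
pnmscale 0.5 32k.ppm > 16k.ppm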

Tim Crews

wcomer
Posts: 179
Joined: 19.06.2003
With us: 21 years 5 months
Location: New York City

same experience

Post #6by wcomer » 05.01.2004, 22:40

Hi,

I had the same experience. To make my 64k and 32k VTs I used pamdice and convert in my scripts as well. I know Fridger has been very successful using ImageMagick tools exclusively, but in my experience ImageMagick has always choked on large images.

Using pamdice is very efficient in terms of disk space and bandwidth requirements.

cheers,
Walton

Buzz
Posts: 264
Joined: 31.01.2002
With us: 22 years 9 months
Location: The Netherlands

Post #7by Buzz » 05.01.2004, 23:31

Wow, this looks promising!

maxim
Posts: 1036
Joined: 13.11.2003
With us: 21 years
Location: Nürnberg, Germany

Post #8by maxim » 06.01.2004, 12:13

timcrews wrote: There is still the problem that pamdice outputs filenames that use different conventions from those expected by the Virtual Texture code in Celestia, so a script will still have to be written that massages the output filenames.


You may like to use '1-4a Rename' for that purpose. Since I started working on textures for Celestia, it has done some good jobs for me, even on weird renaming tasks spanning several subdirectories. It's freeware and can be found at http://www.1-4a.com/rename/

maxim

Topic author
timcrews
Posts: 118
Joined: 27.11.2002
With us: 21 years 11 months
Location: Chandler, Arizona

Post #9by timcrews » 06.01.2004, 15:55

Here is a snippet of a bash script that renames all of the files in a directory. The files were output by pamdice. It assumes that "yx" was used as the -outstem argument to pamdice. This could easily be changed. Really, it ought to be parameterized.

Code:

#!/usr/bin/bash

function PamDiceNameToVtName {
   # Strip the "yx" prefix that pamdice prepends (from -outstem=yx).
   for file in yx*
   do
      new=$(echo "${file}" | sed s/yx//g)
      mv "${file}" "${new}"
   done

   # Strip the leading zeros that pamdice pads the two numbers with,
   # e.g. _00_07.ppm becomes _0_7.ppm.
   for file in _*
   do
      new=$(echo "${file}" | sed s/_0/_/g)
      # If there are no leading zeros, there is no need to rename.
      if [ "${file}" != "${new}" ]; then
         mv "${file}" "${new}"
      fi
   done

   # Add the tx prefix and swap the two numbers: pamdice writes
   # row_column, but Celestia's virtual texture loader expects
   # tx_column_row.  The sed expression is unquoted, so each
   # back-reference must be written \\\1 (the shell collapses
   # \\\1 to \1 before sed sees it).
   for file in _*
   do
      new=$(echo "${file}" | sed -r s/\([0-9]+\)_\([0-9]+\)/\\\2_\\\1/g)
      mv "${file}" "tx${new}"
   done
}


I am not a script expert. One thing I had to fight was the same file being double-processed by the second loop: _00_00.ppm would be processed once to produce _0_0.ppm, but then the (already-renamed) _0_0.ppm would be processed again, leaving __.ppm. I suspected that the "for file in _*" file list was being generated dynamically instead of once on entry to the loop; in fact bash expands the glob once when the loop starts, so leftover files from an earlier run are a more likely culprit. Either way, I think I have fixed it now.

Getting the back-references in the third sed invocation to work properly was also very difficult. The triple backslash (\\\) took a while to figure out. The reason is shell escaping: because the sed expression is not quoted, the shell collapses \\ to \ and strips the backslash from \2, so \\\2 is what is needed for sed to receive the back-reference \2. (Single-quoting the whole sed expression would avoid the escaping entirely.)

Here's an example of how I use this function in a script that takes a bunch of power-of-2-sized ppm files in a sibling directory named 4Normalize, dices them into a bunch of levelX directories using pamdice, and fixes the resulting filenames to be usable by Celestia's virtual texture loader.


Code:

# insert the PamDiceNameToVtName function from above here (omitted for brevity)

# Dice each level's source image into 1K x 1K tiles and fix the tile
# names.  Level 0 is diced from the 2k image, level 1 from the 4k
# image, and so on up to level 5 from the 64k image.
sizes=(2k 4k 8k 16k 32k 64k)

for level in 0 1 2 3 4 5
do
   echo Dicing level $level
   mkdir -p level$level   # pamdice does not create the output directory
   pamdice ../4Normalize/${sizes[$level]}.ppm -outstem=level$level/yx \
      -width 1024 -height 1024 -verbose
   echo Fixing filenames for level $level
   ( cd level$level && PamDiceNameToVtName )
done

Feel free to report back if you have any problems!

Tim Crews
Last edited by timcrews on 06.01.2004, 19:39, edited 2 times in total.

abramson
Posts: 408
Joined: 22.07.2003
With us: 21 years 4 months
Location: Bariloche, Argentina

Post #10by abramson » 06.01.2004, 17:56

Hi all.

I also have seen that it is no picnic to virtualize a very large texture. And, like Tim, I observed that it is better to pre-digest the file. It is also a lot faster, since much of the time is spent just reading and writing gigafiles. For my 64k Blue Marble, I first cut horizontal stripes, and then cut those into the final squares. All done with the convert utility (ImageMagick) and ad hoc .bat or Perl scripts.
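
A sketch of that two-pass approach, shown here with bash and convert rather than Guillermo's actual .bat/Perl scripts (filenames hypothetical; a bare WxH -crop geometry makes convert cut an image into equal pieces):

Code:

# Pass 1: cut the 64K x 32K texture into four 8192-pixel-tall strips.
convert bluemarble64k.ppm -crop 65536x8192 strip_%d.ppm

# Pass 2: cut each strip into 1K x 1K tiles.
for s in strip_*.ppm
do
   convert "$s" -crop 1024x1024 "${s%.ppm}_tile_%d.ppm"
done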

Cheers,

Guillermo

Buzz
Posts: 264
Joined: 31.01.2002
With us: 22 years 9 months
Location: The Netherlands

Post #11by Buzz » 06.01.2004, 20:04

Thanks Tim, I'll give it a try (soon)!

