
I need a stripper

bug_girl

Master Poster
Joined
Nov 30, 2003
Messages
2,994
I have (again!) inherited a majorly dysfunctional website.

Is there a nice utility somewhere that will strip out all the <div> and <span> elements?

Every. Freakin. Paragraph. has its own individual formatting.

Sigh.

Thanks!
 
You could write a utility to parse the HTML and then chop out the tags you don't want? Can you code?
 
I've had to do this in the past and here is what I usually do.

Using any text editor (even Dreamweaver could probably handle this), simply do a Search and Replace (often that's CTRL+H in Windows) and just search for "<DIV>", "</DIV>" and whatever other tags you want to get rid of, and simply replace them with nothing.

That should do it.
 
...just search for "<DIV>", "</DIV>" and whatever other tags you want to get rid of, and simply replace them with nothing.

I think the intent was removing the open/close tags and everything in between, but the OP would have to explain some more.
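
For what it's worth, the two readings give quite different results. Here is a rough sketch with sed on a made-up one-line sample (the sample markup and the variable names are mine, not from the OP's site). The second pattern is only good for a span that sits on one line with no other tags nested inside it, so treat it as an illustration rather than a tool:

```shell
# Hypothetical sample line, not taken from the actual site.
sample='<p><span class="big">Hello</span> world</p>'

# Reading 1: drop only the tags, keep the text between them.
tags_only=$(printf '%s\n' "$sample" |
  sed -e 's:<span[^>]*>::g' -e 's:</span>::g')

# Reading 2: drop the tags and whatever sits between them.
# Assumes the whole span is on one line and contains no nested tags.
tags_and_text=$(printf '%s\n' "$sample" |
  sed -e 's:<span[^>]*>[^<]*</span>::g')

printf '%s\n' "$tags_only"      # <p>Hello world</p>
printf '%s\n' "$tags_and_text"  # <p> world</p>
```

If the goal is only to get rid of per-paragraph formatting, the first reading is probably the one you want, since the second throws away the actual text.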
 
If you use a text editor you can search, for example, for "<div>" and replace with "" (no character)


Eta: What DCLimey said (welcome to the forums by the way, DCLimey)
 
Yeah, I guess I could do it in a text editor--except there's still a lot of manual work there.
Unless...I could have it search for <span*> maybe, and then take it all out.

That might work.
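
One thing to watch if the editor's search is regex-based rather than wildcard-based: <span*> will not mean "span followed by anything". A small sketch with sed, using an invented input line and variable names, shows the difference between that pattern and the usual regex spelling of the same idea:

```shell
# Made-up input line for illustration.
line='<span style="color:red">one</span> <span>two</span>'

# As a regex, <span*> means "<spa", any number of "n", then ">",
# so it only removes the bare <span> tag.
glob_style=$(printf '%s\n' "$line" | sed -e 's:<span*>::g')

# [^>]* means "any run of characters other than >",
# so this also removes the tag that carries a style attribute.
regex_style=$(printf '%s\n' "$line" | sed -e 's:<span[^>]*>::g')

printf '%s\n' "$glob_style" "$regex_style"
```

The closing </span> tags are left alone by both patterns and would need their own search.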

Also? Photo:
Kit1.jpg
 
Oh, and to answer FSM's question--I can code, I just don't like to. Especially when someone out there has probably already written a utility to do just this.

Everyone has their favorite little tools to use--I like asking the JREF hive mind.
Also, after 7pm, my brain is shot and my google-fu is weak.
 
If your text editor can handle regular expressions (UltraEdit does this) you can remove generic tags. Otherwise, I'd say write something in a language that does regular expressions (Perl would be my choice, but even Java has them now).
 
If you give me the files and a description of exactly what you want, I can create a Hypertran file to do the job in a few minutes.

~~ Paul
 
Open your webpage in your FTP tool, C & P it into Word, or Writer if you use OO, then use Find and Replace to replace all the tags with blanks, C & P it back in and save.
 
I'm sure there's a way to do this without manually search/replacing file by file. If you happen to be a Mac user, I think this excellent web authoring application will help:
http://www.panic.com/coda/

Find across files. You can now find and replace text across multiple local files — open files, files in a specific folder, or the files in your "site" — using the same sleek, inline search bar you know and love.
ETA:

This might help too:

http://www.thefreecountry.com/programming/searchandreplace.shtml
 
Use whichever tool you're most comfortable with. This can be done with perl, sed, and many other tools. You can probably do (off the top of my head, untested):

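# Strip bare <div>, </div>, <span> and </span> tags from every .htm/.html
# file below the current directory, keeping a .bak copy of each original.
# The name tests are spelled out so that earlier .bak/.tmp files are skipped.
find . \( -name '*.htm' -o -name '*.html' \) |
while IFS= read -r FNM; do
  cat "${FNM}" |
  sed -e "s:<[Dd][Ii][Vv]>::g" |
  sed -e "s:<[sS][pP][aA][nN]>::g" |
  sed -e "s:</[Dd][Ii][Vv]>::g" |
  sed -e "s:</[sS][pP][aA][nN]>::g" > "${FNM}.tmp"
  mv "${FNM}" "${FNM}.bak"; mv "${FNM}.tmp" "${FNM}"
done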

This is not the most efficient way by far, but how often do you need to do this? On Windows, you can do this from a Cygwin shell. Note that this makes some assumptions -- for example, it may fail if a filename starts with a '-' character. It also assumes you have no intentionally malicious filenames.

This won't catch things like "< div >".
 
Notepad++ is another Windows-based free (OSS) editor that will do regular-expression search & replace, and can do it across multiple files/directories on disk (i.e. not loaded into the editor). http://notepad-plus-plus.org/

JoelKatz, it is of course trivial to fix your REs to handle spaces in the tags. The -i option to sed will also avoid the mv, but doesn't keep a backup. There is also a better way to make it case insensitive:
Code:
cp -i "${FNM}" "${FNM}.bak"
sed -e "s:</? *span *>::gi" -i "${FNM}"

Also, if you don't want to bother with Cygwin, but want sed & friends, go to gnuwin32.sourceforge.net.

ETA: Don't pipe sed commands together like that; you can provide multiple -e options to sed:
Code:
sed -e "s:</? *div *>::gi" -e "s:</? *span *>::gi" -i "${FNM}"
By the way, here's the Windows command line equivalent, if you download and install sed from gnuwin32:
Code:
  for /r %f in (*.htm *.html) do (
    copy "%f" "%f.bak"
    sed -e "s:</? *div *>::gi" -e "s:</? *span *>::gi" -i "%f"
)

ETA2: Yuck. sed's default (basic) REs don't treat a bare ? as an operator, it seems. With GNU sed you can write \? or add the -r option to get extended REs. Otherwise use * instead, although it's less restrictive, or multiple REs as originally suggested.
Here's an RE that will also catch options on the tags (but needs a space after the div):
Code:
< *div [^>]*>
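
Pulling the pieces of this thread together, here is one way it might look using only basic REs, so it should not depend on GNU extensions like the ? operator, the i flag or -i. This is a sketch and only lightly tested. The function names strip_tags and strip_tree are made up for the example. It assumes each tag sits on a single line, that no attribute value contains a ">", and that filenames contain no newlines:

```shell
# strip_tags FILE (hypothetical helper): remove <div>/<span> opening and
# closing tags from FILE, leaving the original in FILE.bak.
# \{0,1\} is the basic-RE way of saying "optional"; the optional group
# "\( [^>]*\)" covers attributes or stray spaces before the closing ">".
strip_tags() {
  cp "$1" "$1.bak" || return 1
  sed -e 's:< */\{0,1\} *[Dd][Ii][Vv]\( [^>]*\)\{0,1\}>::g' \
      -e 's:< */\{0,1\} *[Ss][Pp][Aa][Nn]\( [^>]*\)\{0,1\}>::g' \
      "$1.bak" > "$1"
}

# strip_tree DIR (hypothetical helper): run strip_tags on every
# .htm/.html file below DIR. Reading names line by line copes with
# spaces in filenames, but not with newlines.
strip_tree() {
  find "$1" -type f \( -name '*.htm' -o -name '*.html' \) -print |
  while IFS= read -r f; do
    strip_tags "$f"
  done
}
```

After defining the two functions, something like strip_tree . from the top of a copy of the site would do the whole tree. Because the pattern insists on either ">" or a space right after the tag name, it should leave other tags that merely start with "div" or "span" alone.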
 
I thought this was going to be about a wire stripper, which I desperately needed when running new ethernet cable to the barn. Had to make do with a knife.
 
