I need a stripper

bug_girl · Jan 27, 2011

I have (again!) inherited a majorly dysfunctional website.

Is there a nice utility somewhere that will strip out all the <div> and <span> elements?

Every. Freakin. Paragraph. has its own individual formatting.

Sigh.

Thanks!

Varanid · Jan 27, 2011

Very disappointed in this thread. I expected photos.

Baloney · Jan 27, 2011

You could write a utility to parse the HTML and then chop out the tags you don't want. Can you code?

DCLimey · Jan 27, 2011

I've had to do this in the past and here is what I usually do.

Using any text editor (even Dreamweaver could probably handle this), simply do a Search and Replace (often that's CTRL+H in Windows) and just search for "<DIV>", "</DIV>" and whatever other tags you want to get rid of, and simply replace them with nothing.

That should do it.

Baloney · Jan 27, 2011

DCLimey said:
...just search for "<DIV>", "</DIV>" and whatever other tags you want to get rid of, and simply replace them with nothing.

I think the intent was removing the open/close tags and everything in between, but the OP would have to explain some more.
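
If the goal really is to drop each element together with its contents, as Baloney reads it, a rough sed sketch could look like the one below. It is an illustration only: the function name drop_span_elements is made up, and it assumes each span opens and closes on a single line and contains no nested tags. Anything more complicated is left untouched rather than mangled.

```shell
# Hypothetical helper (not from the thread): delete whole <span>...</span>
# elements, contents included, from text read on standard input.
# Assumptions: the element sits on one line and has no tags nested inside it.
drop_span_elements() {
    # -E selects extended regular expressions, so ( ... )? is "optional group".
    # [^<]* stops at the next '<', so a span holding other tags never matches.
    sed -E -e 's:<[Ss][Pp][Aa][Nn]([[:space:]][^>]*)?>[^<]*</[Ss][Pp][Aa][Nn]>::g'
}

printf '%s\n' 'a<span class="x">b</span>c' | drop_span_elements
# prints: ac
```

For nested or multi-line markup a real HTML parser is the safer route; a line-based regex like this one only covers the simple case.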

Bell · Jan 27, 2011

bug_girl said:
I have (again!) inherited a majorly dysfunctional website.

Is there a nice utility somewhere that will strip out all the <div> and <span> elements?

Every. Freakin. Paragraph. has its own individual formatting.

Sigh.

Thanks!

If you use a text editor, you can search for "<div>", for example, and replace it with "" (no character).

Eta: What DCLimey said (welcome to the forums by the way, DCLimey)

bug_girl · Jan 28, 2011

Yeah, I guess I could do it in a text editor--except there's still a lot of manual work there.
Unless...I could have it search for <span*> maybe, and then take it all out.

That might work.
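
For anyone trying the "<span*>" idea outside an editor: in a regular expression, * repeats the previous character instead of acting as a wildcard, so the usual spelling is "<span[^>]*>". Here is a minimal sketch with sed, assuming each tag sits on one line; the function name strip_span_open is invented for the example.

```shell
# Hypothetical helper (not from the thread): remove opening <span ...> tags,
# whatever attributes they carry, from text read on standard input.
strip_span_open() {
    # [^>]* means "any run of characters that are not '>'", which is the
    # regular-expression counterpart of the wildcard in "<span*>".
    sed -e 's:<[Ss][Pp][Aa][Nn][^>]*>::g'
}

printf '%s\n' '<p><span style="color:red">Hello</span></p>' | strip_span_open
# prints: <p>Hello</span></p>
```

The closing </span> tags are left alone here and need their own pattern. Because sed works line by line, a tag that is split across two lines would be missed.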

Also? Photo:

bug_girl · Jan 28, 2011

Oh, and to answer FSM's question--I can code, I just don't like to. Especially when someone out there has probably already written a utility to do just this.

Everyone has their favorite little tools to use--I like asking the JREF hive mind.
Also, after 7pm, my brain is shot and my google-fu is weak.

bokonon · Jan 28, 2011

If your text editor can handle regular expressions (UltraEdit does this) you can remove generic tags. Otherwise, I'd say write something in a language that does regular expressions (Perl would be my choice, but even Java has them now).

Paul C. Anagnostopoulos · Jan 28, 2011

If you give me the files and a description of exactly what you want, I can create a Hypertran file to do the job in a few minutes.

~~ Paul

quixotecoyote · Jan 28, 2011

The Norseman · Jan 28, 2011

bug_girl said:
I need a stripper

Don't we all?

Eric D · Jan 28, 2011

FSM76 said:
Very disappointed in this thread. I expected photos.

Seconded.

Slimething · Jan 28, 2011

Can we get back to strippers, please? I was going to volunteer! :blush:

PhantomWolf · Jan 28, 2011

Open your webpage in your FTP tool, C & P it into Word (or Writer if you use OO), then use Find and Replace to replace the tags with blanks, C & P it back in, and save.

jimtron · Jan 28, 2011

I'm sure there's a way to do this without manually search/replacing file by file. If you happen to be a Mac user, I think this excellent web authoring application will help:
http://www.panic.com/coda/

Find across files. You can now find and replace text across multiple local files — open files, files in a specific folder, or the files in your "site" — using the same sleek, inline search bar you know and love.

ETA:

This might help too:

http://www.thefreecountry.com/programming/searchandreplace.shtml

JoelKatz · Jan 28, 2011

bug_girl said:
I have (again!) inherited a majorly dysfunctional website.

Is there a nice utility somewhere that will strip out all the <div> and <span> elements?

Every. Freakin. Paragraph. has its own individual formatting.

Sigh.

Thanks!

Use whichever tool you're most comfortable with. This can be done with perl, sed, and many other tools. You can probably do (off the top of my head, untested):

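# Match only .htm/.html files so the .tmp and .bak copies created below are never picked up
find . \( -name '*.htm' -o -name '*.html' \) |
while IFS= read -r FNM; do cat "${FNM}" |
sed -e "s:<[Dd][Ii][Vv]>::g" |
sed -e "s:<[sS][pP][aA][nN]>::g" |
sed -e "s:</[Dd][Ii][Vv]>::g" |
sed -e "s:</[sS][pP][aA][nN]>::g" > "${FNM}.tmp";
# Keep the original as .bak, then move the stripped copy into place
mv "${FNM}" "${FNM}.bak"; mv "${FNM}.tmp" "${FNM}";
done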

This is not the most efficient way by far, but how often do you need to do this? On Windows, you can do this from a Cygwin shell. Note that this makes some assumptions -- for example, it will fail if your files may start with a '-' character. It also assumes you have no intentionally malicious filenames.

This won't catch things like "< div >".

grmcdorman · Jan 28, 2011

Notepad++ is another Windows-based free (OSS) editor that will do regular-expression search & replace, and can do it across multiple files/directories on disk (i.e. not loaded into the editor). http://notepad-plus-plus.org/

JoelKatz, it is of course trivial to fix your REs to handle spaces in the tags. The -i option to sed will also avoid the mv, but doesn't keep a backup unless you give it a suffix (e.g. -i.bak). There is also a better way to make it case insensitive:

Code:

cp -i "${FNM}" "${FNM}.bak"
sed -e "s:</? *span *>::gi" -i "${FNM}"

Also, if you don't want to bother with Cygwin, but want sed & friends, go to gnuwin32.sourceforge.net.

ETA: Don't pipe sed commands together like that; you can provide multiple -e options to sed:

Code:

sed -e "s:</? *div *>::gi" -e "s:</? *span *>::gi" -i "${FNM}"

By the way, here's the Windows command line equivalent, if you download and install sed from gnuwin32 (inside a batch file, write %%f instead of %f):

Code:

for /r %f in (*.htm *.html) do (
    copy "%f" "%f.bak"
    sed -e "s:</? *div *>::gi" -e "s:</? *span *>::gi" -i "%f"
)

ETA2: Yuck. By default sed uses basic REs, which don't have the ? operator. Add -r to the commands above (GNU sed's switch for extended REs), use * instead (although it's less restrictive), or use multiple REs as originally suggested.
Here's an RE that will also catch attributes on the tags (but it needs a space after the div):

Code:

< *div [^>]*>
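
Pulling the pieces of this post together, one possible single expression handles bare tags, tags with attributes, closing tags, and stray spaces for both div and span. This is a sketch only, not something tested against the OP's site: it assumes a sed that accepts -E for extended REs (GNU and BSD sed both do), tags that do not span lines, and attribute values that never contain a ">". The function name strip_div_span is made up.

```shell
# Hypothetical helper (not from the thread): strip opening and closing
# <div> and <span> tags from text read on standard input.
strip_div_span() {
    # <[[:space:]]*/?[[:space:]]*   "<", optional spaces, optional "/"
    # ([Dd][Ii][Vv]|[Ss][Pp][Aa][Nn]) the tag name in any letter case
    # ([[:space:]][^>]*)?>          optional attributes, then ">"
    # Requiring a space before the attributes keeps names such as
    # "<divider>" from matching.
    sed -E -e 's:<[[:space:]]*/?[[:space:]]*([Dd][Ii][Vv]|[Ss][Pp][Aa][Nn])([[:space:]][^>]*)?>::g'
}

printf '%s\n' '< DIV id="a"><p>Hi <span>there</span></p></div>' | strip_div_span
# prints: <p>Hi there</p>
```

The bracketed letters stand in for the case-insensitive flag, so the sketch does not depend on that GNU extension.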

bug_girl · Feb 2, 2011

Wow--thanks everyone!

tyr_13 · Feb 2, 2011

I thought this was going to be about a wire stripper, which I desperately needed when running new ethernet cable to the barn. Had to make do with a knife.
