Hi there.
I want to turn a static website (about 150 pages) into a SilverStripe (SS) site. I installed SS to test the StaticSiteImporter (latest trunk) and it works, but I'm not sure the output is what it should be.
When I check "Preview the content that will be extracted", many pages are listed with their extracted content.
But the content looks like this:
<?xml version="1.0" encoding="utf-8"?>
<!-- ra -->
<!DOCTYPE html ....
... all the HTML of the page ...
} catch(err) {}
//]]>
</script><!-- InstanceEnd -->
</body>
</html>
As I understand it, this feature should only show me the grabbed content, right?
This is what I defined in mysite/_config.php:
StaticImporter::set_url("www.example.com/");
StaticImporter::set_allowed_extensions(array('php', 'html', 'jpg', 'pdf'));
StaticImporter::set_rules(
    array(
        // Default rules for all other URLs
        'conditions' => array(),
        'fields' => array(
            'Title' => array(
                'xpath' => array(
                    '//h1'
                ),
                'exclusive' => 1
            ),
            'Hierarchy' => array(
                'xpath' => array(
                    '//h2[contains(@class, "location")]/a/@href',
                ),
                'exclusive' => 1
            ),
            'Content' => array(
                'xpath' => '//div[contains(@id, "content")]',
                'includeMatchedTag' => 0
            )
        ),
        'exclusive' => 1
    )
);
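I was also thinking I could sanity-check the XPath expressions outside of SilverStripe with plain PHP (DOMDocument/DOMXPath), something like the rough sketch below. The URL is just a placeholder for one of my real pages, and this is only to see whether the expressions match anything at all:

// Quick sanity check, independent of the importer: fetch one source page
// and run the same XPath expressions against it.
$html = file_get_contents('http://www.example.com/some-page.html'); // placeholder URL

$doc = new DOMDocument();
libxml_use_internal_errors(true); // the source pages are not valid XML, so suppress parse warnings
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The same expressions used in the rules above
$expressions = array(
    'Title'     => '//h1',
    'Hierarchy' => '//h2[contains(@class, "location")]/a/@href',
    'Content'   => '//div[contains(@id, "content")]',
);

foreach ($expressions as $field => $expression) {
    $nodes = $xpath->query($expression);
    echo $field . ': ' . $nodes->length . " match(es)\n";
}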
Targeted website: http://bit.ly/b5yrHV
Is my XPath wrong, or is it safe to run the import?