Sorry to necropost, but did you have any success on this in the end? I've got a very similar requirement to revive a dead phpBB for a particular community in the wake of recent events.
Current plan is to scrape with 'waybackpack' (a Python script I found on GitHub) and then use either an incredibly complicated regex or a few thousand LLM calls to convert the raw HTML back into insertable data. But if anyone's managed this before and is willing to share their experience and/or scripts, it'd be super helpful.
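Before going the regex or LLM route, it might be worth trying a plain HTML parser on the saved pages. Below is a rough sketch using only Python's standard-library `html.parser`. It assumes the archived topic pages use prosilver-style markup: each post in a `<div id="p123" class="post ...">`, the poster in an element with a `username...` class, and the body in `<div class="content">`. That is an assumption, so check it against a few of your snapshots, because older themes (e.g. subSilver2) and custom styles lay things out differently. `PhpbbPostParser` and `parse_topic_page` are hypothetical names, not part of phpBB or waybackpack.

```python
from html.parser import HTMLParser


class PhpbbPostParser(HTMLParser):
    """Collect posts from one archived phpBB topic page.

    Assumes prosilver-style markup (an assumption; verify against your snapshots):
      <div id="p123" class="post ...">
        ... <a class="username">Author</a> ...
        <div class="content">Post body</div>
      </div>
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.posts = []             # finished posts: {"id", "author", "content"}
        self._current = None        # post currently being built, if any
        self._div_depth = 0         # open <div>s since the post's own <div>
        self._content_depth = None  # depth of the open "content" div, if any
        self._author_tag = None     # "a" or "span" while inside the author element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._current is None:
            # Only a <div id="pNNN" class="post ..."> starts a new post.
            post_id = attrs.get("id") or ""
            if (tag == "div" and "post" in classes
                    and post_id.startswith("p") and post_id[1:].isdigit()):
                self._current = {"id": post_id[1:], "author": "", "content": []}
                self._div_depth = 1
            return
        if tag == "div":
            self._div_depth += 1
            if "content" in classes and self._content_depth is None:
                self._content_depth = self._div_depth
        elif tag in ("a", "span") and self._author_tag is None and not self._current["author"]:
            # The first element with a username* class is taken as the poster.
            if any(c.startswith("username") for c in classes):
                self._author_tag = tag
        elif tag == "br" and self._content_depth is not None:
            self._current["content"].append("\n")

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag == self._author_tag:
            self._author_tag = None
        elif tag == "div":
            if self._div_depth == self._content_depth:
                self._content_depth = None
            self._div_depth -= 1
            if self._div_depth == 0:
                # The post's own <div> just closed: tidy up and store it.
                self._current["content"] = "".join(self._current["content"]).strip()
                self._current["author"] = self._current["author"].strip()
                self.posts.append(self._current)
                self._current = None

    def handle_data(self, data):
        if self._current is None:
            return
        if self._author_tag is not None:
            self._current["author"] += data
        elif self._content_depth is not None:
            self._current["content"].append(data)


def parse_topic_page(html):
    """Return the posts found in one saved topic page, in page order."""
    parser = PhpbbPostParser()
    parser.feed(html)
    parser.close()
    return parser.posts
```

You'd run `parse_topic_page` over each topic snapshot waybackpack saves and get a list of dicts to map onto whatever INSERTs you need. It flattens quotes and formatting into plain text rather than the format phpBB stores internally, so anything fancier (BBCode, attachments, dedupe across snapshots) would still need extra handling, and that's probably where a handful of LLM calls is more useful than thousands.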
Statistics: Posted by cadwalen — Wed Jan 22, 2025 7:54 am