Friday, September 11, 2009

The Solution for: Too many spam comments in WordPress, even more than Akismet can process

Today I helped a client get rid of 400,000 spam comments.

This is a ridiculous amount, but I can share how we cleaned this up quickly and effectively.

First, you should know that if you follow this post as a solution, the worst that can happen is you completely screw up your site forever. You're responsible for your own site's fate, not my blog post, so make sure you know how what you are doing affects your site's database.

Secondly, I think it is important to note that this was a special situation where Akismet could not process all of the comments; it would just stall out.

It is also important to note that I don't have enough knowledge of the limitations of Akismet to understand why it could not process 200-400k comments.

Also, before moving on to the solution, do not continue until you have made a full backup of your database. If you have this many spam comments, then please be patient with your database backup; it will take a very, very long time to dump the SQL.

So to the solution:

We looked for common strings in the comment content. Here are a few:

buy
casino
win

Pretty much you need to find words in the content that repeat over and over.

Then we ran this SQL statement on every common spam string we found:

delete from `wp_comments` where `comment_content` regexp 'SPAMSTRING';

For example, you might use the string buy to single out a bunch of them, but be careful: if your readers are using the word buy a lot, don't do this! Use something unique to the spam. If your readers never use the word buy, or maybe one did, then let that 1 comment go for the greater good and do this:

delete from `wp_comments` where `comment_content` regexp 'buy';

On my client's site, this got rid of 40k comments.
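
Before deleting, it can help to see how many comments a string would actually hit. Here is a minimal sketch of that dry run in Python. It is not something we ran on the client's site: it uses an in-memory SQLite table as a stand-in for the real MySQL database (SQLite has no built-in REGEXP, so the sketch registers one), and the sample comments and the count_matches and build_demo_db helpers are made up for illustration.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite calls this for "value REGEXP pattern". The match is made
    # case-insensitive here, on the assumption that the real column
    # uses a case-insensitive collation.
    if value is None:
        return 0
    return 1 if re.search(pattern, value, re.IGNORECASE) else 0


def count_matches(conn, pattern):
    """Return how many comments a delete with this pattern would remove."""
    row = conn.execute(
        "SELECT COUNT(*) FROM wp_comments WHERE comment_content REGEXP ?",
        (pattern,),
    ).fetchone()
    return row[0]


def build_demo_db():
    """Create a tiny stand-in table with invented sample comments."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    conn.execute(
        "CREATE TABLE wp_comments ("
        "comment_ID INTEGER PRIMARY KEY, comment_content TEXT)"
    )
    conn.executemany(
        "INSERT INTO wp_comments (comment_content) VALUES (?)",
        [
            ("Buy cheap pills online",),
            ("best casino bonus, win big",),
            ("Great post, thanks for sharing!",),
            ("Where did you buy that theme?",),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for pattern in ("buy", "casino", "win"):
        print(pattern, count_matches(conn, pattern))
```

In this toy data the string buy also catches a real reader question, which is exactly the case to watch for. On the real database the same check is a select count(*) with the same where clause as the delete.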

After doing this a few times, we had the comments down to 400 and could finally run Akismet.

Let me know if this post is helpful with your comments.

Monday, January 19, 2009

Sunday, April 06, 2008

poking around / hacking apple's rented m4v files

My curiosity today is to see if I can, through some testing, determine whether there is anything Apple does to an m4v that signals to iTunes that it is "time to delete" the file, or that the rental period is over.

I really know nothing about the m4v format; I'm just curious. Eventually I'll run out of steam or figure it out.

I currently don't have a good method for running ktrace on /Applications/iTunes.app/Contents/MacOS/iTunes; the amount of data produced is incredibly difficult to sift through. We'll see though.

So right now I am using strings and grep.

After purchasing The Darjeeling Limited (great flippin movie btw), I ran the following:

$ strings Public/iTunes/iTunes\ Music/Movies/The\ Darjeeling\ Limited.m4v | grep 2008

output:

2008-02-26T08:00:00Z
2008-04-06 05:35:03
2008-04-06 06:01:55


notes:

I downloaded at 1:08AM CST 04/06/2008
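
The strings-plus-grep step above can be roughly sketched in Python, for anyone without the strings tool handy. This is an approximation and not a reimplementation of strings: it only looks for runs of printable ASCII at least 4 bytes long, and the function names are hypothetical.

```python
import mmap
import re
import sys


def printable_strings(data, min_length=4):
    """Yield runs of printable ASCII at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def grep_strings(data, needle):
    """Return the printable runs that contain needle."""
    return [text for text in printable_strings(data) if needle in text]


if __name__ == "__main__" and len(sys.argv) == 3:
    path, needle = sys.argv[1], sys.argv[2]
    with open(path, "rb") as handle:
        # Map the file instead of reading it all into memory at once;
        # a movie file can be around 1 GB.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as data:
            for line in grep_strings(data, needle):
                print(line)
```

Saved under some name of your choosing, it would be run with the movie path and the search string (2008 here) as its two arguments.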

------

After playing the movie (started approx. 01:15 CST 04/06/2008) with multiple pauses but without closing iTunes or switching media files, the new output is:

2008-02-26T08:00:00Z
2008-04-06 05:35:03
2008-04-06 06:01:55
2008-04-06 06:15:56


Why the change? And what do those times mean? Where is it
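
One guess at what those times mean can be checked with a little arithmetic. The sketch below assumes the embedded stamps are UTC and that the local clock was on US Central daylight time (UTC-5, which is what applies in April 2008). Both are assumptions; nothing in the file says which zone the stamps use.

```python
from datetime import datetime, timedelta, timezone

# Assumption: stamps are UTC; local time is Central daylight time (UTC-5).
CENTRAL_DAYLIGHT = timezone(timedelta(hours=-5), "CDT")

STAMPS = [
    "2008-04-06 05:35:03",
    "2008-04-06 06:01:55",
    "2008-04-06 06:15:56",
]


def to_central(stamp):
    """Interpret a 'YYYY-MM-DD HH:MM:SS' stamp as UTC and convert it."""
    utc_time = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    return utc_time.astimezone(CENTRAL_DAYLIGHT)


if __name__ == "__main__":
    for stamp in STAMPS:
        local = to_central(stamp)
        print(stamp, "->", local.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

Under those assumptions the stamps come out as 00:35:03, 01:01:55 and 01:15:56 local time, so the newest one lines up with the approximate playback start of 01:15. That is a correlation, not proof of what the field is for.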


I'm going to close iTunes, reopen, and play again to see what happens.

output:

No changes.

---------

strings Public/iTunes/iTunes\ Music/Movies/The\ Darjeeling\ Limited.m4v | grep ""

output:

\asset-info
file-size
screen-format
cast
adamId
name
adamId
name
adamId
name
adamId
name
adamId
name
codirectors
copy-warning
directors
adamId
name
producers
adamId
name
adamId
name
adamId
name
adamId
name
screenwriters
studio


Ah! Embedded XML! Probably a feature of m4v, so now I try:

strings Public/iTunes/iTunes\ Music/Movies/The\ Darjeeling\ Limited.m4v | grep -A100 -B2 "asset-info"

output:



asset-info

file-size
1082229365
screen-format
widescreen

cast


adamId
44144761
name
Owen Wilson


adamId
189066686
name
Adrien Brody


adamId
1858498
name
Jason Schwartzman


adamId
275559340
name
Amara Karan


adamId
214398928
name
Wallace Wolodarsky


codirectors

copy-warning
FBI ANTI-PIRACY WARNING: UNAUTHORIZED COPYING IS PUNISHABLE UNDER FEDERAL LAW.
directors

....
BLAH BLAH BLAH BLAH BLAH
...


Fox Searchlight



covr
data
JFIF

AppleMark
$3br
%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
4ICC_PROFILE
$appl
scnrRGB XYZ
acspAPPL
appl
-appl
rXYZ
gXYZ
^C


------------

What is this AppleMark stuff I wonder...

Ah, so grepping shows 24 instances of AppleMark.

There are 24 chapters included in the video. This is still basic m4v stuff.

-------------

Ah, so why would the file be working by itself? Why not use Apple Store authentication or a central file on the system that works with iTunes to make sure a file hack doesn't work or a time hack doesn't work or whatever?

hmm well...

My original thoughts were if the file had date information that maybe got updated as it was played with a system id attached to it, then maybe after the first play, the file is marked, and yada yada

BUT....
couldn't you then just copy the file somewhere else before using it, watch it, then copy the prewatched copy back onto the system and it would still work (for at least another 24 hours)? I don't think Apple would do that, but I'm going to try it anyway.

So when the file is first downloaded, at least 2 new files are created:

/Users///iTunes\ Music/Downloads/TheNameofYourMovie.tmp/Info.plist

and

/Users///iTunes\ Music/Downloads/TheNameofYourMovie.tmp/download.m4v

The Info.plist does not have to exist. It relies on download.m4v for its existence. You can delete the Info.plist and it will come right back. If you chmod 444 Info.plist, then you get a download error. The question here is whether it is possible to modify the Info.plist so that whatever flags mark the movie as "rented" are not set in the first place. I'm not sure how easy it would be to change this later.

The file seems to already know it is a rented movie, so we will need to now figure out how to trick the system, locally, into giving us more time.

discursive tangent:
--Also, eventually I can run strings and grep through a ktrace.out for the in-between stuff that happens between the iTunes Music Store servers, iTunes and those 2 files.
--Why though? I forgot where I was going with that.

Need to also install Ethereal and find out what happens when you play the movie for the first time.
One thing is clear: the Apple Store is contacted. If the packets are simple enough to figure out, and there isn't too much crazy business, an application could be made to circumvent the communication with fake communication like so:

an application is routing all traffic from itunes, either to the internet or to the fake apple servers

1. You click to play a movie, or are doing anything in iTunes
2. iTunes tries to contact the Apple Store
   a. if packets match a rented movie checking in with the server: route them to the dummy server application locally
   b. else route packets as they would normally be routed

This application would only run when iTunes is open, in order to avoid performance issues the rest of the time.


There is a problem with this approach I didn't see until just now:
The timer would still be started by this process.

So the big question is:

What is timing the file?
Where is the data located, and how is it updated?

Are there one or multiple methods employed?

Okay, so here is what I am going to try with ktrace:

1. ktrace -t c(or i) iTunes.app/...../iTunes
2. kdump | grep for the following list:
create, mknod, link, symlink, mkdir
rename, remove, rmdir
access, getattr, setattr

just to see what it looks like

Tuesday, January 01, 2008

Osama Bin Laden is Dead according to Benazir Bhutto

whats up with these conspiracy video flavas....

... a major WTF:

http://www.youtube.com/watch?v=_sxxv_R4uJ0

Monday, November 26, 2007

Trip to the ER

everybody gotta go to the er sometime.

tonight I went because of a severe pain in the right side of my stomach area.. round where the liver at I reckon.

took too long to see a doctor, and I started feeling better, so I left. They said prolly nothing but see a doctor real soon, so tomorrow I'm calling the docs up. Some weird pain seriously. Don't feel right if I lay on my stomach.

kinda in the area where the penddix is. so we'll see. for now it doesn't hurt real bad like before, so I can tell i'm not dying or anything. Crazy if imna be one them get the pendixx takin out types though. maybe its a pulled muscle. who fuckin knows.

Tuesday, September 18, 2007

Andrew Meyer

NPR, you know the story. Report it. It should be on your front page.

This is ridiculous. Report the story. More and more cities in the United States are becoming police states. And for what? I don't think it is secret societies, but I do think it stems from a total lack of mindfulness of others. In my home town of Fort Worth, I cannot ride a bike without expecting that maybe the police will give me trouble. Yeah, a bike. There, it is such a far-off idea that police actually suspect you of mischief.

I am a white male by the way. Since when do you hear of a white male worrying about the cops? No one should have to worry about the police bothering them unnecessarily. What gauges this? Wake up, it's common sense. Watch the video again, was that response healthy? I am a PERSON, who works as a network technician for a respected company. I have hobbies, I have things I love and I feel bullied by police for thinking freely.

I don't think police are bad, but I do feel an unnecessary mentality in the United States is brewing, and authentic inquiry is being repressed by it. Are you going to continue to repress your own authentic inquiry, NPR?

Cover the story on the front page.

This isn't just happening in the southern states. We all know people in different areas of the United States that suffer from ridiculous injustice of police bullying. It comes in different forms.

You, the person reading this, do you feel free to walk down the street as you want to be? Ask yourself that again next time you feel like you have to make a really trivial adjustment to your walk or attire. Or inquire next time you are outside trying to be a person with other people.

Be well,
Bob Sawey
512.524.7652

Monday, August 13, 2007

Hollywood Video / Netflix Merger

So I am an employee at Hollywood Video. And since I've never signed an NDA, and I've never been told not to discuss this, I'll give you some info I've been overhearing.

It seems that Hollywood Video is maybe going to buy Netflix.

The district manager of my store told me this during my interview, and then I overheard a coworker telling this to a customer the other day also.

Long live Netflix. I think it's a cool freakin business, I don't feel like they need to merge with anyone. But whatever.

I can't find anything about this on the net, so if anyone else has information about it, please let me know.