Semantic SEO Video with Bill Slawski and Dixon Jones

Terry Van Horne and Bill Slawski host the “Entity Hour,” a show about the semantic web and SEO. Today they interview Dixon Jones of InLinks about his semantic SEO toolset and a recent poll he ran on “Does Schema markup affect SEO?”

Terry: Basically, the new SEO. I call it semantic SEO. I don’t know if that’s the best name, but I think it’s getting close to that. You’ll see a lot of the old people that I’ve had on the show for years.

I don’t meet a lot of new people, just keep going to the same well and dipping when I can. Luckily, I’ve been around a long time and met a few people over the years. I don’t know, what do you think Bill? Anything you wanna add?

Bill: Sure, I think semantic SEO isn’t a bad term. It captures the concept, the idea that when you search, you’re searching for actual things, real-world objects as Google calls them. You’re no longer just searching for strings of letters.

You’re not matching documents with terms that are in a query. You’re trying to search for information about certain specific things, concepts, ideas, topics. And if the search engine can give you information about those things and help you learn more about them, it’s still doing its job. So as SEOs, our purpose is to help it respond to people in as positive a way as possible.

And Google used to crawl the web to create information about links and pages that are connected to each other through links, and now it’s doing more data mining than web crawling, or a combination of the two.

It’s evolving in that direction; it’s trying to find information that it can connect together and knowledge about those connections. So it’s crawling entities as opposed to crawling web pages, and instead of links, it’s looking for relationships between the entities.

Terry: So you could almost call it now entity search rather than web search.

Bill: And so, you know, we were talking about this for the show, and you asked me whether there are any natural language processing patents that Google has come out with. There’s one that focuses upon how Google might extract entity information from web pages.

It’ll go through a web page and look for nouns that are capitalized, because they’re often people’s names or place names or business names, and Google might try to see if there’s information about those particular entities, like any classifications or any attributes, specific properties of the entities.
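The capitalization heuristic Bill describes can be sketched in a few lines. This is only an illustration of the idea, not the method in the patent; the function name `candidate_entities`, the regular expression, and the small stop list are all assumptions made for the example.

```python
import re

# Words that are capitalized only because they start a sentence; this short
# stop list is an illustrative assumption, not part of any Google system.
STOPWORDS = {"The", "A", "An", "It", "He", "She", "They", "This", "That", "And", "But", "So"}

# One or more consecutive capitalized words, e.g. "Bryce Harper".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def candidate_entities(text: str) -> list[str]:
    """Return capitalized word runs that may name people, places, or businesses."""
    found: list[str] = []
    for match in CAPITALIZED_RUN.finditer(text):
        words = match.group(0).split()
        # Drop leading stopwords such as a sentence-initial "The" or "He".
        while words and words[0] in STOPWORDS:
            words = words[1:]
        if words:
            candidate = " ".join(words)
            if candidate not in found:
                found.append(candidate)
    return found


text = "Bryce Harper is an outfielder. He plays for the Philadelphia Phillies."
print(candidate_entities(text))
```

A real extractor would then look up classifications and attributes for each candidate; this sketch stops at finding the candidates.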

So if Google is crawling a news website and it gets a sports page and it sees a story about Bryce Harper, who’s a baseball player with the Philadelphia Phillies, it’ll say, “Okay, Bryce Harper, we see he’s an outfielder, he hits lots of home runs, he plays for the Phillies, he used to play with the Nationals,” and it’s connecting all this information.

It’s indexing facts about Bryce Harper, the player. And this is the new Google, and it’s creating what it refers to as Association scores about those relationships.

Bryce Harper plays for the Phillies: say there’s an 80% chance of him playing for the Phillies, based upon how reliable the websites where we’re finding that information are, like ESPN, how popular those sites are, and how close together the facts and the entities, Philadelphia Phillies and Bryce Harper, appear.

Nowadays, they often appear together in more recent articles, and in older articles you see Bryce Harper and Washington Nationals closer together. So it can tell he played for the Nationals before and he plays for the Phillies now, and it’s building these association scores.
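To make the idea of an association score concrete, here is a minimal sketch that combines the three signals mentioned above: source reliability, how close the two entities appear on the page, and how recent the page is. The `Mention` fields, the weighting formula, and the numbers are all assumptions for illustration; Google does not publish how it computes these scores.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """One page where two entities appear together (all fields are illustrative)."""
    source_reliability: float  # 0.0 to 1.0, how much the source is trusted (assumed input)
    token_distance: int        # number of words between the two entity mentions
    year: int                  # publication year of the page


def association_score(mentions: list[Mention], current_year: int) -> float:
    """Average per-mention weight: reliable, close, recent mentions count more."""
    if not mentions:
        return 0.0
    total = 0.0
    for m in mentions:
        proximity = 1.0 / (1.0 + m.token_distance)              # closer together -> nearer 1
        recency = 1.0 / (1.0 + max(0, current_year - m.year))   # newer page -> nearer 1
        total += m.source_reliability * proximity * recency
    return total / len(mentions)


# Made-up inputs: same source quality and distance, different publication years.
phillies = [Mention(source_reliability=0.9, token_distance=2, year=2020)]
nationals = [Mention(source_reliability=0.9, token_distance=2, year=2015)]
print(association_score(phillies, current_year=2020))
print(association_score(nationals, current_year=2020))
```

With these made-up inputs the more recent pairing scores higher, which mirrors the Phillies-versus-Nationals example in the conversation.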

So Google doesn’t have a ranking metric that involves accuracy, but they are building into this entity recognition and entity extraction process confidence scores for how likely it is that some classification, or some properties or attributes of entities, are true, right?

Terry: So it is a type of fact-checking, right?

Bill: Like, you know, Barack Obama is married to Michelle Obama because they often appear on the same pages. And even though Barack Obama appears on many pages with Hillary Clinton, you know that he’s not married to her because of the times they say “Barack Obama and his wife Michelle,” right?

So, it’s drawing those connections, and Google is learning from those news sources. To some degree, Google’s moving away from relying specifically upon places like Wikipedia, which are human-edited knowledge bases. They have limitations in that they’re edited by humans, they have biases, and they don’t update as quickly as the daily news, which comes out every day and gives us more information about topics.

Terry: Yeah, I believe that’s one of the things that’s holding back some SEOs who’ve been around a while: they see all of the answers and stuff in the SERPs as stealing their traffic. Well, they never owned it to begin with.

Bill: Yes, that’s true. Google’s trying to provide information that scales on a web-wide basis. Remember the Google Directory? It was a copy of DMOZ, and at some point, Google said, “We’ve had enough of this, we give up. If you want to find the information by topic, just search for it, don’t look it up in our directory anymore. We’ve stopped publishing it, it doesn’t scale on a web-wide basis.”

Terry: Yeah, it also meets their goal, which is at some point, they don’t want people to ever leave Google.

Bill: They want people to continue searching and to see relevant advertisements, because sometimes a relevant advertisement is what people are looking for.

Terry: Right!

Bill: And what I think webmasters have to consider is that those kinds of answers that Google is giving, a lot of times, save people time, because they’re not going to a website to get the one piece of information they were looking for that’s buried in a whole page.

Bill: Like, if your business model is based upon how tall Abraham Lincoln is, and Google’s telling people for free, “Abraham Lincoln is six foot four,” well, it’s time for a new business model.

Terry: Yes, definitely, it’s “What is my IP address?” You ask Google, “What is my IP address?” and it’ll tell you. There are websites dedicated to telling you what your IP address is.

Terry: Oh, I use it for money calculation, like I don’t look for a currency converter anymore. I just go to Google, and it’ll do it right there for me.

Bill: Right, so these are served up as new business models, right? Google’s taken over that because you’re limited by your creativity as to what you can put in a webpage. And if your creativity says, “Let’s make it the stupidest thing that we can come up with to try to make money,” you know, you’re limiting yourself.

Terry: Well, I think also, if you really look at the numbers, those kinds of queries that were bringing in those kinds of visitors were actually a cost, not a benefit. Because you have to give them bandwidth in order to serve the pages, and they weren’t buying anything and didn’t intend to buy anything.

Bill: So, one of the other natural language processing patents I pointed you to was one where I talked about how Google might answer queries by building knowledge bases for each query. So if you search, for example, for “best science fiction novels of 2020,” Google will return a carousel at the top of the search results, showing you a list of books. It’s actually returning the results for your query, looking through them, and extracting entities, book entity information, from them.

And it’s making a list, making the carousel on the fly each time you search for, you know, best crime novels of 2020, best science fiction novels of 2020, best comedy novels. It’s building those carousels on the fly. It’s building knowledge graphs from the query that you performed, and then extracting entity information and putting that in the carousel. So, you know, it can be more than just “best blahblah of 2020”; it can be best houseplants for air quality, events.

Terry: They use carousels quite a bit for events and shopping.

Bill: And showing who won the Pulitzer Prize in 2020, so it can be awards. It can be songs, it can be movies, or anything where people would compile a list, basically.

Not every query that you perform that involves entities might give you that type of carousel, but a lot of them do. It’s kind of interesting. If you want a history of TV shows over the last twenty years, you go best TV shows 1980, best TV shows 1981, best TV shows 1982, and you can see old ones get taken away. They give some idea of the history of TV that you might have grown up with.

Terry: They would be where they found new entities and added them to the graph, replacing something older.

Bill: So Google is returning pages that are “best comedies of 1980” or so on, and it’s got lots of results for that. It’s building a knowledge graph out of those pages, where it’s finding the entities, ranking them, and extracting the most important, highest-ranking ones to include in the carousel first. So the list in the carousel that you see is a ranked list.
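A rough sketch of building a ranked list from a query's results might look like the following. It assumes the entities have already been extracted from each result page, and it uses a simple 1/position weighting so that entities on many high-ranking pages come first. The function name `rank_carousel`, the weighting, and the placeholder titles are assumptions for illustration, not the method described in the patent.

```python
from collections import defaultdict


def rank_carousel(results: list[list[str]], top_n: int = 5) -> list[str]:
    """Rank entities found across search results.

    `results` is ordered by search position; each item lists the entities
    already extracted from that page. An entity earns 1/position for each
    page it appears on.
    """
    scores: dict[str, float] = defaultdict(float)
    for position, entities in enumerate(results, start=1):
        for entity in set(entities):  # count each entity once per page
            scores[entity] += 1.0 / position
    # Sort by score descending, then by name for a stable order on ties.
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return ranked[:top_n]


# Placeholder titles standing in for books extracted from three result pages.
results = [
    ["Book A", "Book B"],
    ["Book B", "Book C"],
    ["Book B", "Book A"],
]
print(rank_carousel(results))
```

Running the same function on results for a different query (“best crime novels of 2020”, say) would produce a different list, which is the sense in which the carousel is built on the fly.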

Terry: Yes, I know Google likes those, because when I did a search for streaming software, right when I was gonna set this up, a lot of the top 10 results I would see were reviews of the best software for doing that.

So Google doesn’t always return carousels like that for every query that you might have. For instance, it depends. I’m taking it from what Bill has to say, that they might sort these types of things based upon attributes for the queries. So if you do something like “best other actors who acted with Tom Hanks,” it doesn’t give you a carousel. But if you search for Tom Hanks, you’ll see a knowledge panel with “People also search for” at the bottom. Those tend to be people who acted with Tom Hanks. So they do have that information; it’s just a question of when they display it.

Terry: Yeah, Bill gave me five patents. They’ll be in the show notes. I don’t know if we’re gonna get to discussing all of them, because at 1:15 we’ll be inviting Dixon to talk some more about the Semantic Web, his tool, and a report that he gave on his site about schema and how it can possibly affect rankings. So I’m looking forward to that.

How about I just let Dixon into the room, and he can join us.

Dixon: Hi, guys! No, I just… I just met you could completely close down the window and everything, so your last two, three minutes I completely missed. So I typed… you decided that ideal language. I was also… I was also trying to record the screen in case you were recording it, and there… and I thought, “Oh, no, we’ll all forget that then.” So I hope you’re recording it. Sorry anyway, oh yeah, being it’s recorded on StreamYard then I’ll be uploading it to the SEO Pros UK channel, which I spent a couple of days putting together, and yeah, definitely be there to share. Good.

Dixon: And hi, guys.

Terry: So to get started, the Semantic Web has been on SEOs’ minds for a while. I remember when they first started talking about it, I just yawned and moved on. It was a lot like mobile, in that people were talking about it like it was gonna happen tomorrow, and you know, it happened five, six years later.

Dixon: SEOs, we’re terrible. I mean, we’ve all been in this industry for a long, long time, but until we can see a way to measure something, we kind of assume it doesn’t actually exist.

I remember before Yahoo Site Explorer came out, you know, we didn’t have any idea about backlink information at all, really. And then, you know, Moz came out and Majestic came out, and later, Ahrefs came out. And all of a sudden, we could see why the hell that page over there was important when we couldn’t see that before. We were just blind. So we just didn’t pay much attention to things.

And I think there’s a real problem with schema, or was a real problem with the Semantic Web. We see the front end of what people are doing and their user queries, and we see the back end. But we didn’t have any tools to really understand the middle of the system, and that happens all the time, really, you know, in the background there. Google was… Bill Person Person, really.

Well, really, I see Google’s Knowledge Graph as a massive great big almanac of ideas and concepts. And so if you think of the web as a library of books and pages and ideas and concepts and records and videos and goodness knows what, and then you’ve got this almanac that the librarian is writing about where the hell everything is and what links to what. That’s what the almanac is.

It’s the Knowledge Graph, really. And I think that as that’s getting better, Google’s then exposing that more and more. They don’t necessarily say when they’re exposing it, and, you know, Bill has to pick it out and say, that carousel, that’s interesting, that’s got to be using the Knowledge Graph for that, or the Knowledge Panel or whatever. And it’s starting to expose itself.

But as SEOs, we’ve been bloody useless at thinking in terms of concepts and entities, and there weren’t really tools out there to do it.

Bill: In my work, I optimized a page for entities in 2005. I had absolutely failed at ranking a page for Black History Baltimore.

Dixon: I remember this story.

Bill: Yeah, we couldn’t get past ranking a hundred. There were too many other good pages. So we said, okay, there are lots of places people could actually visit in Baltimore. We were working on our Baltimore Visitors website. Let’s give people places to go and see that are relevant.

Yeah, there are famous historic churches. There are famous colleges. There’s a nine-foot tall statue of Billie Holiday. There are art houses that Frederick Douglass bought in the ’60s that he built off as his. These are places people could visit. Let’s give them a walking tour of Baltimore.

Dixon: Google sees the connections. It says, “You’ve got everything I need to say you’re the authority.” Ah, yeah. We made the page about black history in Baltimore, and it was better than repeating the word “black history in Baltimore” thirty times.

Terry: So it worked in 2005?

Bill: It worked in 2005. Within six weeks? No, within two months, it was the sixth most visited page on a 300-page site. Yeah, yeah. I think Google has always been kind of themed or, you know, looking for entities. It just wasn’t called that then; it was strings that they were looking for. I hope they’re actually looking for that thing.

Bill: I think what’s changed between 2005 and now, and you might disagree, is that back in 2005 Google was treating a page as a document object. And I think what’s happened since is that it’s now treating little snippets of text on the page as document objects. So you can have many entities on the page. If you run Google’s NLP algorithm, it shows you it’s got several entities that it identifies on the page. So now it’s picking out entities within content, not treating a page as one piece of text.

Bill: I think that depends on the search engine, because in 2006, Microsoft was doing object-oriented search, where they were indexing objects on pages, not calling them entities, not calling them concepts. They called them objects, but they were finding facts and attributes for those facts and indexing those things. So, you know, that was Microsoft, who, in 2006, wasn’t nearly as popular as Google. They were doing page segmentation after that as well. There’s one other thing they were doing: block-level PageRank, possibly.

Dixon: Yeah, yeah, yeah. And now even Majestic does block-level analysis. So, you know, it’s all doable. I think you made an interesting point before I came on as well, about the capitalization in the patents. We’ve seen that as well, and what it means is that the entities that come back from Google are highly biased towards proper nouns. People, places. Google is very good at spotting people, brands, and places. They’re not as good at spotting interest rates or, you know, computer processes and such yet.

Bill: Google is working on personal knowledge graphs. They’re working on making sure the entities they cover aren’t necessarily just highly notable entities, as in Wikipedia. So you have macaroni and cheese, which is an entity, but it’s not a highly notable entity, and Google might give macaroni and cheese a knowledge graph.

Yeah, or a knowledge panel. But it’s, it’s not… Yeah, broccoli, same thing.

Terry: Lots of answers are questions and answers, that’s for sure.

Dixon: Yeah, but we’re definitely seeing… I mean, you can see it in their API, and that’s where we came out with the new… it’s called a search engine understanding score, because we’re very aggressive at picking up entities. We can see that it changes from industry sector to industry sector, from what we’re looking at. We’re running industry reports, which you can go and find on the site. You don’t have to log in or anything to get to the industry reports.

And we’re seeing that really Google only finds, or at least identifies itself as finding, about 20% of the ideas and concepts that we’re finding. So they’re only getting 20% of the understanding that they could get. And, you know, they’ve still got further to go. I think they’ve got to get there, and they’re gonna get there in leaps and bounds. But we’re seeing that they’re getting these capitalization kind of ideas much more strongly. It’s not that they don’t get the concept of marketing. They do, you know, they’ll get those concepts. But not unless the other right topics are on the page as well.
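One simple way to read the “about 20%” figure is as the share of the tool's entities that the search engine's API also reports for the same page. The sketch below computes that ratio; the function name `understanding_score` and the formula are assumptions for illustration, and the actual InLinks calculation is not described in this conversation.

```python
def understanding_score(tool_entities: set[str], engine_entities: set[str]) -> float:
    """Percentage of the tool's entities that the search engine also reported."""
    if not tool_entities:
        return 0.0
    recognized = tool_entities & engine_entities
    return 100.0 * len(recognized) / len(tool_entities)


# Placeholder entity labels: the tool finds five, the engine reports one of them.
tool = {"entity 1", "entity 2", "entity 3", "entity 4", "entity 5"}
engine = {"entity 1", "something else"}
print(understanding_score(tool, engine))
```

Entities that only the engine reports are ignored here, since the score is measured against what the tool found.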

Bill: Before there were tools for entity optimization, I used to look up the things on web pages in Wikipedia to see if the pages were really about them. So I had a client who had an apartment building with units from… And it was a four-page website, and they missed lots of entities. Like if you got in the elevator of this apartment building, went to the basement, and got out, you walked into the Washington Metro line.

So you’re connected to all of Washington, DC, Metro Northern Virginia, and Southern Maryland. And you could go to 31 different Smithsonian museums from that Metro line. And these were the types of entities they really should’ve talked about on the web pages and show that you didn’t have to face Washington commuting traffic if you lived in this apartment complex, which was a big selling plus.

Terry: Which means now we have tools that do that for you. I know when I was doing SEO, by the way, people, I’m retired, I don’t have clients anymore. I do it again because I enjoy doing it, not for money. If I was doing it for money…

Dixon: I tried retiring; they wouldn’t let me do it. My wife just said, “You’re getting under my feet.”

Terry: So I got bored, as it was, decided to start some old stuff back up. But I found it to be an actual nuisance to try and optimize a page. In my case, I would go in, download from Google the top ten results, go to those pages, copy them, bring them into a text editor, break it down to the words on the page, and then pick out the entities myself, you know. So what takes about ten minutes now took me literally hours.

So, last night using your tool, I was able to optimize a page in about two and a half hours. Previously, I’d be looking at a good day. And I wrote the page as I was doing it.

I think the biggest shift for me has been the on-page optimization moving from phrase-based matching for scoring pages to entity extraction and, you know, that sort of thing. I think there’s still a lot of people that are struggling with that, and I think a tool like this is what will help them get it, ’cause it’ll be a lot easier to do.

Dixon: It’s really hard to get your head around thinking in terms of things and entities, because you can’t always tell what’s an entity without having something. And as you say, Bill, Wikipedia is just brilliant because Wikipedia will give you, you know, entity after entity.

And while, of course, Google is building its own knowledge graph, it’s probably gonna do what it did with Google Directory and, you know, eventually drop its data source and stand on its own two feet. We do know that they used Wikipedia data as a training set, and Freebase too, as they were building out the knowledge graph.

Bill: We see Google now analyzing audio in videos to better understand what’s going on in those videos, because the text associated with videos doesn’t always describe everything happening in those videos.

Dixon: And it read a million books as well, didn’t it?

Bill: So, and if you’re searching for a quote from somebody, you probably want to watch it because you probably know who said it already, but you want to share that person saying that thing. Yeah, so Google is no longer relying upon knowledge bases for information about the sources of quotes, because a lot of times knowledge bases don’t have those quotes in them.

Terry: I think they’ve been doing a lot of work with voice search, too. Like when they had that telephone… the free telephone service, they got a lot of data to work with, and I think they’re getting pretty good at that. And with YouTube, I think for the captioning, that kind of understanding of speech would be huge, because most captions are, you know, not the easiest things to edit.

Bill: And there are an incredible amount of images, pictures on the web. Google’s analyzing what’s in pictures, so they don’t have to always resort to Wikipedia. If you ask a question about where do bears hunt, the answer based upon pictures is in the middle of rivers for fish. And they can show you lots of pictures of bears with fish in their claws in their mouths, and that’s where they hunt…

Dixon: I think there’s an interesting challenge for SEOs. Part one of this whole move towards entity SEO or semantic SEO, whichever phrase we’re going to end up with (I don’t know which one’s going to stick yet), is feeding, feeding, feeding the beast, giving it that information. How do you get your brand in there and associated? How do you get Adidas associated with footwear? How do you get Nike associated with telephones or, you know, whatever? You’ve got to get your story aligned with the story of the market you’re marketing to. And that’s part one, and we’re sort of doing that.

But part two, of course, is that hard bit, Terry, where you said, you know, now they’re answering the question for you, and then they don’t need to send you traffic to your website. That’s the real challenge for SEOs of turning that knowledge graph back into money in your pocket. Not necessarily visitors to your website, but somebody buying your brand, money, and your business model having value in that world.

Because, as you say, you know, if Google can just answer the question and not necessarily give any attribution for it, then the SEO and the business model haven’t done their job. The business model needs to have a situation whereby, even if it’s the image, the image tells a story that says you’ve got to, you know, fly easyJet to be able to get from here to here, because that’s the only people that do it.

Bill: I think Google is trying to adjust for that, from what I’m seeing. One of the patents that talks about images being shown with featured snippets was updated recently. They did a continuation patent where they updated the claims. They said that when they show an image with a featured snippet, they will try to use an image from the same source as the answer.

Dixon: Excellent! Yeah.

Bill: In the past, they didn’t necessarily do that, so with this new thing in the claims section, they’re making an effort to try to show images from the actual sources of answers.

Dixon: I’ve also seen that they are giving attribution where it’s possible to give an attribution. Of course, you can’t give an attribution for how tall Abraham Lincoln was because it’s a fact that anybody can have and share. But they still had to get it from a third party, though they could get it from many, many third parties, and therefore, in fact, they’ve probably got different heights and stuff averaged at some point.

Bill: The public domain.

Dixon: Yeah, but your business model has to rise above that, is my point. And we’ve still got the challenge. SEOs still have the challenge of saying, “I still need to get the customer to buy my product or my customer to engage in my world. So I have to have something that’s not given to Google, if you like.”

And so back in my Majestic days, I had an argument with the other guys in the team. We’ve got Trust Flow and Citation Flow. We’ve got a metric for every brand on the internet; we could give you a score. So why don’t we put that in, you know, in our own code? And I lost the argument, quite rightly, because we would just have been giving too much information to the Borg.

And what’s the point of giving it away when you’re trying to have a product that you’ve spent a lot of time and energy and effort on? You’ve got to keep it back. And it would have been a bad decision. So I was wrong, they were right. It was ever thus, still good till. Yeah, thank you. I’m probably gonna get better.

Bill: Sometimes when you come up with scores, it’s often better not to state too much about them. It’s like result scores in Google’s knowledge graph. When they give a confidence level, they don’t necessarily call it a confidence level. They never referred to it as a confidence level.

Right, it’s an indication of how much a specific entity is being referred to in a query that searches for it. And it could be a fairly high number, it could be fairly low. They don’t explain how they calculate that.
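For readers who want to look at these result scores themselves, here is a minimal sketch around Google's Knowledge Graph Search API. It only builds the request URL and parses a response; actually fetching the URL needs a valid API key. The helper names `build_search_url` and `top_results` are made up for this example, and the sample response is hand-written in the API's documented shape with invented names and scores.

```python
import json
from urllib.parse import urlencode

# Public endpoint of Google's Knowledge Graph Search API.
ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_search_url(query: str, api_key: str, limit: int = 10) -> str:
    """Build the request URL; fetching it requires a valid API key."""
    return ENDPOINT + "?" + urlencode({"query": query, "key": api_key, "limit": limit})


def top_results(response_json: str) -> list[tuple[str, float]]:
    """Return (name, resultScore) pairs sorted from highest to lowest score."""
    items = json.loads(response_json).get("itemListElement", [])
    pairs = [(item["result"].get("name", ""), item.get("resultScore", 0.0)) for item in items]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hand-written sample in the API's response shape; names and scores are made up.
SAMPLE = json.dumps({
    "itemListElement": [
        {"result": {"name": "Example Entity B"}, "resultScore": 12.5},
        {"result": {"name": "Example Entity A"}, "resultScore": 340.0},
    ]
})

print(build_search_url("Google", "YOUR_API_KEY", limit=2))
print(top_results(SAMPLE))
```

The scores come back as bare numbers with no stated scale, which fits the point above that Google does not explain how they are calculated.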

Dixon: No, they don’t. It’s a bit messy; the whole data that comes back in that API is a bit messy. And we’ve tried to clean it up, because all the free tool on the InLinks site really does is compare their API, cleaned up, with our API, cleaned up. So we can see which topics they’ve picked up and which ones they haven’t, and yeah, it’s not the cleanest output, is it?

Bill: It’s often accurate, but it’s misleadingly accurate. Like, I remember doing a search for Google in Google’s knowledge graph and getting about 23 or 24 results. The first one was some company in Mountain View, California, and then, like, the seventh one was Topeka, Kansas, which renamed itself Google, Kansas, one day to try to get Google Fiber.

Dixon: Well, yeah, so I’ve been fighting over that bloody knowledge graph, quite famously; Jason Barnard knows this story. I am Dixon Jones, and trying to get Dixon Jones recognized as an entity has been a problem. I’ve done it now. I’m in there now.

Bill: You mean the architect?

Dixon: Exactly, exactly. So I’ve got there now. I’ve got a knowledge panel; you’ve got to dig to find it. But more importantly for me, if I query the knowledge graph, I can see I’m in there. But also, at the same time, other stuff has popped in. You know, a Katharine Parker from so and so; there’s some really weird stuff that’s popped in there. And also, my knowledge graph URL changed as well, which I think is a bit odd. I wasn’t expecting that. So I don’t know why that’s happened.

Bill: I’m really mystified by my knowledge panel, because there’s a “People also search for” with Gene Roddenberry, Steve Jobs, George Orwell.

He’s a smart cookie, this Bill Slawski. Your knowledge panel is really rich. It’s very good, and it’s interesting that you’re associated not with other SEOs so much. I think maybe Bill Hartzer is in the list, but, you know, that’s basically everybody in SEO who’s called Bill, I’ve realized. And there are, like, three or four SEOs, so, yeah, it’s quite a very rich panel, yeah.

Terry: I had a problem with a senator; his name was exactly the same as mine. However, I did actually outrank him for years. And then all of a sudden, it became harder and harder, and I couldn’t take number one from him.

Bill: I think he and I went to the same law school.

Terry: Did he? Yeah, even after he’s dead, he still outranks me.

Bill: I didn’t see anyone anything like you, ’cause I think I would have remembered seeing you around law school.

Terry: Where do we go from here? Yeah, that’s what I’m just looking up now. I make notes, well, it’s only questions.

Bill: Where are we headed, Dixon?

Dixon: Ah well, so I think that there’s been a political battle for years within Google between the concepts of AI search engine results and the old-school PageRank kind of approach to search engine results. But it’s interesting to see the change of leadership in Google. We’ve now got a change of leadership, and we’ve got an ads guy in there, so I don’t quite know what that means. But I think if you don’t optimize for entities, then you’re not going to have a future-proof world on the Internet.

Interestingly, that probably is going to mean that your knowledge, as it dissipates out into the ether of the internet, doesn’t necessarily have to be on your website. Your LinkedIn profile is really important; if you’ve got an article in the New York Times, that’s really important; and so is everything that you do and leave out in the community. Whether it’s you as an individual or a brand or a concept, the concept survives out in the world, is probably a good way to think about it.

And it’s sort of how, you know, Hurricane Katrina is then portrayed out on the internet, or the COVID virus, how that manifests. That’s a bad example, but I think the story, the concept, will be twisted by society, by all sorts of different people, by machines, some of which are in your control, some of which are not. It’s a story that just will carry on and will be changed by what it comes into contact with. I’m probably getting a little bit philosophical.

Bill: Well, why don’t you get a little bit philosophical concerning a slightly different topic… context. How does your tool deal with context?

Dixon: So, our tool deals with context by analyzing what concepts are in the corpus of documents that we’re analyzing. So if we’re analyzing your own website, then we’re analyzing it whether it’s 20 pages or 200 pages or 2,000 pages. After 2,000 pages, we’ve probably got a pretty good idea; adding another page is probably not going to make much difference to the knowledge graph, really.

But as you add that content and you start picking out the entities, say you’ve got 50 entities in a page (it could be anything, but say 50), then you start seeing all of those entities again and again and again. The context then becomes the fact that, you know, Barack Obama and Michelle Obama are mentioned together again and again, and we start to pick up patterns.
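The pattern Dixon describes, entities that keep turning up together across a corpus, can be sketched as a pairwise co-occurrence count. This is a generic illustration, not how the InLinks tool is implemented; the function name `cooccurrence_counts` and the three sample pages are assumptions for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(pages: list[set[str]]) -> Counter:
    """Count, for every pair of entities, how many pages mention both."""
    counts: Counter = Counter()
    for entities in pages:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


# Illustrative pages, each reduced to the set of entities found on it.
pages = [
    {"Barack Obama", "Michelle Obama", "White House"},
    {"Barack Obama", "Michelle Obama"},
    {"Barack Obama", "Hillary Clinton"},
]
counts = cooccurrence_counts(pages)
print(counts[("Barack Obama", "Michelle Obama")])
print(counts[("Barack Obama", "Hillary Clinton")])
```

Pairs with high counts are the repeated associations that supply context; on a real site the counts would run over hundreds or thousands of pages.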

But then we have a shortcut, really, when we’re trying to optimize content, because we can take a corpus of documents that are already decided as best-of-breed. An obvious example is the top ten results from Google for a key phrase; they’re already defined as best of breed. But you don’t have to do that; you can put in ten defined competitors, these ten web pages, or if you’re a government body, you can say, right, these ten districts have content on this subject and we want to be better than those, you know?

So you can take any set of documents that you think are already appropriate, and then we can build that context around the entities that are in those documents. So we’re using the information against itself to try and find the context. Of course, our natural language processing also uses verbs and other words, and that’s kind of part of our secret sauce as to how we’re trying to see that context within sentences.
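The pattern Dixon describes, entities that keep turning up together across a corpus, can be sketched roughly as below. This is only an illustration and not inLinks’ actual method: the function name `entity_cooccurrence` and the tiny corpus are assumptions, and entity extraction is taken as already done.

```python
from collections import Counter
from itertools import combinations


def entity_cooccurrence(docs):
    """Count, per entity and per entity pair, how many documents contain them.

    docs: a list of iterables of entity names, one iterable per document.
    """
    doc_freq = Counter()   # entity -> number of documents mentioning it
    pair_freq = Counter()  # (entity_a, entity_b) -> documents mentioning both
    for entities in docs:
        unique = sorted(set(entities))  # count each entity once per document
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return doc_freq, pair_freq


# Hypothetical corpus: each list is the set of entities found in one page.
corpus = [
    ["Barack Obama", "Michelle Obama", "White House"],
    ["Barack Obama", "Michelle Obama", "Chicago"],
    ["Barack Obama", "White House"],
]

doc_freq, pair_freq = entity_cooccurrence(corpus)
print(doc_freq.most_common(3))
print(pair_freq.most_common(3))
```

Pairs with high counts relative to the corpus size are the ones that could be treated as context for each other.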

Bill: So you’re finding frequently co-occurring terms that tend to appear on other pages, or a tally for the same words or terms.

Terry: Dixon, can I tell my experience? That was one of the best things they do, Bill. It shows you that context in the other documents, so it tells you this term appeared in three documents, etc. Correct?

Dixon: Sometimes we get it wrong, because sometimes we don’t see a phrase, and we pick up the two individual entities and define them separately. It’s not perfect, but because of the way that we display that, you can then, as a human being, use it to guide you.

Terry: I know I liked it because it gave me advice. It’s like when you went through the process of collecting the proper keywords in the old days; this tool actually allows you to prioritize the entities that you’re better at. I mean, it extracts them and gives you something to go on. So, for instance, a certain word appears 15 times on average in these documents.

Dixon: Once you’ve chosen an entity, you can build more context for your editor by saying, “Right, give me questions that are associated with this content, this idea.” And that’s a pretty good tool, isn’t it, Terry?

Terry: I mean, that’s what I found for writing content. Yeah, it’s one of the better tools I’ve seen for that.

Dixon: So we think we’re getting more questions than AnswerThePublic, but I haven’t done a proper test. That’s my benchmark, really. I think we should.

Bill: So if you write a page about Lincoln, you make sure you include terms like Emancipation Proclamation on that page, so people know you’re talking about the former president and not the car.

Dixon: Yes, “Lincoln” will have more than one entity associated with it for sure. And we usually get them right, but as Terry found out when I did a demo yesterday, we’re getting some wrong. For example, SEOs are talking about engines, and we have not yet tied “search engine” to a different concept, so that’s one that we’re getting wrong. The system will fix itself, but mostly we’re getting it right. Also, “search engineer” doesn’t drive the train. That’s right.
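A toy version of the Lincoln example: pick the candidate entity whose typical context words overlap most with the page text. The candidate names and context word lists here are invented for illustration, and a real system would use far richer signals than word overlap.

```python
import re

# Hypothetical context vocabularies for two entities that share a name.
CANDIDATES = {
    "Abraham Lincoln (president)": {
        "emancipation", "proclamation", "president", "civil", "war", "gettysburg",
    },
    "Lincoln (car brand)": {
        "car", "sedan", "suv", "engine", "dealership", "luxury",
    },
}


def disambiguate(text, candidates=CANDIDATES):
    """Return the best-matching candidate and the overlap score for each one."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(words & context) for name, context in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = disambiguate(
    "Lincoln signed the Emancipation Proclamation during the Civil War."
)
print(best)
print(scores)
```

With no overlapping words at all, every score is zero and `max` simply returns the first candidate, which is one reason a human check like the error reporting Terry mentions is useful.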

Terry: Yeah, I can add that Dixon’s tool allows you to tell them when they make those sorts of errors, so you can report it, and you’re actually helping yourself and them to improve the data.

Dixon: I think that’s a very good point. I mean, there’s a huge amount of human curation in Google’s stuff as well. In Google My Business panel, every time you try and do anything to change knowledge, a human being has to press a button. And it’s, as I understand, maybe after a while they trust you, but you know it’s certainly not instant and not for me anyway, you know. And Google My Business as well, there’s a lot of humans looking over these kinds of things.

Bill: So there’s also the machine learning side. We do see automation in their process, like Google Trends associating entity IDs with terms that are topics, not just search terms.

Dixon: Absolutely. With Google Trends, you’ve got to be careful now not to compare a topic with a search term. If you’re using the compare feature, you’ve got to make sure that you don’t mix those, otherwise those ideas get lost.

So I think that’s really interesting. Our market trend data is also using entities, trying to take entities and work out how much interest there is. We use the word “interest”; it’s not click volume, it’s not search volume, you know, it’s a kind of weighted measure of popularity. And it’s not bad. We’ve got some good stuff and some bad stuff, you know, but it would be nice if Google would give us YouTube’s log files, and then we could just associate entities with videos and things and get some more data. And if you happen to be a really big site, an eBay or somebody with lots of traffic and lots of entities on your site, let us know; we can probably use that information.

Bill: Finding other sources to gauge interest, like in Google Discover, you know, like topics you clicked a little heart on, yeah, yeah. But I don’t think that’s going to give enough information to start driving trend or opinion, really, and it will be very biased towards, you know, middle-aged white guys, because those are the guys who do this kind of stuff. And that’s why Wikipedia is so biased: it’s a very narrow type of people that edit Wikipedia.

So, you know, it’s why Google’s trying to move away from relying solely upon Wikipedia, why they’re trying to extract entities from authoritative web pages, why they’re analyzing videos, and why they’re analyzing images to learn about entities from those images that have nothing to do with Wikipedia, like bears hunting in streams for fish.

Dixon: Their data sources, what they classify as an authority, are really interesting. Jason Barnard from Kalicube spends a lot of time trying to record where he’s seeing different data sources in knowledge panels, which is kind of interesting.

Bill: It reminds me of the process that Google goes through for determining authoritativeness for local business entities, including in Google Maps. They display the business name and show you the website they associate with it, which is usually the enterprise website for that business. They actually have a process to go through to make that determination.

Dixon: I get the feeling that that process is largely a feedback loop. So once you’ve decided that a LinkedIn entity is an authority, then the links on there are fine as long as they can then get a link back, and they can see two-way communication about a message where both sides are agreeing.

Bill: Verification helps a lot.

Terry: So one of the reasons that I asked Dixon onto the show was that, on the dojo page in the Facebook group, he mentioned a report that inLinks did on how schema was helping SEO. There hasn’t been a lot said about that, and I know it’s something some people don’t believe in and others do.

Dixon: Specifically, we were testing “about” schema. There are so many different types of schema: events schema, review schema, organization schema, person schema, and God knows what. But we were specifically looking at something that isn’t documented in the Google documentation anyway: “about” schema.

They’re not showing this as an official schema, but they seem to be using it as a signal. With “about” schema, you are basically telling Google, in the schema, what this page is about, the main topics that it’s about, and it also mentions other topics as well.
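For readers who have not seen it, here is a rough sketch of what markup along these lines can look like, built in Python and serialized as JSON-LD. `about`, `mentions`, and `sameAs` are real schema.org properties; the helper `build_about_schema`, the example URL, and the chosen entities are assumptions for illustration and are not the markup inLinks generates.

```python
import json


def build_about_schema(url, about, mentions):
    """Build a WebPage JSON-LD dict from (name, sameAs URL) pairs."""
    def thing(name, same_as):
        return {"@type": "Thing", "name": name, "sameAs": same_as}

    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": url,
        "about": [thing(n, s) for n, s in about],        # main topics
        "mentions": [thing(n, s) for n, s in mentions],  # secondary topics
    }


# Hypothetical page and entities.
schema = build_about_schema(
    "https://example.com/lincoln",
    about=[("Abraham Lincoln", "https://en.wikipedia.org/wiki/Abraham_Lincoln")],
    mentions=[
        ("Emancipation Proclamation",
         "https://en.wikipedia.org/wiki/Emancipation_Proclamation"),
    ],
)

script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(schema, indent=2)
    + "\n</script>"
)
print(script_tag)
```

The split between `about` and `mentions` is what carries the hierarchy: a short list of main topics, then the other things the page touches on.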

So, it’s basically guiding a machine learning algorithm to a shortcut. Instead of hoping that the machine learning will implicitly understand the concepts on the page, you’re explicitly giving it a hierarchy, really. So, what we did was we got 20 different SEOs to take 20 different web pages and put schema on there, whether it was our schema or whether they cut and pasted the schema and put it up, and then we started measuring the impact of that over the next month or so.

There was some criticism of the method, the way that we did it, because we didn’t put in a proper control. It’s not that I didn’t have a control; I just didn’t report on a control, and I should have done. So, bad Dixon.

Terry: You know what, that’s all the tests, whether they say so or not. That’s something we could do in the old days, not now.

Dixon: It doesn’t work as well now. I mean, now I’m pretty much saying the schema combined with the internal linking, the entity-based internal linking, is definitely having an effect; well, I say for me it’s definitely having an effect. There’s a case study that came out yesterday on our blog about a customer who’s there for a very big, competitive phrase, a “knock you off the pedestal” phrase.

So we’ll see what happens with that now. Obviously he must have been a decent SEO before he started, to be in the running for the phrase anyway, but ultimately he added the schema plus internal links, and it popped him into the number one spot, and he let us report on it.

So, you’re more than welcome to go and look at that, but I think that the schema on its own is a signal, and we did establish that Google was reading it. I got confirmation from a Googler that they understood the schema. I didn’t get any feedback as to what they might or might not be doing with the schema; all they wanted to know was that their system wasn’t broken reading it in the first place. But the main thing is, when you’re telling it…

Terry: But the main thing is, when you’re telling it, “These are the entities that I think this page is about,” you’re taking some of the confusion out of the process for Google, and how much they trust you.

Dixon: Yeah, well, it’s not like the old meta keywords idea, because the meta keywords we used to spam with all sorts of stuff, and the thing is, if you do that with this, all you’re going to do is split the context out into completely the wrong stuff. But it isn’t so dissimilar, really, to Google taking the title of a page and treating that as a good indicator of what the page is about.

Bill: When your schema and your content are inconsistent and you send mixed messages to Google, that inconsistency is bound to hurt.

Dixon: I agree, absolutely, and that’s why it does become all about context. So if you’re using our tool, you can mess up your website if you make those associations incorrectly. You can make your rankings worse. There’s no point in seeing our recommendation that you’ve got to talk about engines, and then you link to a page that talks about combustion engines, and you’ve just gone off down the wrong route. You’ve got to use your, you know, you’ve got to use brain 2.0.

Terry: Yeah, that would be in short supply.

Dixon: Well, I think we’ve made it so you can, you know, at least have a chance. What we’ve tried to do is make it so that SEOs understand the principles behind it, and content writers don’t have a heart attack when they see, you know, a content briefing and an audit brief and stuff. For good content, really, you just want to say, “Please, please talk about these concepts,” you know.

Terry: Right, that’s how I found it really helpful. I’ve never found tools like that to be helpful in writing a page, but I actually found your tool helpful because it suggested topics that I wouldn’t have thought of myself.

Dixon: I’d like to point out to Google, to Gary, that he’s not being paid for this, although I will give you a link to my affiliate program if you want it. You’ve been very kind about it so far, and I’m sure it’s only because I’m here.

Terry: Oh I see when I decided to were talking about.

Dixon: It’s definitely got a different approach, but it’s also trying to not scare the hell out of everybody. I think some of the tools got very scary for normal human beings, and they weren’t necessarily guiding us down good paths. I’m not suggesting that you can’t misuse inlinks.

But I am suggesting that when you put a link on a webpage, if you’ve got the association right, it should be helping the user first, and the user should then go to the page that is going to explain the thing you’ve just seen on the page it’s supposed to help.

And if it doesn’t, if you’re going to hurt the user, then you’ve probably also hurt your search rankings, because the two should be fairly aligned, you know, like when I was talking about that problem with the context.

Terry: Yeah, I was thinking that it was a place where I wouldn’t want the reader to go off the page at that point.

Dixon: So, yeah, okay, okay, in that case, you want to break the link if you don’t want them to leave the page, right? That’s right, break the link, or you know, I can’t help with that. But if they, all right. And hey, I’m not complaining, I’m just saying that, yeah, yeah, you know, you want it to work on top that would be something out deep, yeah?

Well, I mean, it’s just a click away, so you can go through your list of the links in the system. It would do better as you’ve got it in the list, you know. The fact that we’re using JavaScript to do half of this is also quite interesting, because if you don’t look through your list, you will have the link live on your site without having to go back. So, yeah, the JavaScript injects the links and injects the schema, and Google’s just fine with that.

Terry: I found it worked a lot better, Dixon, when I actually pared down the number of pages I was giving it to work with. So I culled it down from almost forty to the free twenty. Not to save money, but more because those are my money pages.

Dixon: Associations are important. You’ve got to get the associations of your key landing pages right first, and then you just add in all the rest of your content. The tool goes through that content and sees where you’ve talked about these concepts, and now it’s only going to link to the ones you’ve associated, so all the information is going to move into alignment.

Terry: I liked that it didn’t repeat where I had already linked to an entity. That was good; that’s something I shouldn’t do. So let me see if there’s anything else we can… Wow, we’re pretty well at an hour already. We haven’t really decided how long we’re gonna let these go. I know some of the search geeks used to go for an hour and a half, and then we talked an hour after that, I think.

Dixon: I think the younger generation, though, they haven’t got the attention span. I’m gonna get shot now. I shouldn’t have said that. I apologize to anybody below the age of 50. Yeah, that pretty well covers what I had in that thing, unless you want to go and cover the individual tools, but I think that’s pretty…

No, for people who want to try inLinks, it’s down there. It’s free for the 20 pages. There’s all sorts of buttons to get a one-to-one demo with me. Obviously, I’m going to try and sell you the version that… But we want it to be a major tool. We want, I think, the idea to be, you know, built-in.

It’s not necessarily going to replace the other tools that are out there, but if people don’t think about the process of doing entity SEO or semantic SEO or whatever we’re going to call it, then every time they wake up to a new algorithm, they will find themselves slipping down even more.

But the other thing about getting involved with the entity side of things is that as Google comes out with new products like voice search or image search or whatever, you start popping up in those kinds of places as well. Google Discover is a really good place for entity-based traffic to start popping up. So if you’re an Android person, you’ve got… It says, “We think you should look at this, this, and this.” That’s using the Knowledge Panel.

Terry: It’s not just Google; it’s Alexa, all those. I forget what the devices are called. They’re going into homes now. They’re actually building those into the infrastructure of new homes, so, you know, it’s not gonna be something extra in the home.

Dixon: It’s gonna be two, three, four, five, whatever. Alexa, Echo… One Christmas I apparently won one; it just arrived in the post. Apparently it was from a big electrical energy company, and I’d won the prize. It arrived all packaged and stuff, supposedly from EE, and I thought, “I didn’t enter a competition.”

So we never opened the packet. I thought, “I don’t know who this is from, but I’m not having this,” and we’ve never had an Echo, you know, in the house. I mean, I’m sure we have; we’ve got it all on the bloody computers and stuff. But we’re not going, let’s say, with an “OK Google” box or whatever, you know, and I think my family are kind of a bit paranoid about it. But that doesn’t mean to say that you don’t need to think about entities, because Google, you know, Alexa comes back with one answer usually, and it had better be yours.

Bill: I have a Google speaker, and I’ll talk to it, and sometimes my phone will try to answer, and the speaker will say, “Another device is answering you.” Sometimes they work together well; the speaker will say, “Would you like me to send those stories to your phone?” So creepy for me. I’m not ready for that, you know.

When you have Siri and Google Now and an Alexa getting into arguments, it becomes amusing. What they should do, all three of them, is have a system that says, “OK Alexa, talk to Google,” and Alexa says something to Google, and then they can start playing with each other’s algorithms and stuff, and you can just listen to them arguing.

Dixon: You know, and I knew it was gonna happen at some point. I was gonna say, you know what, I’m too old, I’ve got to go and hide in my little cottage in Wales. You know, I’ll get inLinks done, get out the other end, and I’m gonna hide. I’m just going to switch off technology. The problem is, even growing plants in the future is going to require AI.

Bill: So the Internet of Things will make sure that you have a computer in your refrigerator. You open it and start planning the menu for the evening, and your refrigerator will tell you you’re almost out of milk, and your oven will say you’re not cooking at a high enough temperature and suggest a recipe you should consider instead of burning everything this time.

Dixon: Yeah, it’s good. I’ve just got involved with the RSA (Royal Society of Arts) in the UK, and they’ve got this kind of set of mission statements of trying to find ways in which you can take innovation and turn it into things for good, really, you know? And I think this is one of the big critical questions of the day: how do we encourage humanity to use this as a force for good as opposed to a force for control?

And that’s the real path. Whether it’s businesses doing the controlling or governments trying to control, or a black hat hacker trying to do the control, or a marketing person, or a psychologist, or just a bully at work or at school, there’s so much opportunity for it to go different ways.

And, of course, it will go different ways anyway, but to try and build systems that encourage us all to be better human beings, I think, is a philosophical question. And that’s gonna take some kind of government oversight, or at least some kind of community oversight, to stop us from going down some pretty ugly-looking paths. I’ll stop there ’cause I’m kind of getting all privacy law. I’m a big GDPR fan, a big GDPR fan, and so I build that. We’re off the topic now. Sorry if I’m going off; you can tell me to shut up.

Terry: That’s all right. Yeah, I guess we can end the show here. Pretty good. We’re running out of stuff to talk about, unless you want to talk about Majestic, but I guess that’s kind of…

Dixon: It’s good for finding a corpus of people that are authoritative on a particular subject, so it’s kind of a good influencer finder tool.

Terry: Oh, I thought it might be like the tool on SEMrush that you can give it a keyword and a phone number or something like that, and it tells you whenever that was mentioned.

Dixon: No, no, no, it’s really not that. It’s self-care. It’s really coming back with interesting finds of people that are influential. So you do start with a keyword, but it’s trying to find people that are relevant to it. Because it’s got the flow metrics, it’s got contextual and topical relevance and stuff. So you can say that just because you’ve been talking about Barack Obama, that doesn’t necessarily make you an influencer on Barack Obama, or influential around the subject of Barack Obama. So it’s picking out some interesting lists, different to other ways of doing the list, I think.

Terry: So kind of a tool for finding influencers, marketing techniques.

Dixon: It’s amazing what you can tell once you’ve crawled the whole web. It’s amazing what you can do with that information. We don’t actually have to have all the on-page content to find some pretty interesting correlations and stuff, so yeah, it’s pretty cool.

Terry: Have they refined the category tool?

Dixon: No, I’ve gotta tell you, I’m only involved as an ambassador of Majestic these days. I’m not involved in the inside story of it, okay? My heart, mind, and stuff are mostly on inLinks. But I do a monthly webinar with Majestic behind it, and hopefully, when we’re out of lockdown and going to conferences, I will go and talk with Majestic at one again, because I love what Majestic has done.

I think they did an incredible job of changing the industry. But as to the nuances of what they’ve got then coming down the road, I can’t give you the same level of authority that I could before.

Terry: That was authority for me. That was one of my favorite tools for identifying, basically, the topics of the pages that I was getting links from. I think that became a lot more important than PageRank and all that other malarkey.

Dixon: The topical relevancy was kind of good, and they have got a lot better with context because they’re now doing chunking, as you talked about before, analyzing pages in chunks. So every single page is split into, I think, 20 different chunks.

Basically, every five percent of the page is measured independently, and the links are then sort of put in different areas. You’re reading the context around those links as well, so they’re getting the contextual relevancy of that stuff.
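As a back-of-the-envelope sketch of that idea: if a page is split into 20 equal chunks, a link’s chunk is just its position scaled by the page length. The function names and the use of character offsets are assumptions made for illustration; how Majestic actually measures position is not described here.

```python
from collections import defaultdict


def chunk_index(offset, page_length, n_chunks=20):
    """Map a position on the page to a chunk number from 0 to n_chunks - 1.

    With n_chunks=20, each chunk covers five percent of the page.
    """
    if page_length <= 0 or not 0 <= offset < page_length:
        raise ValueError("offset must fall inside the page")
    return offset * n_chunks // page_length


def group_links_by_chunk(links, page_length, n_chunks=20):
    """links: iterable of (url, offset) pairs. Returns {chunk: [urls]}."""
    grouped = defaultdict(list)
    for url, offset in links:
        grouped[chunk_index(offset, page_length, n_chunks)].append(url)
    return dict(grouped)


# Hypothetical page of 1,000 characters with three links.
links = [("/nav-link", 10), ("/body-link", 520), ("/footer-link", 990)]
print(group_links_by_chunk(links, page_length=1000))
```

Once every link carries a chunk number, filtering by position, for example ignoring the first and last chunks, becomes a simple lookup.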

Now, if you want to go and find the links that are coming into a page, anywhere, you can sit there and say, “Actually, I only want ones that are in the body text. I don’t want them if they’re right at the top in the navigation or structure of the stories. I also don’t want them if they’re surrounded by images, because they’re a bunch of ads, you know?”

So you can start really playing around with that. Then you can also have on top of that, “I only want the pages that are topically in this category or that area as well.” So those filters now are bloody slick, actually.

They’re very, very slick. They got a lot better around about October, November, I think, and they’ve been pretty exciting. Majestic hadn’t done a big launch for a while, but at BrightonSEO, which is the largest conference in the UK, you know, we booked out.

They’ve got a sort of London Eye kind of thing that goes up on the seafront, and you can book it for an hour and give everyone wine and force them to watch your videos and things. So they went in and splashed out on that, which was cool.

Resource links:

In March of 2021, Bill wrote What is Semantic SEO?, which is one of the best posts I’ve read on implementing Semantic SEO!

Dixon’s Semantic SEO Toolset:
Inlinks Schema Visibility report

Patent Discussion:

Bill dropped these with the comment, “They are three patents that cover a lot of ground and show how semantic information is being pulled from the web to answer queries.”

Entity Extractions for Knowledge Graphs at Google

Answering Questions Using Knowledge Graphs

Ranked Entities in Search Results at Google